Thanks for the questions!
Question: How do you prevent domains stomping on one another's content? And what prevents a company from reading a different company's content?
Data is stored per origin. So if you're in a frame from a given origin, you can only read and write that origin's shared storage; another origin's data is inaccessible to you.
Question: Why does N need to be so small? Is there room to scale larger if your base population is large enough? While a small number of bits can be identifying if you have a small number of users, it's possible to have large numbers that aren't identifying if there's a large number of users, as long as the numbers are evenly distributed. Given that this proposal already requires aggregated reporting, would it be feasible for someone to run more types of treatment and just wait longer until enough users are in each treatment group to safely generate the aggregate report?
TLDR: The larger N is, the faster Shared Storage leaks cross-site data, such as bits of a user id.
With N URLs as input to choose from, one is selected. That output carries an arbitrary log2(N) bits of cross-site entropy. The output goes to a fenced frame, and when the fenced frame is clicked it gets to navigate, meaning that the embedder's 1p identifier for the user (which could be in the selected URL) is then combined with those log2(N) cross-site bits and sent out on the network in the worst case.
~33 bits is enough to uniquely identify every person on the planet. So we want N to be as small as possible so the cross-site data leaks as slowly as possible, so that reidentification of users takes as long as possible.
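To make the arithmetic concrete, here's a rough sketch (the 33-bit figure comes from 2^33 being roughly 8.6 billion, slightly more than the world population):

```js
const N = 8;                               // input URLs per selection call
const bitsPerCall = Math.log2(N);          // 3 bits of cross-site entropy per call
const bitsForUniqueId = 33;                // 2^33 exceeds the world population
// Worst case, assuming every call leaks fresh, independent bits:
const callsToIdentify = Math.ceil(bitsForUniqueId / bitsPerCall);
console.log(callsToIdentify);              // 11 selections suffice to single out a user
```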
Is there any way to support multiple independent treatments with different user splits? E.g. if one treatment coloured the website background red and another treatment changed the font, could the users be split into these experiments independently such that some arbitrary subset of users might be in both?
Yes, but you're still limited in number of options by N.
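To illustrate that constraint with a hypothetical sketch (the URLs and seed values are made up): independent splits multiply the number of URLs needed, so two 2-arm experiments already consume 4 of the N slots, and three would consume 8.

```js
// Two independent 2-arm experiments (background colour and font) encoded
// as the cross product of their arms.
const urls = [
  'https://site.example/?bg=red&font=old',   // index 0
  'https://site.example/?bg=red&font=new',   // index 1
  'https://site.example/?bg=blue&font=old',  // index 2
  'https://site.example/?bg=blue&font=new',  // index 3
];
// Stand-ins for independent per-user seeds previously written to shared storage:
const bgSeed = 7;
const fontSeed = 2;
const index = (bgSeed % 2) * 2 + (fontSeed % 2);  // -> 2: blue background, old font
```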
Anything the function calls or loads needs to be done via opaque URL. In the long term, everything loadable via opaque URL should be a blob that's downloaded ahead of time and available offline. Put otherwise, visiting this URL wouldn't result in a server call when the user's browser actually accesses it.
We're still figuring this out. Downloading ahead of time wastes a lot of resources on all of those N-1 URLs that aren't chosen. It would also make video ads prohibitively expensive (unless only the first few seconds play and a click is required for more).
"Operations defined by one context are not invokable by any other contexts." -> I'm not sure what this means. Does it mean we couldn't rerun the same treatment on different domains?
It just means that a worklet is scoped to the current document (e.g., top frame or iframe). Each worklet can run whatever code you like in it. You can certainly run the same treatment on different domains.
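As a minimal sketch (the module URL is hypothetical; addModule is from the explainer draft):

```js
// Run from the 3p's context on publisher-a.example, and again on
// publisher-b.example. Each document gets its own worklet instance, but
// the module and the 3p origin's shared storage are the same, so the
// same treatment logic runs on both domains.
await sharedStorage.worklet.addModule('https://3p.example/treatment.js');
```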
Unclear how flexible these functions/worklets are aside from letting you pick 1 of 5 opaque URLs. Can you actually change behaviour in these functions, or does the behaviour change need to be within the URL? If the former, can you apply different worklets to different situations, so that you get behaviour changes for some situations and URLs for others?
The output is one of the input URLs verbatim; no changes allowed. You can choose which selection method within the worklet you want to run for the given list of URLs. Not sure if that answers your second question here.
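For reference, a minimal sketch of a worklet with multiple selection methods. Names follow the explainer draft discussed in this thread (runURLSelectionOperation was later renamed, e.g. to selectURL, in newer drafts); the register call and the 'seed' key are illustrative:

```js
// treatment.js, a worklet module. run() returns the *index* of the chosen
// URL; the URL itself passes through verbatim as an opaque URL.
class RandomSplitOperation {
  async run(urls) {
    return Math.floor(Math.random() * urls.length);
  }
}
class SeededSplitOperation {
  async run(urls) {
    // Sticky assignment: the same stored seed yields the same group every time.
    return Number(await sharedStorage.get('seed')) % urls.length;
  }
}
register('random-split', RandomSplitOperation);
register('seeded-split', SeededSplitOperation);
```

The embedding document then picks a method by name:

```js
const opaqueURL = await sharedStorage.runURLSelectionOperation('seeded-split', urls);
// opaqueURL can only be rendered in a fenced frame; the embedder cannot
// observe which of the input URLs was selected.
```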
Seems like you can write as many metrics as you want as long as they're key-value pairs, but you can only get aggregate results returned to your servers at some time interval.
The aggregate reporting API is still being worked out. But this is the general idea, yes.
Is there any built-in support for splitting the metrics by treatment groups, or do you have to manually write each group to a different metric key?
There may be some amount of metadata associated with each report so that you can process only those reports that you are concerned with in a particular query. Still not quite worked out.
Will the metrics include confidence intervals?
This seems likely.
Thanks for all the answers! This definitely clarifies my understanding.
N being so small feels like it may really limit the usefulness of this design. I wonder if there's any possibility to build in k-anonymity in such a way that would allow someone to run more experiments? For example, it sounds like the current approach allows the domain owner to assign users to a URL any way they want. If there were instead an option to have the browser enforce that the URLs are evenly but arbitrarily split between users, and maybe even drop arbitrary URL parameters, could we support a higher N?

I haven't fully thought through these ideas, so let me know if they're way off base, but as a half-baked idea, one alternative might be adding an intermediate layer that checks how often a URL has been used before allowing the domain owner to add more. E.g. one URL is allowed until the intermediate layer has seen at least k users using it, then a 2nd is allowed until both URLs have at least k users, then a third is allowed, etc. I'm guessing there'd be a lot of trade-offs with that approach, and I'm sure there are other alternatives you've thought about, so I'm interested in hearing whether there are any possibilities for achieving a higher N while still retaining k-anonymity and user privacy.
Gentle ping on this question.
Sorry for the slow response! I missed it over break. Your suggestions would prevent the 1p (embedder) from adding its identifier to the URL, but the log2(N) bits of cross-site data would remain in the resulting fenced frame.
Okay, so to double-check my understanding, sounds like allowing the 1P to pass their 1P cookies to the 3P is considered a requirement of this design, and is considered a higher priority than increasing N? In which case would it be fair to say a privacy goal is to make sure that the 3P can't join cookies from different 1Ps to identify a user?
I think I'm still having a bit of trouble wrapping my head around why the number of bits of entropy matters rather than k-anonymity. I'm not an expert, so maybe I'm missing something obvious; if you have any examples of where the entropy would cause a problem, I'd appreciate it.
As an example to demonstrate how I'm thinking about this: whether 1 bit or 10 bits is identifying depends on the population and its distribution. If I have two users with bits 1 and 0, then one bit is identifying. If I have 1000 users, 999 with bit=1 and one with bit=0, then bit=1 isn't identifying but bit=0 is. But if I have 1000 users, 500 with bit=1 and 500 with bit=0, then the bit is no longer identifying for k-anonymity of k=500.
So my point here is if we were able to guarantee that there were at least k users for each unique bit combination for a given 3P, even if websites were still passing 1P cookies to the 3P, I'm having difficulty understanding why it would matter how many of those bit combinations existed if they all have k-anonymity. Even if a malicious actor were to try to join cookies across 1Ps, their unique bit combination would still have at least k users, and they won't know which of those k users for a given bit combination actually visited multiple websites or only one of them. Happy to learn more about what I might be missing here.
Overall, N being so small really restricts the proposal's usability for, e.g. large ad networks that have a lot of developers who want to experiment on many different pieces of functionality. Instead I'm wondering if the design could be refocused around k-anonymity rather than bits of entropy, as long as we can guarantee that it still meets privacy needs. This would better allow for larger companies, that by nature have a lot of users, to reasonably experiment on their many, many pieces of functionality. The alternative for N=8 would be having to choose which 7 experiments and their shared control will be prioritized for the entire company, which isn't feasible.
Hi! I understand that the question here may take a while to consider - do you happen to have a rough ETA on when you think you'll have a reply by?
Last week was super busy for me, sorry again for the slowness.
Okay, so to double-check my understanding, sounds like allowing the 1P to pass their 1P cookies to the 3P is considered a requirement of this design, and is considered a higher priority than increasing N? In which case would it be fair to say a privacy goal is to make sure that the 3P can't join cookies from different 1Ps to identify a user?
I'm all for exploring ways in which we can limit the 1p from passing identifying information into the Fenced Frame. It's not a hard requirement that it be able to. And yes, it is a privacy goal to ensure that the 3P can't join cookies from different 1ps to identify a user.
So my point here is if we were able to guarantee that there were at least k users for each unique bit combination for a given 3P, even if websites were still passing 1P cookies to the 3P, I'm having difficulty understanding why it would matter how many of those bit combinations existed if they all have k-anonymity. Even if a malicious actor were to try to join cookies across 1Ps, their unique bit combination would still have at least k users, and they won't know which of those k users for a given bit combination actually visited multiple websites or only one of them. Happy to learn more about what I might be missing here.
Because over time you get to call the selection operation repeatedly. And each time you might learn that a user is a part of a different k-anonymous group, and you can combine the knowledge from each of those groups the user is a member of to form an identifier. e.g., being a member of group X might not be identifying, but being a member of X, Y, Z, F, and R is. This is why I talk about it in terms of information. Assuming the information is disjoint, then you can add the information from each call until you have enough bits for an identifier.
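A toy illustration (not API code) of how group memberships add up once the same user can be linked across calls:

```js
// Group indices observed for one linked user across 5 selection calls,
// each with N = 8 options. Each index carries log2(8) = 3 bits.
const observedGroups = [3, 0, 7, 5, 2];
const bitsSoFar = observedGroups.length * Math.log2(8);  // 15 bits
const pseudoId = observedGroups.join('-');               // '3-0-7-5-2'
// This string already distinguishes among 8^5 = 32768 users; a few more
// linked calls push it past the ~33 bits needed for a globally unique ID.
```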
I wanted to add that I do think there is value in making the input URLs to runURLSelectionOperation k-anonymous. With this property, it is hard for the fenced frame to leak the cross-site information in a useful way to anyone, even if it has full network access, because there is no first-party or third-party identifier to tie the data to. Once the fenced frame receives a user gesture and can navigate, at that point the cross-site data can be joined with the destination site's cookies. So we do still need to limit the number of total input URLs to reduce the amount of information leaked on that click, but the k-anonymity does provide better privacy properties before click.
Thanks for the reply. So to check my understanding, is the concern you mentioned about intersection of groups and proposal #17 both about when a user changes groups for a given 3P over time?
Example: a user starts in experiment X, then as the experiment configuration changes they get put in experiments Y, Z, F, and R consecutively. The 3P initially just sees k users in experiment X, but if they have the user identifiers from the site, they can retain history for each ID. Once enough time passes, the 3P can examine the history, notice that a given 1P ID on site Foo has the exact same group history {X, Y, Z, F, R, ...} as a different 1P ID on site Bar, and conclude that the two are statistically likely to be the same user.
I do agree with proposal #17 to make the URLs k-anonymous and michaelkleber's boot-strapping suggestion to help address this. Once the URLs are k-anonymous, would that be sufficient that we would no longer need to worry about bits of entropy? I.e. would we no longer need to have a hard limit N on the number of experiments, but could instead scale the number of experiments based on the quantity of traffic?
Thanks for the reply. So to check my understanding, is the concern you mentioned about intersection of groups and proposal https://github.com/pythagoraskitty/shared-storage/issues/17 both about when a user changes groups for a given 3P over time?
Yes. If the same user can be correlated across fenced frames, then the data can be added together over time. Fenced Frames, on click, are allowed to navigate to a destination page. The destination page will have its 1p cookies and so it knows who the user is (from its perspective) and can collect the various groups that the user is in over time, eventually revealing their identity. We want this revelation to be slow, so we reduce the number of bits.
Thanks. Can you answer the final question as well?
Once the URLs are k-anonymous, would that be sufficient that we would no longer need to worry about bits of entropy? I.e. would we no longer need to have a hard limit N on the number of experiments, but could instead scale the number of experiments based on the quantity of traffic?
The data flow into the FF is: the embedder's k-anon URL and log2(N) bits of cross-site information.
The data flow out of the FF on navigation is: whatever was input to the FF, which gets tied to the destination page's 1p cookie.
The k-anon URL slows the rate at which the destination page can tie its 1p cookie to the embedder's 1p identity.
The log2(N) bits still get tied to the destination's 1p identity, which is bad. So we still want N to be small.
Thanks, the data flow explanation helps me understand the architecture here better. Just to double-check: the log2(N) bits of cross-site information are not some extra data sent along with the URLs, but rather a consequence of the fact that there are N URLs to choose from in the first place, correct?
Apologies for taking up so much of your time, but I'm still struggling to understand why the number of URLs, N, matters if the URLs are each k-anonymous. Someone could use the URLs to derive log2(N) bits of information about a particular group, but there still shouldn't be a way for them to meaningfully narrow down individuals within the group, is there? If I understand correctly, the 1P ID can't be passed on the URL since the URL is required to be k-anonymous, and there shouldn't be any additional data other than the URL that would allow the destination page to meaningfully join users across different groups?
Does it matter if the API leaks log2(N) bits of info about a group, if the group is still k-anonymous and it's infeasible to narrow down individual users in the group? Are there some other trade-offs or aspects of how request processing works that I'm missing here?
Hello! I want to piggyback on this open thread: we're also interested in running cross-site experiments using this new API, but a limit of 8 is too small to be feasible. Is it possible to drop this limitation?
What number would you consider to be reasonable? Why is 8 too limiting?
My team's clients are different from jcma's, but speaking for myself, we have the following requirements:
Experiments are critical infrastructure that allow engineers to measure the impact of changes so we can detect outages and improve user, advertiser, and publisher experiences.
Hi Josh! I assume you've probably been busy recently and this scaling question may take some thought. Do you have an estimate for when you'll have an answer by?
I'm concerned that what you're describing doesn't mesh well with the input URLs needing to be k-anonymous. A URL won't be selectable by SharedStorage until it meets the threshold (say 50-100) of unique clients that would have selected it.
I'm interested in discussing a larger set of URLs, since the privacy loss scales logarithmically, but I want to make sure we keep it as low (and useful) as possible.
Sorry for the late reply.
In reply to why 8 is too limiting and what number would be reasonable, Rachel said everything I wanted to say in this comment
As far as I'm aware, a k-anonymous URL should still be compatible with our requirements, as long as we can scale to a much larger number of URLs (i.e. remove the limit of N URLs, and allow an arbitrary number of URLs as long as k-anonymity is still met).
We don't necessarily need Chrome to handle the overlapping layer mechanics. We can set up the layer independence algorithms ourselves as long as Chrome provides a mechanism for making a large number of user-sticky k-anonymous groups (which I assume is the URLs in this case). We are also okay with a ramp-up period during which these URLs/groups are unavailable until the k-anonymity threshold is met.
Put another way: if a group of sites had 1000 users and Chrome decided k-anonymity of k=100 was appropriate, then a naive single-layer implementation should allow a party to divide that traffic into 10 k-anonymous groups, i.e. 10 URLs. If Chrome allows us to scale based on the size of our user base rather than on an arbitrary number of bits, I believe we can satisfy both privacy and scaling requirements.
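A sketch of that scaling rule (purely illustrative; the numbers mirror the example above, and nothing like this exists in the current proposal):

```js
// Under a pure k-anonymity rule, the number of simultaneously usable URLs
// would scale with the observed population rather than being a fixed N.
const population = 1000;  // users seen across the group of sites
const k = 100;            // k-anonymity threshold chosen by the browser
const maxUrls = Math.floor(population / k);  // -> 10 k-anonymous groups/URLs
```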
After some more consideration, what we're asking for is sufficiently different from the existing API proposal that it's better suited to its own issue. I'll close this issue since I now understand the existing proposal better, and I've opened #22 to discuss the specifics of our feature request. Thanks so much for your time so far and the continued discussion!
Hi! I'm trying to understand how this proposal might work for cross-domain A/B experiments (e.g. trying to enable the same treatment on the same subset of users across 2 different websites). I've described below my current understanding of how the Shared Storage API proposal works, as well as some questions on parts I'm unsure of. Could you please help clarify any parts I misunderstood?
Shared Storage
Anyone can write to the shared storage, but there are limits on who can read from it and what content can be read.
Worklets
Websites can also write functions (called "worklets") that Chrome will execute in-browser based on the contents of the shared storage. The worklets can edit the shared storage, trigger the aggregated reporting workflow, or return content in an opaque URL, but not anything beyond that.
E.g. a party could write a unique identifier (called a seed) to the shared storage, and perform different treatment based on a modulus of the identifier. Put otherwise, the user could be assigned to one of N different treatment groups but would be consistently assigned the same treatment.
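A sketch of that flow using explainer-draft names (the key name 'seed', the module URL, and the operation name are illustrative; ignoreIfPresent is the explainer's option for keeping the first value written):

```js
// In the document: one-time setup.
await sharedStorage.worklet.addModule('https://a.example/experiment.js');
await sharedStorage.set(
    'seed', String(Math.floor(Math.random() * 2 ** 32)),
    { ignoreIfPresent: true });  // first write wins, so assignment stays sticky

// In experiment.js: the modulus of the seed picks one of the N groups.
class AssignGroupOperation {
  async run(urls) {
    return Number(await sharedStorage.get('seed')) % urls.length;
  }
}
register('assign-group', AssignGroupOperation);
```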
It seems like there might be very low limits on N, but it's unclear how N is determined. Further comments show N is expected to stay under 10.
Question: Why does N need to be so small? Is there room to scale larger if your base population is large enough? While a small number of bits can be identifying if you have a small number of users, it's possible to have large numbers that aren't identifying if there's a large number of users, as long as the numbers are evenly distributed. Given that this proposal already requires aggregated reporting, would it be feasible for someone to run more types of treatment and just wait longer until enough users are in each treatment group to safely generate the aggregate report?
Is there any way to support multiple independent treatments with different user splits? E.g. if one treatment coloured the website background red and another treatment changed the font, could the users be split into these experiments independently such that some arbitrary subset of users might be in both?
Anything the function calls or loads needs to be done via opaque URL. In the long term, everything loadable via opaque URL should be a blob that's downloaded ahead of time and available offline. Put otherwise, visiting this URL wouldn't result in a server call when the user's browser actually accesses it.
"Operations defined by one context are not invokable by any other contexts." -> I'm not sure what this means. Does it mean we couldn't rerun the same treatment on different domains?
Unclear how flexible these functions/worklets are aside from letting you pick 1 of 5 opaque URLs. Can you actually change behaviour in these functions, or does the behaviour change need to be within the URL? If the former, can you apply different worklets to different situations, so that you get behaviour changes for some situations and URLs for others?
Aggregated Reporting
The only way to get info back about your treatment is sending metrics in aggregate.
Seems like you can write as many metrics as you want as long as they're key-value pairs, but you can only get aggregate results returned to your servers at some time interval.
Is there any built-in support for splitting the metrics by treatment groups, or do you have to manually write each group to a different metric key?
Will the metrics include confidence intervals?