WICG / shared-storage

Explainer for proposed web platform Shared Storage API

Extending shared storage API to support advanced reach reporting #50

Closed EvgSkv closed 1 year ago

EvgSkv commented 1 year ago

We are excited about the shared storage API and its support of Reach measurement.

Given that Reach is a fundamental metric of brand advertising and that accurate assessment of ad campaign efficiency requires accurate and flexible reach measurement, we would like to request that the shared storage API extend its functionality to support advanced scenarios of reach measurement in a privacy-safe manner. In particular, it is important that the system scales to thousands of advertisers pulling interactive exploratory reports on ongoing and finished campaigns daily. We believe high utility for Reach advertisers should be achievable with reasonable privacy budget settings.

Privacy Sandbox is critical for maintaining high-quality reach measurement. In the absence of the Privacy Sandbox features discussed below, reach modeling could use domain-partitioned cookies as a signal. However, the lack of a cross-domain deduplication signal poses a huge challenge for deduplicating unique user counts. It is unclear when, or if, technology of sufficient quality based on domain-partitioned cookies can be developed. Furthermore, developing such technology could pose an additional risk to user privacy, because accurate cross-domain deduplication done in the clear (in contrast to the on-device nature of the Chrome Privacy Sandbox) may have negative effects on user privacy.

Specifically, the following functionality is critical to ensure that important reach reporting scenarios work with high quality when powered by shared storage.

  1. Availability of the secure report in the context of the event. The explainer states:

The report contents (e.g. key, value) are encrypted and sent after a delay.

This means that users of the system have to pre-define reporting segments before the ads are served. However, modern advertising reach reporting and optimization use cases enable interactive slicing of the traffic on various criteria directly linked to the event context, such as reporting time window, device type, time of day, location, etc.

Additionally, encoding all of these options into the key rather than creating reports on-demand would put unnecessary strain on the privacy budgets.

On the other hand, it should be possible to add sufficient noise to the final aggregates on demand. This would ensure high standards of privacy protection without sacrificing the flexibility of reach slicing and without requiring delayed reporting, which further limits the freshness of reports.

We would therefore ask you to consider the following approach:

Allow the aggregated report entity to be available immediately in the page's JavaScript context, so that at a later time the ad tech would have the option to upload a batch of reports for further aggregation. Since aggregated reports are returned unconditionally for each impression, their arrival does not provide any extra information to the ad tech.

Since the final aggregated report would have differentially private noise and appropriate privacy budget tracking, this option would maintain high privacy protection standards. Meanwhile, it would keep the reach slicing flexibility that modern reach reporting flows rely on today.

Alternatively, the report could still be returned with a delay, but be accompanied by an event-level identifier that would allow it to be joined to the original event.
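As a rough illustration of the slicing flexibility requested in this item, the following Python sketch selects stored per-impression reports by event context at query time and adds Laplace noise only to the final aggregate. The `StoredReport` type, its fields, and the noise scale are hypothetical. In the proposal the report contents would stay encrypted and the aggregation would run inside the secure aggregation service rather than in ad-tech code.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class StoredReport:
    user_key: str  # stand-in for the encrypted report contents
    device: str    # event context known to the ad tech
    hour: int      # hour of day of the impression


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_reach(
    reports: Iterable[StoredReport],
    keep: Callable[[StoredReport], bool],
    scale: float,
    rng: random.Random,
) -> float:
    """Count distinct user keys among the reports selected at query time."""
    selected = {r.user_key for r in reports if keep(r)}
    return len(selected) + laplace_noise(scale, rng)


reports = [
    StoredReport("u1", "mobile", 9),
    StoredReport("u1", "mobile", 10),
    StoredReport("u2", "desktop", 9),
]
# The slice is chosen after the impressions were served.
mobile_reach = noisy_reach(reports, lambda r: r.device == "mobile", 1.0, random.Random(0))
```

The slice predicate is supplied only when the report is requested, so no reporting segment has to be encoded into the key ahead of time.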

  2. Enable a count_distinct secure aggregation function. So far the explainer only mentions the sendHistogramReport function for sending reports.

The histogram report appears to be insufficient for implementation of Virtual People technology in the privacy sandbox. This technology is used by Google and the Cross-Media Measurement project.

Aggregating a histogram with per-bucket noise is insufficient, because each browser gets mapped to a virtual person, and it is the count of virtual people rather than browsers that matters. A histogram is good for counting unique browsers by some partition, but is incapable of counting unique virtual people.

IAB Audience reach measurement guidelines page 4 reads: "deriving unduplicated audience reach people-based measures from digital activity and other research is the most difficult of the metrics however, it is also inherently the most valuable to users of measurement data."

Providing the count_distinct aggregation function would enable a natural implementation of the Virtual People technology, and proper differential privacy noise can ensure high privacy protection standards.

count_distinct could be implemented over the buckets of the histogram, so that no new type of report would be required.

To support demographic composition and frequency scenarios, it should be possible to filter histogram buckets based on the index and on the range of the value.
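The difference between a per-bucket browser count and a distinct count of virtual people can be seen in a small Python sketch. The browser-to-person assignment `VIRTUAL_PERSON` and both function names are hypothetical and exist only for illustration. Noise is omitted here to keep the comparison exact, and this is neither the Private Aggregation API nor the actual Virtual People implementation.

```python
from collections import defaultdict

# Hypothetical assignment: two browsers map to the same virtual person.
VIRTUAL_PERSON = {
    "browser-a": "vp-1",
    "browser-b": "vp-1",
    "browser-c": "vp-2",
}


def browser_histogram(impressions):
    """Per-campaign count of unique browsers (what a histogram can provide)."""
    browsers = defaultdict(set)
    for campaign, browser in impressions:
        browsers[campaign].add(browser)
    return {campaign: len(ids) for campaign, ids in browsers.items()}


def count_distinct_people(impressions):
    """Per-campaign count of distinct virtual people (the requested function)."""
    people = defaultdict(set)
    for campaign, browser in impressions:
        people[campaign].add(VIRTUAL_PERSON[browser])
    return {campaign: len(ids) for campaign, ids in people.items()}


impressions = [
    ("campaign-1", "browser-a"),
    ("campaign-1", "browser-b"),
    ("campaign-1", "browser-c"),
    ("campaign-1", "browser-a"),  # repeat impression, same browser
]
```

With this input the browser histogram reports three browsers for `campaign-1`, while the distinct count reports two virtual people, which is the quantity reach reporting needs.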

  3. Enable pre-aggregation of the reports for further quick combination at serving time with low latency.

Interactive reports are critical to an advertiser's ability to understand the reach of the campaigns that they are running.

To enable interactive exploration, the aggregation API would need to provide the ability to pre-aggregate histograms and return the result as an encrypted intermediate data structure. Such intermediate reports could then be pre-aggregated for atomic reporting units, and reach for a collection of reporting units could be extracted in real time when a report is requested.
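A minimal Python sketch of this two-phase flow follows. It assumes the intermediate structure for each atomic reporting unit can be modeled as a set of person identifiers. A plain set is only a stand-in: in the request the intermediate result would be encrypted and combined inside the aggregation service, with noise added to the final count. The function names are hypothetical.

```python
from collections import defaultdict


def pre_aggregate(reports):
    """Offline phase: build one intermediate structure per atomic reporting unit."""
    intermediates = defaultdict(set)
    for unit, person_id in reports:
        intermediates[unit].add(person_id)
    return dict(intermediates)


def combined_reach(intermediates, units):
    """Query phase: deduplicated reach across an arbitrary collection of units."""
    merged = set().union(*(intermediates.get(unit, set()) for unit in units))
    return len(merged)


reports = [
    ("unit-1", "p1"),
    ("unit-1", "p2"),
    ("unit-2", "p2"),
    ("unit-2", "p3"),
]
intermediates = pre_aggregate(reports)
reach = combined_reach(intermediates, ["unit-1", "unit-2"])
```

Because `p2` appears in both units, the combined reach is smaller than the sum of the per-unit reaches, which is why the intermediate structure has to support deduplication rather than simple addition.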

  4. Enable combining reports with first-party reports.

Campaigns could run with some events served on first-party sites and others on third-party sites. One way to get accurate deduplicated reach for such a campaign would be to let the secure aggregation server ingest a histogram constructed by the ad tech in the clear, along with a pre-aggregated encrypted histogram.

  5. Secure aggregation should scale to impression-level reports.

Each ad impression would emit a reach report, so the secure aggregation infrastructure should scale to large volumes to ensure that the reach use case is supported.

Again, thank you very much for providing this flexible, privacy-safe API, and thank you for your consideration.

csharrison commented 1 year ago

Hey @EvgSkv do you mind cross-posting this issue to https://github.com/patcg-individual-drafts/private-aggregation-api? I think that's a better repo for discussion. I think the shared storage mechanism as it currently exists cannot be updated to accommodate any of these feature requests.

pythagoraskitty commented 1 year ago

Closing out the Shared Storage Repo version of the issue, so that you can continue discussing in the https://github.com/patcg-individual-drafts/private-aggregation-api repo. Thanks!

Please re-open (or open a new issue) here if you need any more Shared Storage clarifications.