csharrison / aggregate-reporting-api

Aggregate Reporting API

Examples of Aggregates #17

Open jdwieland8282 opened 4 years ago

jdwieland8282 commented 4 years ago

Thanks @csharrison & @michaelkleber for your comments last week at TPAC re cohort assembly. I know this is a passionate subject with lots of strongly held opinions. I appreciate your professionalism.

Is it possible to see a schema, or even better an example of the "aggregate" file, that you guys envision as input for cohort assembly? I need something concrete to look at and evaluate. ~thanks

michaelkleber commented 4 years ago

Hi Jeff,

The Aggregate Reporting API was one of our earlier explainers and hasn't received the iterative development attention of https://github.com/WICG/conversion-measurement-api, whose infrastructure it would reuse. So I don't have the clearest answer for you here.

That said, what I was imagining during the TPAC discussion was using an API like this to send a once-per-day report from each browser, indicating which of the sites (or parts of sites) you work with that browser had visited. The reports would need to be aggregated together before you could see them, and that would yield something like this:

```
# [day, list_of_domains_visited_today, approx_count]
['2020-Oct-27', ['bar.com'], 6765]
['2020-Oct-27', ['bar.com', 'foo.com/sports'], 55]
['2020-Oct-27', ['bar.com', 'foo.com/news', 'foo.com/sports'], 144]
['2020-Oct-27', ['foo.com/news'], 89]
['2020-Oct-27', ['foo.com/sports'], 987]
['2020-Oct-27', ['foo.com/news', 'foo.com/sports'], 2584]
```

The bottom row, for example, would indicate that approximately 2584 different browsers had visited pages on both foo.com/news and foo.com/sports today but had not gone to bar.com at all. Note also that there is no row counting how many browsers went to exactly ['bar.com', 'foo.com/news']: only 5 browsers visited that pair of sites, which is below the aggregation threshold.
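To make the shape of these rows concrete, here is a minimal sketch of the aggregation step, assuming each browser's daily report is simply the set of labels it recorded that day. The function names `aggregate` and `visited_at_least` and the threshold value are hypothetical; the thread does not specify the real threshold, and the sketch adds no noise even though the column is called `approx_count`.

```python
from collections import Counter

# Illustrative value only; the thread does not specify the real threshold.
AGGREGATION_THRESHOLD = 10


def aggregate(day, reports, threshold=AGGREGATION_THRESHOLD):
    """Turn per-browser daily reports into [day, labels, count] rows.

    Each report is the set of labels one browser recorded on `day`. Browsers
    are bucketed by their exact label set, and buckets smaller than
    `threshold` are dropped. Real aggregation would presumably also add
    noise (hence "approx_count"); this sketch omits noise entirely.
    """
    counts = Counter(frozenset(labels) for labels in reports)
    rows = [[day, sorted(labels), n] for labels, n in counts.items() if n >= threshold]
    return sorted(rows, key=lambda row: row[1])


def visited_at_least(rows, label):
    """Lower bound on browsers that visited `label` (suppressed buckets are missing)."""
    return sum(n for _day, labels, n in rows if label in labels)


# The example rows from the comment above.
THREAD_ROWS = [
    ['2020-Oct-27', ['bar.com'], 6765],
    ['2020-Oct-27', ['bar.com', 'foo.com/sports'], 55],
    ['2020-Oct-27', ['bar.com', 'foo.com/news', 'foo.com/sports'], 144],
    ['2020-Oct-27', ['foo.com/news'], 89],
    ['2020-Oct-27', ['foo.com/sports'], 987],
    ['2020-Oct-27', ['foo.com/news', 'foo.com/sports'], 2584],
]
```

Because rows are keyed on exact label sets, per-site totals have to be summed across rows: `visited_at_least(THREAD_ROWS, 'foo.com/sports')` gives 55 + 144 + 987 + 2584 = 3770, which is only a lower bound since any below-threshold combinations are absent.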

To get this data, your code would be running on the two otherwise-unrelated domains foo.com and bar.com, and you would be responsible for classifying the pages on foo.com into the /sports vs /news subsections, and calling a browser aggregation API to record your label for the page the user was visiting.
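The client-side half might look roughly like the sketch below. Both pieces are hypothetical: `classify_page` stands in for your own labelling rules (the path prefixes are illustrative assumptions), and `DailyLabelRecorder` is only a stand-in for the browser aggregation API, whose real interface is not defined in this thread.

```python
from urllib.parse import urlparse


def classify_page(url):
    """Hypothetical labelling rules run by the ad tech's code on each page."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host == "bar.com":
        return "bar.com"
    if host == "foo.com":
        # Illustrative assumption: subsections are identified by path prefix.
        if parsed.path.startswith("/sports"):
            return "foo.com/sports"
        if parsed.path.startswith("/news"):
            return "foo.com/news"
    return None  # page not covered by any label


class DailyLabelRecorder:
    """Stand-in for a browser-side aggregation API (hypothetical).

    Labels are collected locally and emitted as one report per day; the
    real transport and aggregation would be handled by the browser.
    """

    def __init__(self):
        self._labels = set()

    def record(self, label):
        # Ignore pages that the classifier did not label.
        if label is not None:
            self._labels.add(label)

    def flush_daily_report(self):
        # Return today's de-duplicated labels and start a fresh day.
        report = sorted(self._labels)
        self._labels.clear()
        return report
```

For example, recording `https://foo.com/sports/scores`, `https://foo.com/news/today` and `https://foo.com/sports/` and then calling `flush_daily_report()` yields `['foo.com/news', 'foo.com/sports']`, which is the kind of per-browser input the aggregated rows above are built from.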

As I said, this is not a highly-developed proposal. If you were hoping to get something different from the report, let's discuss.

In addition to this kind of aggregate data gathering, you would need some new TURTLEDOVE-related API that would let you put people into interest groups based on the aggregate data. That would need to happen either with some on-device mechanism or using a trusted server, as in your ProprietaryCohorts or FLoC+Server ideas.
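One way an on-device mechanism could use the aggregate rows is sketched below, purely as an illustration: the rows are assumed to be made available to the browser, label combinations above a minimum size become candidate interest groups, and each browser matches its own label set locally. `MIN_GROUP_SIZE`, `eligible_groups` and `assign_interest_group` are hypothetical names, and the actual API for joining an interest group is not specified here.

```python
# Illustrative assumption, not part of the proposal.
MIN_GROUP_SIZE = 100


def eligible_groups(rows, min_size=MIN_GROUP_SIZE):
    """Label combinations whose aggregate count is large enough to use as a group."""
    return {frozenset(labels) for _day, labels, count in rows if count >= min_size}


def assign_interest_group(local_labels, groups):
    """Runs on-device in this sketch: match the browser's own labels to a group.

    Returns a group name built from the labels, or None if this browser's
    exact combination is not among the eligible groups.
    """
    key = frozenset(local_labels)
    if key in groups:
        return "+".join(sorted(key))
    return None
```

With the example rows and a minimum of 100, a browser whose own labels are `['foo.com/news', 'foo.com/sports']` would map to the group `foo.com/news+foo.com/sports` (count 2584), while one with only `['foo.com/news']` (count 89) would not be assigned. A trusted-server variant would run the same matching off-device instead.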

jdwieland8282 commented 4 years ago

Thanks @michaelkleber, let me discuss internally and get back to you.