WICG / attribution-reporting-api

Attribution Reporting API
https://wicg.github.io/attribution-reporting-api/

High volume of source-storage-limit #1103

Open michal-kalisz opened 11 months ago

michal-kalisz commented 11 months ago

Hi,

For approximately 6% of all attempts to register ARA sources, we encounter the source-storage-limit. For PCs, it reaches 10% (for phones, it's about 3%). For some source websites (often those for which we display the most ads), it goes above 20% (up to a maximum of 33%).

ARA is currently being tested only on a small scale, and more ad tech companies are likely to adopt it as third-party cookies are phased out, so this problem is likely to become more noticeable. In particular, larger players with many ads and partners could hit the limit, which prevents new registrations until old ones expire.

Maybe a good solution would be to introduce a storage limit that also considers the reporting origin. Additionally, it seems important to have a mechanism that allows overwriting or deleting previous registrations.
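A minimal sketch of what a reporting-origin-aware limit could look like, assuming the existing limit is counted per source origin. The limit values, the class name, and the "reporting-origin-limit" result are hypothetical and chosen only for illustration; this is not how any browser implements it.

```python
from collections import Counter

# Hypothetical limits, far smaller than any real value, for illustration only.
MAX_SOURCES_PER_SOURCE_ORIGIN = 4
MAX_SOURCES_PER_ORIGIN_PAIR = 2


class SourceStore:
    """Toy store that counts sources per source origin and per origin pair."""

    def __init__(self) -> None:
        self.per_source_origin: Counter = Counter()
        self.per_origin_pair: Counter = Counter()

    def register(self, source_origin: str, reporting_origin: str) -> str:
        pair = (source_origin, reporting_origin)
        # Check the narrower (source origin, reporting origin) limit first, so
        # one reporting origin cannot use up the whole source-origin budget.
        if self.per_origin_pair[pair] >= MAX_SOURCES_PER_ORIGIN_PAIR:
            return "reporting-origin-limit"  # hypothetical result name
        if self.per_source_origin[source_origin] >= MAX_SOURCES_PER_SOURCE_ORIGIN:
            return "source-storage-limit"
        self.per_origin_pair[pair] += 1
        self.per_source_origin[source_origin] += 1
        return "success"
```

With these toy values, a single reporting origin is stopped after two sources on a given site, while other reporting origins can still register until the shared per-site budget of four is used up.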

Another idea is to define separate limits for the different types of ARA sources (event, navigation). For instance, an event-type source can be registered multiple times, even without any user interaction, unlike a navigation-type source.
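As a rough sketch of separate per-type limits, assuming each source type gets its own counter per source origin: the limit values and the function name below are hypothetical.

```python
from collections import Counter

# Hypothetical per-type limits, for illustration only.
LIMITS = {"event": 3, "navigation": 2}


def try_register(counts: Counter, source_origin: str, source_type: str) -> bool:
    """Register a source if its type-specific limit has not been reached."""
    if source_type not in LIMITS:
        raise ValueError(f"unknown source type: {source_type}")
    key = (source_origin, source_type)
    if counts[key] >= LIMITS[source_type]:
        return False  # only this type is blocked; the other type is unaffected
    counts[key] += 1
    return True
```

Under this scheme, a burst of event-type registrations exhausts only the event budget, so a later navigation-type registration on the same site would still succeed.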

Best regards, Michal

csharrison commented 11 months ago

Thanks for the report. Let me also cc @agarant @akashnadan. We can look into this.

agarant commented 11 months ago

Thanks for the report @michal-kalisz. This limit will increase from 1024 to 4096 starting in M120. We will be monitoring the impact of the increase. In parallel, we are looking into additional mitigation measures.

AramZS commented 11 months ago

I generally think that "a mechanism that allows overwriting or deleting previous registrations" seems wise, especially if it can be handled both by count and by date (delete x number, starting with the least recently registered and moving toward the most recently registered).
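A minimal sketch of deletion by count and by date, oldest first. The `Source` record, the function name, and its parameters are hypothetical and only illustrate the ordering described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Source:
    source_id: str
    registered_at: float  # registration time, e.g. seconds since the epoch


def evict_oldest(
    sources: List[Source], count: int = 0, older_than: Optional[float] = None
) -> Tuple[List[Source], List[Source]]:
    """Return (kept, deleted), deleting the least recently registered first."""
    ordered = sorted(sources, key=lambda s: s.registered_at)
    # By count: drop the `count` least recently registered sources.
    deleted = ordered[:count]
    kept = ordered[count:]
    # By date: additionally drop anything registered before `older_than`.
    if older_than is not None:
        deleted = deleted + [s for s in kept if s.registered_at < older_than]
        kept = [s for s in kept if s.registered_at >= older_than]
    return kept, deleted
```

A caller could pass only `count`, only `older_than`, or both, depending on whether it wants to free a fixed number of slots or drop everything before a cutoff.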

michal-kalisz commented 6 months ago

Hi!

Is there any update regarding this issue?

Since the limit was increased, 1.8% of sources are still not being registered due to the source-storage-limit.

Do we have any insight into whether the suggestion to adjust this limit on a per-reporting-origin basis was addressed?

Michal

johnivdel commented 6 months ago

Thanks @michal-kalisz, that is useful data.

We've discussed adding a separate reporting-origin scoped limit here, but likely this would be a lower limit. We can look more into this given the numbers here.

I'd be interested to understand if there are any patterns which result in this limit being hit more frequently. Are multiple impressions for the same ad a significant contributor? One thing we have discussed in the past is allowing for impression deduplication: when a source is registered, it provides a dedup key, and any previous registrations that share that key are deleted.

It would be helpful to know if this kind of approach would help.
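To make the deduplication idea concrete, here is a minimal sketch, assuming the dedup key is an optional value supplied at registration time. The class, the `dedup_key` parameter, and the limit are hypothetical and not part of any current API.

```python
from typing import List, Optional, Tuple


class DedupingSourceStore:
    """Toy store in which a new source replaces older ones with the same key."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.sources: List[Tuple[str, Optional[str]]] = []  # (source_id, dedup_key)

    def register(self, source_id: str, dedup_key: Optional[str] = None) -> bool:
        if dedup_key is not None:
            # Delete previous registrations that share this dedup key.
            self.sources = [s for s in self.sources if s[1] != dedup_key]
        if len(self.sources) >= self.limit:
            return False  # corresponds to hitting source-storage-limit
        self.sources.append((source_id, dedup_key))
        return True
```

In this sketch, repeated impressions of the same ad occupy a single slot as long as they reuse the same key, while sources registered without a key still count individually against the limit.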

michal-kalisz commented 5 months ago

Hi John,

I apologize for the delayed response.

The idea seems very interesting. However, I'd like to revisit the idea of splitting the limit by source type. Registrations of the "event" type may occur frequently, while "navigation" registrations require user interaction and thus happen less often. It would therefore be worth considering such a split so that one type does not crowd out the other.

From what we've observed, the problem occurs with many SSPs and across many publishers. For some publishers the percentage of reported errors is high, with one large publisher even reaching 26% for 10% of users.

A more interesting approach may be to look at it per user. Over an entire month, 90% of the users who hit the source-storage-limit at least once had fewer than 287 ARA source registrations (calculations based on verbose debug reports). The higher percentiles are p95: 562, p98: 1087, and p99: 1641.
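For reference, a rough sketch of how per-user percentiles like these could be computed from a log of registration attempts. The `(user_id, outcome)` record shape and both function names are assumptions for illustration, and the nearest-rank method is just one common percentile definition, not necessarily the one behind the numbers above.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def registrations_per_affected_user(events: Iterable[Tuple[str, str]]) -> List[int]:
    """Count all attempts per user, keeping only users who hit the limit once or more."""
    totals: Counter = Counter()
    affected = set()
    for user_id, outcome in events:
        totals[user_id] += 1
        if outcome == "source-storage-limit":
            affected.add(user_id)
    return [totals[user_id] for user_id in affected]


def nearest_rank_percentile(values: List[int], p: float) -> int:
    """Smallest value such that at least p percent of the values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

Feeding a month of attempts through `registrations_per_affected_user` and then calling `nearest_rank_percentile` with 90, 95, 98, and 99 would yield the same kind of breakdown.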

I'd be happy to discuss this further at Monday's meeting.

If you would like any additional statistics, please let me know.

Michal

arpanah commented 5 months ago

cc: @akashnadan @vikassahu29