WICG / attribution-reporting-api

Attribution Reporting API
https://wicg.github.io/attribution-reporting-api/

Event-level API: deduplication and priority #700

Open alois-bissuel opened 1 year ago

alois-bissuel commented 1 year ago

Hello,

I have a question about the interaction between deduplication and priority. It seems that deduplication is applied before the priority system (in other words, a trigger is dropped at registration time if a trigger with the same deduplication key has already been attributed to the source; see step 7 of the attribution algorithm).

This creates a problem when tracking a hierarchy of triggers in which we want every conversion (the top of the hierarchy) plus a single report for one of the other event types, preferably the highest-ranked one, to attribute visits. We handle this by using a common deduplication key for all events except conversions (for conversions, the dedup key is also used to genuinely deduplicate transactions that may be sent to us twice). We then use priorities so that the highest event in the hierarchy is the one reported for visits. This does not work, because deduplication happens first: every subsequent non-conversion event is deduplicated, even when it has a higher priority.
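A minimal Python sketch of the ordering described above, assuming a heavily simplified model of a source (the `Report`, `Source` and `register_trigger` names and the cap of 3 are hypothetical, not the spec's data structures). It shows why a shared dedup key blocks a later, higher-priority visit event: the dedup check returns before priority is ever consulted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Report:
    trigger_data: int
    priority: int
    dedup_key: Optional[int] = None


@dataclass
class Source:
    # Hypothetical stand-in for maxAttributionsPerSource.
    max_reports: int = 3
    reports: List[Report] = field(default_factory=list)
    dedup_keys: Set[int] = field(default_factory=set)


def register_trigger(source: Source, report: Report) -> str:
    # 1. Deduplication runs first: a trigger whose key was already used for
    #    this source is dropped outright, whatever its priority.
    if report.dedup_key is not None and report.dedup_key in source.dedup_keys:
        return "dropped: deduplicated"
    # 2. Priority is consulted only once the per-source cap is reached.
    if len(source.reports) >= source.max_reports:
        lowest = min(source.reports, key=lambda r: r.priority)
        if report.priority <= lowest.priority:  # ties keep the existing report
            return "dropped: low priority"
        source.reports.remove(lowest)
    source.reports.append(report)
    if report.dedup_key is not None:
        source.dedup_keys.add(report.dedup_key)
    return "attributed"


VISIT_KEY = 1  # hypothetical key shared by every non-conversion event
src = Source()
print(register_trigger(src, Report(trigger_data=1, priority=1, dedup_key=VISIT_KEY)))
print(register_trigger(src, Report(trigger_data=2, priority=5, dedup_key=VISIT_KEY)))
```

The second, higher-priority visit event is reported as deduplicated, so the lower-ranked first event is the one that gets attributed.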

Would it be possible to run deduplication after the priority system, or at least not to drop deduplicated triggers when they have a higher priority than the one currently matched to the source? Thanks a lot!

csharrison commented 1 year ago

This is a good question. It feels a bit conceptually wrong to me to have a deduplicated trigger kick out another trigger that's lower priority than it, although I can see the value it has especially with your usage of the dedup feature. The original purpose of the dedup was to avoid double counting, and in that case I suppose a true "duplicate" trigger would have the same priority.

It feels like the primitive you are asking for is something like a sub-grouping of triggers, each independently capped, almost like a partitioned "number of event-level reports" field. This field is subtracted and processed after the priority system, and the priority system is currently only run if this field hits a max (10.9.15).
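To make the sub-grouping idea concrete, here is a small Python sketch under stated assumptions: each trigger carries a hypothetical `group` label and each group has its own hypothetical cap, with priority competition happening only inside a full group. None of these names exist in the spec; it only illustrates what an independently capped, partitioned report count could look like.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    trigger_data: int
    priority: int
    group: str  # hypothetical sub-group label carried by the trigger


@dataclass
class Source:
    # Hypothetical per-group caps; groups not listed default to a cap of 1.
    group_caps: Dict[str, int]
    reports: Dict[str, List[Report]] = field(default_factory=lambda: defaultdict(list))


def register_trigger(source: Source, report: Report) -> str:
    cap = source.group_caps.get(report.group, 1)
    bucket = source.reports[report.group]
    if len(bucket) < cap:
        bucket.append(report)
        return "attributed"
    # The group is full: priority competition happens only inside this group,
    # so a visit event can never displace a conversion (or vice versa).
    lowest = min(bucket, key=lambda r: r.priority)
    if report.priority <= lowest.priority:  # ties keep the existing report
        return "dropped: low priority"
    bucket.remove(lowest)
    bucket.append(report)
    return "replaced"


src = Source(group_caps={"conversion": 2, "visit": 1})
for r in [Report(1, 1, "visit"), Report(2, 5, "visit"), Report(3, 3, "visit"),
          Report(10, 0, "conversion"), Report(11, 0, "conversion")]:
    print(r.trigger_data, register_trigger(src, r))
```

With a cap of 1 for visits, the priority-5 event replaces the priority-1 one and the priority-3 event is dropped, while both conversions are kept under their own cap.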

The way to handle this with dedup keys would be to change 10.9.15 to something like: if we've hit maxAttributionsPerSource OR the dedup key matches that of any matching report in the cache, run the prioritization algorithm. For dedup prioritization, only consider matching reports with the same dedup key.
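The proposed change to 10.9.15 can be sketched in Python roughly as follows. This is a simplified model, not spec text: the names are hypothetical, and it assumes every earlier report for the source is still pending in the cache (a dedup key whose report has already been sent is not modeled). Prioritization runs when the cap is hit or when a pending report shares the new trigger's dedup key, and in the latter case only same-key reports compete.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Report:
    trigger_data: int
    priority: int
    dedup_key: Optional[int] = None


@dataclass
class Source:
    max_reports: int = 3  # hypothetical stand-in for maxAttributionsPerSource
    reports: List[Report] = field(default_factory=list)  # pending reports only


def register_trigger(source: Source, report: Report) -> str:
    same_key = [r for r in source.reports
                if report.dedup_key is not None and r.dedup_key == report.dedup_key]
    if same_key:
        # Dedup prioritization: only reports sharing the dedup key compete.
        candidates = same_key
    elif len(source.reports) >= source.max_reports:
        # Existing behaviour: the cap is hit, so every pending report competes.
        candidates = source.reports
    else:
        source.reports.append(report)
        return "attributed"
    lowest = min(candidates, key=lambda r: r.priority)
    if report.priority <= lowest.priority:
        # A true duplicate has the same priority, so it is still dropped here.
        return "dropped"
    source.reports.remove(lowest)
    source.reports.append(report)
    return "replaced"
```

For example, a priority-5 visit event replaces a pending priority-1 event with the same key, while a resent conversion with the same key and priority is still dropped, matching the point that a true duplicate would carry the same priority.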

cc @apasel422 @johnivdel for thoughts

csharrison commented 1 year ago

also @linnan-github

alois-bissuel commented 1 year ago

I think the idea of capping triggers per sub-group was already hinted at in point 1 of your comment in #278 (that comment is where our use of dedup keys for this purpose originally came from!).

I think such a limit per sub-group could solve our present issue.

csharrison commented 1 year ago

> I think such a limit per sub-group could solve our present issue.

I agree. The main consideration will be how to design the API so that it interacts nicely with (or completely subsumes) the dedup concept without introducing a lot more complexity. We'll need to think about it.

johnivdel commented 1 year ago

I think the flexibility of allowing prioritization within a deduplication key is useful. In the past, we have talked about using the dedup key as the primitive for measuring "one conversion from a set of conversion types, only once", which is why we favored using a new key rather than trying to deduplicate based on the trigger data itself.

See some discussion here: https://github.com/WICG/attribution-reporting-api/blob/2867a961a2a7020ec2ff83812d5ba0501733b33a/meetings/2022-01-10-minutes.md#alo%C3%AFs-bissuel-priorities-and-reporting-window-in-the-event-level-api-for-clicks-issue-278

From a spec perspective, I think the easiest way to solve this would be to introduce a new step that specifically applies priority to an existing deduplicated report, since we should be guaranteed that there is only one such report in the cache at any given time.

One idea would be adding new substeps to 10.9.7, along the lines of this pseudo-code:

1. Let |report| be a report in the [event-level report cache] whose dedup key is |x| and source identifier is |y|, or null.
2. If |report|'s trigger priority is less than the new priority, replace it.
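A rough Python rendering of the two substeps above, under the assumption that the event-level report cache is a plain list and that the hypothetical `PendingReport` fields stand in for the spec's report attributes. It relies on the stated guarantee that at most one pending report shares a dedup key and source identifier, so it stops at the first match.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingReport:
    source_id: int
    dedup_key: int
    trigger_priority: int
    trigger_data: int


def apply_dedup_priority(cache: List[PendingReport], new: PendingReport) -> str:
    # Step 1: find the report in the cache with the same dedup key and source
    # identifier. Dedup guarantees there is at most one, so stop at the first.
    for i, report in enumerate(cache):
        if report.source_id == new.source_id and report.dedup_key == new.dedup_key:
            # Step 2: replace it only if the new trigger's priority is higher.
            if report.trigger_priority < new.trigger_priority:
                cache[i] = new
                return "replaced"
            return "dropped: deduplicated"
    # No pending report found: a caller would fall through to the existing
    # dedup check (e.g. for a key whose report has already been sent).
    return "no pending report"
```

Because the replacement is in place, the number of reports for the source does not change, which is why this step can sit beside the existing cap logic rather than inside it.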

Something like what Charlie mentioned above probably scales better with any potential changes to the priority system (though it would need to resolve some of the early-exit logic).