WICG / attribution-reporting-api

Attribution Reporting API
https://wicg.github.io/attribution-reporting-api/

App-to-Web and X-Device #541

Open bmayd opened 2 years ago

bmayd commented 2 years ago

In the August 8, 2022 meeting we discussed changes to the ARA for app-to-web that shift responsibility for attribution from the browser to the OS. I think it is worth considering how this change might affect cross-device attribution supported by browser data synchronization, given the requirement that normal attribution logic in the browser is halted when OS-supported attribution is enabled.

With web-to-web, browser instances could share state through periodic back-end synchronization. Assuming state changes between browser instances that affect attribution are appropriately accounted for (e.g. report submission coordination, user preference changes, data deletion, etc.), this synchronization could provide a wider view of the activity of a logged-in account in the web context without revealing more information than would be revealed if the user used only a single browser instance.

With OS-supported attribution, the primary context shifts from the web to the device, and the data store includes data from multiple contexts: the web context shared by all browser instances (there’s one web) and the context of each app on the device.

In the two models above, data sharing happens within bounded contexts (the web and the device) and no information is shared across contexts. However, synchronizing OS-supported attribution data across devices would require combining information across multiple independent device contexts, thus leaking information unique to each device to the other devices.

Assuming there is a desire to support both cross-device and cross-app attribution scenarios, a potential mitigation strategy would be to allow campaigns to configure attribution to be either cross-device/intra-app (browser supported) or cross-app/inter-device (OS supported). In the former case, attribution logic would not need to be halted, in the latter it would.

csharrison commented 1 year ago

@bmayd I think the current design of sites opting into OS-level attribution leaves the door open to this design right?

> However, synchronizing OS-supported attribution data across devices would require combining information across multiple independent device contexts, thus leaking information unique to each device to the other devices.

Can you say more about in which cases this is problematic?

bmayd commented 1 year ago

It has been a while since I wrote this, so doing my best to reconstruct my thinking.

> Assuming there is a desire to support both cross-device and cross-app attribution scenarios, a potential mitigation strategy would be to allow campaigns to configure attribution to be either cross-device/intra-app (browser supported) or cross-app/inter-device (OS supported). In the former case, attribution logic would not need to be halted, in the latter it would.

What I meant to say here is cross-device/intra-app or cross-app/intra-device (not inter-device); i.e., the context is either the logged-in user's browser instances spanning devices or a specific device.

> I think the current design of sites opting into OS-level attribution leaves the door open to this design right?

I think it may, but as I read the Cross App and Web Attribution Measurement proposal, I don't get a clear sense of how reporting origins can determine, when receiving a request, whether a given campaign is using browser-level or OS-level attribution. It looks like reporting origins within a redirect chain can respond inconsistently, some requesting browser-level and some OS-level, which I assume would result either in meaningfully different reports being provided to them or in failures of reporting logic. It also suggests that a given reporting origin could respond inconsistently over the course of a campaign. It isn't clear how that would affect the OS and browser data stores within a device, or what the implications would be for a browser that was attempting to sync attribution data across instances. It is also not clear what the implications would be if data were synced between devices with inconsistent support for OS-level attribution (i.e. a desktop browser instance that supports OS-level and a mobile browser instance that doesn't).
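To make the redirect-chain concern concrete, here is a minimal sketch of how a mixed chain could be detected from the response headers alone. It assumes that web-level registration is signalled by an `Attribution-Reporting-Register-Source` response header and OS-level registration by `Attribution-Reporting-Register-OS-Source`, which is my reading of the proposal; the function names, the example origins, and the representation of a chain as (origin, headers) pairs are hypothetical.

```python
# Header names are assumptions based on my reading of the proposal,
# compared case-insensitively.
WEB_SOURCE_HEADER = "attribution-reporting-register-source"
OS_SOURCE_HEADER = "attribution-reporting-register-os-source"


def registration_level(headers):
    """Classify one response as 'web', 'os', 'both', or 'none'."""
    names = {name.lower() for name in headers}
    has_web = WEB_SOURCE_HEADER in names
    has_os = OS_SOURCE_HEADER in names
    if has_web and has_os:
        return "both"
    if has_web:
        return "web"
    if has_os:
        return "os"
    return "none"


def chain_levels(chain):
    """Map each reporting origin in a redirect chain to its level.

    `chain` is a list of (origin, response_headers) pairs; if an origin
    appears more than once, the last response wins.
    """
    return {origin: registration_level(headers) for origin, headers in chain}


def is_mixed(chain):
    """True if the responses in the chain ask for different levels."""
    levels = {lvl for lvl in chain_levels(chain).values() if lvl != "none"}
    return len(levels) > 1
```

A chain in which one origin registers at the web level and the next delegates to the OS would be flagged by `is_mixed`, which is the situation where I would expect the reports to diverge.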

I think three additions would allow the current design to support different attribution modes: defining the attribution level per campaign, indicating the setting in the request, and deciding whether and how transitions between levels are supported. Without those additions, I think it could be very confusing.

> Can you say more about in which cases this is problematic?

Good question; I'm not sure what I was thinking of specifically when I originally wrote the post, but I assume it was a combination of two things. The first is that I reflexively assume that when contexts are joined, the potential for information being leaked increases, so less cross-context communication is better. The second is that I think end users are unlikely to expect device-specific information to be shared across devices. It is reasonable to expect, and easy to understand, that browser instances which are signed in and syncing will share attribution data. It is somewhat less obvious or expected that applications running on a device, including a browser, might share attribution data, but there is still the concept that it is "what happens on this device". I think users would not expect data from different apps on different devices to be combined; on the contrary, most would assume they were independent.

In which cases is this problematic? Good question and I don't know that I can point to one.

csharrison commented 1 year ago

Sorry for the delay in responding to this issue. I see your point about inconsistencies between OS delegation and web delegation. If you mess that up you could get confusing results. However, I'm not sure that the browser / OS keeping more state is the solution. This seems like something that is the responsibility of the reporting origin, not the browser.

In particular, I would really want to avoid a pattern where the browser needs to keep track of all ad engagements / conversions (for how long?) that were previously delegated to the OS, as that just duplicates storage across contexts and puts the burden on all users rather than on the reporting origin, which can maintain that state itself. For example, the reporting origin could know, at a global level, when it began registering for OS-level attribution for a particular campaign.
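As a rough illustration of keeping that state on the reporting origin's side, the sketch below records when a campaign switched attribution level and answers "which level was in effect at time t?". Nothing here is part of the API: the class name, the `"web"` / `"os"` labels, and the use of plain numeric timestamps are all assumptions made for the example.

```python
from bisect import bisect_right


class CampaignModeLog:
    """Hypothetical server-side record of a campaign's attribution level."""

    def __init__(self, initial_mode="web"):
        # Parallel lists: the mode at _modes[i] takes effect at _times[i].
        self._times = [0.0]
        self._modes = [initial_mode]

    def switch(self, timestamp, mode):
        """Record that the campaign moved to `mode` at `timestamp`."""
        if timestamp <= self._times[-1]:
            raise ValueError("switches must be recorded in increasing time order")
        self._times.append(timestamp)
        self._modes.append(mode)

    def mode_at(self, timestamp):
        """Return the level that was in effect at `timestamp`."""
        index = bisect_right(self._times, timestamp) - 1
        return self._modes[max(index, 0)]
```

With a log like this, the reporting origin could decide how to interpret a report by looking up the level that applied when the source or trigger was registered, without the browser retaining a copy of everything it delegated.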