WICG / attribution-reporting-api

Attribution Reporting API
https://wicg.github.io/attribution-reporting-api/

Allow capping aggregated contributions for buckets #1396

Open feifeiji89 opened 3 months ago

feifeiji89 commented 3 months ago

Currently, the API has a single L1 contribution budget for aggregatable reports per source, so you cannot control how that budget is distributed across multiple buckets. A bucket could be the SDK used to report the trigger, or any category defined by the filters in aggregatable_values.

For example, one SDK (such as Firebase) may report a conversion as biddable, while another third-party SDK reports the same conversion as non-biddable. We cannot control which SDK ends up registering and redirecting to us first, and this race condition can let one SDK consume the entire L1 budget, leaving none for the other.

feifeiji89 commented 3 months ago

We may consider:

  1. Add a budget cap for each bucket to the Attribution-Reporting-Register-Source header:

    Attribution-Reporting-Register-Source: {
      "aggregatable_bucket_caps": {
        "aap": 1024,
        "firebase": 1024
      },
      ...
    }
  2. Update the Attribution-Reporting-Register-Trigger header’s aggregatable_values to also include the bucket name for each value. Note that we will need to support both the array and dictionary versions of aggregatable_values.
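To make the per-bucket proposal concrete, here is a minimal TypeScript sketch of how per-bucket caps could be enforced alongside the existing per-source L1 budget of 65536. It rests on assumptions that are not settled in this thread: each aggregatable value carries a bucket name, caps come from the proposed aggregatable_bucket_caps field, a trigger whose contributions would exceed any cap is dropped entirely, and values naming an undeclared bucket are rejected. The class and field names are hypothetical.

```typescript
// Per-source L1 contribution budget used by the API.
const L1_BUDGET = 65536;

// Hypothetical contribution shape: each value names its own bucket.
interface Contribution {
  key: string;    // aggregation key name
  value: number;  // contribution amount
  bucket: string; // hypothetical bucket name, e.g. "aap" or "firebase"
}

class BucketedSourceBudget {
  private readonly caps: Record<string, number>;
  private usedTotal = 0;
  private readonly usedByBucket: Record<string, number> = {};

  // caps mirrors the proposed "aggregatable_bucket_caps" source field.
  constructor(caps: Record<string, number>) {
    this.caps = caps;
  }

  // All-or-nothing: records usage and returns true only if the whole trigger
  // fits both the overall L1 budget and every bucket cap it touches.
  tryConsume(contributions: Contribution[]): boolean {
    const pending: Record<string, number> = {};
    let pendingTotal = 0;
    for (const c of contributions) {
      pending[c.bucket] = (pending[c.bucket] ?? 0) + c.value;
      pendingTotal += c.value;
    }
    if (this.usedTotal + pendingTotal > L1_BUDGET) return false;
    for (const bucket of Object.keys(pending)) {
      // Assumption: a value naming a bucket the source did not declare is rejected.
      if (!Object.prototype.hasOwnProperty.call(this.caps, bucket)) return false;
      if ((this.usedByBucket[bucket] ?? 0) + pending[bucket] > this.caps[bucket]) {
        return false;
      }
    }
    // Commit only after every check has passed.
    this.usedTotal += pendingTotal;
    for (const bucket of Object.keys(pending)) {
      this.usedByBucket[bucket] = (this.usedByBucket[bucket] ?? 0) + pending[bucket];
    }
    return true;
  }
}

const budget = new BucketedSourceBudget({ aap: 1024, firebase: 1024 });
budget.tryConsume([{ key: "purchase", value: 1024, bucket: "firebase" }]); // true
budget.tryConsume([{ key: "purchase", value: 1, bucket: "firebase" }]);    // false: "firebase" cap used up
budget.tryConsume([{ key: "purchase", value: 1024, bucket: "aap" }]);      // true: "aap" is unaffected
```

Under these assumptions, whichever SDK registers first can only exhaust its own bucket, so the race described in the issue no longer starves the other SDK.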

apasel422 commented 3 months ago

> distributed across multiple SDKs

Could you clarify what an SDK is?

johnivdel commented 3 months ago

This seems specifically related to Android's implementation of Attribution Reporting, https://developers.google.com/privacy-sandbox/relevance/attribution-reporting/android/developer-guide#x-network-attribution.

Stated generically, the problem seems to be that a single source receives multiple trigger registrations, and we would like to limit how much of the budget some of them can use.

For example, say I wanted to ensure I used at most 50% of my budget on page-view conversions and reserved the rest for other types.

Does this control need to be at the aggregation key granularity, or could we do it at the trigger level? For example:

Attribution-Reporting-Register-Source: {
  "trigger_contribution_budget_limit": {
    "page-view": 32768, // 2^15
    "other": 32768 // 2^15
  },
  ...
}

Then, on the trigger side:

Attribution-Reporting-Register-Trigger: {
  "aggregatable_values": [...],
  "contribution_budget_name": "page-view",
  ...
}

At trigger time, we would look up the budget limit named by this trigger and use that number when enforcing the L1 budget. The consumed budget would probably also need to be tracked per budget limit, which is slightly more complicated but seems reasonable.
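A minimal sketch of this trigger-level variant, using the field names from the example above. It assumes consumption is tracked per named limit as well as against the overall per-source L1 budget of 65536, that aggregatable_values uses the dictionary form, that a trigger without a contribution_budget_name is checked only against the overall budget, and that a name the source did not declare causes the report to be dropped. The last two behaviors are guesses, not anything agreed in this thread, and the class name is hypothetical.

```typescript
// Per-source L1 contribution budget used by the API.
const L1_BUDGET = 65536;

class NamedLimitBudget {
  private readonly limits: Record<string, number>;
  private usedTotal = 0;
  private readonly usedByName: Record<string, number> = {};

  // limits mirrors the proposed "trigger_contribution_budget_limit" source field.
  constructor(limits: Record<string, number>) {
    this.limits = limits;
  }

  // values is the dictionary form of aggregatable_values; budgetName mirrors
  // the proposed "contribution_budget_name" trigger field.
  tryConsume(values: Record<string, number>, budgetName?: string): boolean {
    let sum = 0;
    for (const key of Object.keys(values)) sum += values[key];
    if (this.usedTotal + sum > L1_BUDGET) return false;
    if (budgetName !== undefined) {
      // Assumption: an undeclared budget name drops the report.
      if (!Object.prototype.hasOwnProperty.call(this.limits, budgetName)) return false;
      const used = this.usedByName[budgetName] ?? 0;
      if (used + sum > this.limits[budgetName]) return false;
      this.usedByName[budgetName] = used + sum;
    }
    this.usedTotal += sum;
    return true;
  }
}

const budget = new NamedLimitBudget({ "page-view": 32768, other: 32768 });
budget.tryConsume({ campaignCounts: 32768 }, "page-view"); // true
budget.tryConsume({ campaignCounts: 1 }, "page-view");     // false: page-view limit reached
budget.tryConsume({ campaignCounts: 32768 }, "other");     // true
```

Splitting 65536 into two limits of 2^15 each gives the 50/50 reservation from the example; unequal splits work the same way.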

This seems generic enough that it would also work for the use case above, assuming we don't need to differentiate by each individual key's contribution.

feifeiji89 commented 3 months ago

We did consider a trigger-level bucket option, with this format.

Similar to aggregatable_deduplication_keys, we would add another top-level trigger field, aggregatable_bucket, which selects the bucket based on filters.

"aggregatable_bucket": [
    {
      "bucket_name": "biddable",
      "filters": {
        "2": [
          "11102626635",
          "11081876753",
          ...
        ]
      }
    }
  ],
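One way this filter-based selection could work, sketched in TypeScript: walk the aggregatable_bucket entries in order and pick the first whose filters match the source's filter data. The first-match rule is an assumption drawn by analogy with aggregatable_deduplication_keys, and filtersMatch is a deliberately simplified, hypothetical matcher; the API's real filter semantics have more cases (for example not_filters).

```typescript
// Hypothetical shapes for the proposed "aggregatable_bucket" entries.
type FilterMap = Record<string, string[]>;

interface BucketEntry {
  bucket_name: string;
  filters?: FilterMap;
}

// Simplified matching: every filter key that also appears in the source's
// filter data must share at least one value; keys the source lacks are ignored.
function filtersMatch(sourceData: FilterMap, filters: FilterMap): boolean {
  for (const key of Object.keys(filters)) {
    if (!Object.prototype.hasOwnProperty.call(sourceData, key)) continue;
    const sourceValues = sourceData[key];
    if (!filters[key].some((v) => sourceValues.indexOf(v) !== -1)) return false;
  }
  return true;
}

// Returns the bucket name of the first matching entry, or undefined if none match.
function selectBucket(sourceData: FilterMap, entries: BucketEntry[]): string | undefined {
  for (const entry of entries) {
    if (filtersMatch(sourceData, entry.filters ?? {})) return entry.bucket_name;
  }
  return undefined;
}

const entries: BucketEntry[] = [
  { bucket_name: "biddable", filters: { "2": ["11102626635", "11081876753"] } },
  { bucket_name: "non-biddable" }, // no filters: acts as a fallback
];
selectBucket({ "2": ["11081876753"] }, entries); // "biddable"
selectBucket({ "2": ["999"] }, entries);         // "non-biddable"
```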

I think the rationale is to provide more granularity up front, so that we don't have to support too many API surfaces: we already have two JSON formats for aggregatable_values, and this approach might be more extensible to other use cases.

linnan-github commented 3 months ago

Another consideration is future extensibility and backwards compatibility. If we support trigger-level buckets now and later want to support use cases that need key- or value-level buckets, we would end up maintaining two API surfaces for backwards compatibility, with increased complexity.

johnivdel commented 3 months ago

I think there is definitely a tradeoff here between API complexity and flexibility. From my point of view, bucket-level budgeting increases both the schema and API complexity quite a bit, given that a single trigger has a very large number of possible combinations at the bucket level.

Even in a world where we did want to support bucket-level flexibility, I think there is still a case for allowing trigger-level caps as well: otherwise, to get the same capability, you would need to verbosely declare the same bucket on every value.
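The verbosity point can be illustrated by expressing a trigger-level cap in a hypothetical per-value schema, where the same bucket has to be stamped onto every value. The function and field names below are illustrative only and do not correspond to any agreed format.

```typescript
// Hypothetical per-value form: each contribution names its own bucket.
interface BucketedValue {
  key: string;
  value: number;
  bucket: string;
}

// Rewrites a dictionary-form aggregatable_values plus one trigger-level bucket
// name into the per-value form by repeating that bucket on every entry.
function expandTriggerLevelBucket(
  values: Record<string, number>,
  bucket: string,
): BucketedValue[] {
  return Object.keys(values).map((key) => ({ key, value: values[key], bucket }));
}

expandTriggerLevelBucket({ a: 10, b: 20 }, "page-view");
// [{ key: "a", value: 10, bucket: "page-view" }, { key: "b", value: 20, bucket: "page-view" }]
```

A single trigger-level field carries the same information without the per-value repetition.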

At the very least, we should have a strong example of how bucket-level flexibility lets you measure something useful that trigger-level capping does not. Otherwise, I think trigger-level capping better fits the complexity/flexibility curve.