WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Creative pre-registration strategies #792

Open rdgordon-index opened 1 year ago

rdgordon-index commented 1 year ago

As suggested in the explainer, sellers have the ability to fetch additional real-time signals based on a combination of renderURL and hostname (representing the publisher’s domain) that can be used during scoreAd() when scoring creatives. Specifically:

Similarly, sellers may want to fetch information about a specific creative, e.g. the results of some out-of-band ad scanning system

Checking whether the creative contents have been pre-approved by the seller. This could be implemented by an out-of-band creative review process…

In today’s programmatic ecosystem, buyers communicate their creative markup via the bid.adm during RTB, alongside other key bid metadata (advertiser domain, seat ID, IAB category, creative format, creative & campaign identifiers, etc.); however, no creative URL (aka renderURL) containing the markup is provided. As a result, there is no existing mechanism by which SSPs can obtain this URL for all existing creatives submitted in contextual auctions.

This necessarily means that all existing creatives are unable to be served in PA auctions, since creatives have to be pre-approved in order to be scored with a desirability > 0. Otherwise, the rejectReason for all PA creatives returned by scoreAd() would be pending-approval-by-exchange.

This also necessitates a mechanism to initiate such PA creative registration via renderURL, which poses some challenges, as outlined below.

The most naïve such mechanism, available today, would leverage the forDebuggingOnly.reportAdAuctionLoss() endpoint – that is, for any renderURL not found in the seller’s K/V server, initiate an API call to a seller endpoint to indicate that said renderURL has not yet been approved. According to https://github.com/WICG/turtledove/issues/632#issuecomment-1631089708, this function will be available until the end of 3PCD, and should suffice for short-term testing as well as during the 1% 3PCD time horizon.
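A minimal sketch of this short-term mechanism, assuming the seller's K/V value for a renderURL carries an `approved` flag and that the seller hosts a registration endpoint at the URL shown (both are hypothetical; the K/V value format is seller-defined):

```javascript
// Hypothetical seller endpoint that queues unknown creatives for review.
const REGISTRATION_ENDPOINT = 'https://www.example-ssp.com/register-creative';

function scoreAd(adMetadata, bid, auctionConfig, trustedScoringSignals, browserSignals) {
  const renderURL = browserSignals.renderURL;
  // Trusted scoring signals are keyed by renderURL; the value shape is an assumption.
  const byRenderURL = (trustedScoringSignals && trustedScoringSignals.renderURL) || {};
  const review = byRenderURL[renderURL];

  if (!review || review.approved !== true) {
    // forDebuggingOnly only exists inside the auction worklet, so guard the call.
    if (typeof forDebuggingOnly !== 'undefined') {
      forDebuggingOnly.reportAdAuctionLoss(
          `${REGISTRATION_ENDPOINT}?renderURL=${encodeURIComponent(renderURL)}`);
    }
    return {desirability: 0, rejectReason: 'pending-approval-by-exchange'};
  }
  return {desirability: bid};
}
```

Every device that sees the same unapproved renderURL would fire the same report, so the receiving endpoint would need to deduplicate.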

Challenges with this approach:

Another alternative approach would be to somehow leverage the Private Aggregation API, but this shares all of the challenges above, and it is also unclear how to bucket the fields required for registration (e.g. renderURL, seat, adomain). Furthermore, this requires the immediate adoption of this API (and its requirement for TEE) in order to start registering creatives, and as such, it does not seem like a short-term solution.

MattMenke2 commented 1 year ago

One thing we need to be careful with here is about leaking data - renderURLs haven't been checked for k-anonymity, and so requesting them can leak data (e.g., if we send them only when offered in a bid, then they could pass in a user ID for the publisher page that could be correlated with a user ID on the joining origin. 32 IGs could provide one bit of publisher page ID each, like: https://foo.test/bit-0-is-1?user=FreddyPharkas, https://foo.test/bit-1-is-0?user=FreddyPharkas, etc. Each URL has the full user ID in the joining origin, and one ordered bit from the top-level-site where the auction is running).
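The bit-per-interest-group leak can be sketched as follows. This is a simplified illustration, not how any real party behaves: the function names are hypothetical, and the sketch assumes each of 32 interest groups bids with the URL that matches one bit of the publisher-side ID.

```javascript
const BITS = 32;

// What a colluding buyer could cause to be sent: one renderURL per IG, each
// carrying the joining-origin user ID plus one bit of the publisher-page ID.
function leakedRenderURLs(publisherPageId, joiningOriginUser) {
  const urls = [];
  for (let i = 0; i < BITS; i++) {
    const bit = (publisherPageId >>> i) & 1;
    urls.push(`https://foo.test/bit-${i}-is-${bit}?user=${encodeURIComponent(joiningOriginUser)}`);
  }
  return urls;
}

// What the receiver of those URLs could reconstruct.
function reconstruct(urls) {
  let publisherPageId = 0;
  let user = null;
  for (const u of urls) {
    const parsed = new URL(u);
    const m = /^\/bit-(\d+)-is-([01])$/.exec(parsed.pathname);
    if (!m) continue;
    if (m[2] === '1') {
      // ">>> 0" keeps the running value an unsigned 32-bit integer.
      publisherPageId = (publisherPageId | (1 << Number(m[1]))) >>> 0;
    }
    user = parsed.searchParams.get('user');
  }
  return {user, publisherPageId};
}
```

The receiver ends up with both identifiers joined in one place, which is the cross-site link the k-anonymity check is meant to prevent.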

Sending renderURLs on IG join would be more practical, but we don't know the seller origin to send the information to, and we'd need the IG to opt-in to sending the information (normally, offering a bid is considered to provide that permission).

So I think we need to figure out the privacy story here on how we can implement this without creating a new cross-top-level-origin information leak.

rdgordon-index commented 1 year ago

renderURLs haven't been checked for k-anonymity

Can you elaborate? As per https://github.com/WICG/turtledove/blob/main/FLEDGE.md#33-metadata-with-the-ad-bid, I wasn't expecting scoreAd() to ever receive a renderURL from generateBid() that didn't pass the k-anon check.

If generateBid() picks an ad whose rendering URL is not yet above the browser-enforced microtargeting prevention threshold, then the function will be called a second time, this time with a modified interestGroup argument that includes only the subset of the group's ads that are over threshold. (The under-threshold ad will, however, be counted towards the microtargeting thresholding for future auctions for this and other users.)

michaelkleber commented 1 year ago

Sending on IG join would be great, from the privacy POV. If only IGs declared which sellers they were willing to bid with, this would be the preferred approach. But that hasn't been a required part of IG metadata until now. I suspect that if we propose it we will hear push-back, but maybe I'm being too pessimistic? Roni, want to pop my bubble quickly?

Can you elaborate? As per https://github.com/WICG/turtledove/blob/main/FLEDGE.md#33-metadata-with-the-ad-bid, I wasn't expecting scoreAd() to ever receive a renderURL from generateBid() that didn't pass the k-anon check.

We do pass bids along to scoreAd() even if they are for ads below the k-anon bar — we need to do that, otherwise we could never learn whether they would have been the winner, which is the condition for warming up their k-anonymity count.

MattMenke2 commented 1 year ago

Only render URLs that win auctions (or rather, that would have won auctions) are registered with the k-anon server for the purposes of calculating k-anonymity, as otherwise, an ad could only be shown to a single user, despite appearing in IGs for a lot of users. So if you're blocking ads that you've never seen before, they'll never reach the k-anon threshold. Therefore, this would need to be done for non-k-anon ads.
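A toy model of that interaction, assuming a simplified k-anon counter and a seller filter passed in as a callback (the class, function names and threshold value are illustrative and do not reflect the browser's actual implementation):

```javascript
class KAnonCounter {
  constructor(threshold) {
    this.threshold = threshold;
    this.usersByURL = new Map();  // renderURL -> Set of user IDs
  }
  isKAnonymous(renderURL) {
    const users = this.usersByURL.get(renderURL);
    return users !== undefined && users.size >= this.threshold;
  }
  recordWouldBeWinner(renderURL, userId) {
    if (!this.usersByURL.has(renderURL)) this.usersByURL.set(renderURL, new Set());
    this.usersByURL.get(renderURL).add(userId);
  }
}

// bids: [{renderURL, bid}]. Returns the renderURL that is shown, or null.
function runAuction(bids, userId, counter, sellerAccepts) {
  const ranked = bids
      .filter((b) => sellerAccepts(b.renderURL))
      .sort((a, b) => b.bid - a.bid);
  if (ranked.length === 0) return null;
  // Only a k-anonymous ad may be shown ...
  const shown = ranked.find((b) => counter.isKAnonymous(b.renderURL)) || null;
  // ... but the top-ranked ad is counted even if it could not be shown.
  counter.recordWouldBeWinner(ranked[0].renderURL, userId);
  return shown ? shown.renderURL : null;
}
```

If `sellerAccepts` rejects every renderURL the seller has not scanned, `recordWouldBeWinner` is never reached for a new ad and its count stays at zero.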

MattMenke2 commented 1 year ago

And just to be clear - I mean the ads need to have won the top-level auction, in an environment that doesn't know whether they've met the k-anon threshold or not. My understanding is that you'd want to know the URL so it can be scanned before showing it anywhere. If that's not the case, and this can all be done after the ad has hit the k-anon threshold and we've already started showing the ad to users, this becomes much easier to do. We may need some sort of k-anon <renderURL, seller, component auction bool> check on how often an ad has won auctions, and once it's hit, have some way of conveying it to sellers, whether directly, or through an aggregation server of some sort.

rdgordon-index commented 1 year ago

My understanding is that you'd want to know the URL so it can be scanned before showing it anywhere.

Correct.

check on how often an ad has won auctions, and once it's hit, have some way of conveying it to sellers

To be clear, there's no desire to trigger this registration under the k-anon threshold; in other words, if a creative won't be shown to N devices, then there's no need to register it "before" it reaches this threshold.

rdgordon-index commented 1 year ago

We do pass bids along to scoreAd() even if they are for ads below the k-anon bar

Only render URLs that win auctions (or rather, that would have won auctions) are registered with the k-anon server for the purposes of calculating k-anonymity

IMHO it isn't immediately obvious from the explainer that this ever reaches scoreAd() -- though, upon further inspection, it's somewhat implied from this text in https://github.com/WICG/turtledove/blob/main/FLEDGE.md#12-interest-group-attributes (emphasis added):

The browser will provide protection against microtargeting, by only rendering an ad if the same rendering URL is being shown to a sufficiently large number of people (e.g. at least 100 people would have seen the ad, if it were allowed to show)

MattMenke2 commented 1 year ago

Agree that the explainer could be clearer on this point. I think this is the first case that's come up where the distinction really matters.

rdgordon-index commented 1 year ago

Sending on IG join would be great, from the privacy POV. If only IGs declared which sellers they were willing to bid with, this would be the preferred approach. But that hasn't been a required part of IG metadata until now. I suspect that if we propose it we will hear push-back, but maybe I'm being too pessimistic? Roni, want to pop my bubble quickly?

That would add complexity if an existing buyer/IG wanted to start working with a new seller, correct? Would updateURL allow for post-join updates to a seller list?

That being said, even if IG seller declaration were in place, that doesn't address the challenge of being able to leverage the metadata provided by generateBid() when registering these new creatives, as noted in the issue description. In other words, it's not solely about making renderURLs available off-device -- though that's definitely part of the challenge.

rdgordon-index commented 1 year ago

as otherwise, an ad could only be shown to a single user, despite appearing in IGs for a lot of users

Just so that I fully understand the privacy concern -- doesn't that situation arise the first time the ad crosses the k-anon threshold already?

MattMenke2 commented 1 year ago

I think we could pass along renderURLs to new sellers when fetching the updateURL without any major new privacy issues, though that would potentially add a bunch of network requests and overhead (we'd need to update new sellers about renderURLs, and update old sellers about new renderURLs, so if we inform sellers directly from Chrome, that could be a lot of extra traffic).

I don't think sending extra metadata specified by the IG affects the privacy characteristics here if we send the information on join (as opposed to on win on a 3P site, where it would need to be added to the k-anon check, at least). We are putting more complexity and overhead on the browser here for something that the browser doesn't really need to care about, unfortunately. Ideally we'd keep the browser API surface for this as minimal as possible.

MattMenke2 commented 1 year ago

Just so that I fully understand the privacy concern -- doesn't that situation arise the first time the ad crosses the k-anon threshold already?

So, ideally the DSP and SSP don't know when the ad reaches the k-anon threshold for the first time, so can't alter behavior based on that. It can only get so much information from loss reports, and auctions are run in a manner that limits information it can get out of them. Only doing the k-anon counting after it wins the auction is done in part to protect against exactly that sort of gaming the system.

michaelkleber commented 1 year ago

I'm still wondering whether we can find a safe way to make this happen at IG Join and Update time. Really this is kind of about the browser mediating a direct flow of information from DSP to SSP, if both of them are OK with us doing so. I'm thinking something like:

(1) Suppose SSP X has run auctions in the past [period of time] in which X has invited DSP Y to be a buyer and some Y IG has placed a bid. (Each browser instance could keep track of this.)

(2) Suppose the IG object on which DSP Y calls Join includes a new field 'OkayToTellSellersMyAdUrls': true.

If both of those are true, then at the moment of IG Join, it seems to me like it would be OK for the browser to contact that SSP's KV server — if we knew the base URL somehow — and ask for the associated KV signals for each renderURL in the IG. And then if no KV signals came back, we could send the renderURL to some SSP-chosen scan-queueing endpoint, maybe as identified by the SSP in the KV response.

This could even be one instead of two round-trips, since I don't think there's any need for the first one to go to a trusted server, this is just a question of what endpoints are set up to receive a lot of traffic (KV expecting calls on each auction) vs only a little (scan-queueing expecting traffic only when a new renderURL appears).
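The two conditions and the lookup step might be sketched like this. The shape of `recentAuctions`, the `lookupKV` callback and the function name are assumptions made for illustration; only the `OkayToTellSellersMyAdUrls` field name comes from the proposal above.

```javascript
// Returns the {seller, renderURL} pairs that would be sent to a seller's
// scan-queueing endpoint when an interest group is joined.
function scanRequestsOnJoin(ig, recentAuctions, lookupKV) {
  // Condition (2): the buyer opted in on the interest group itself.
  if (ig.OkayToTellSellersMyAdUrls !== true) return [];

  // Condition (1): sellers that recently invited this buyer, where some IG
  // of this buyer placed a bid (tracked per browser instance).
  const sellers = new Set(
      recentAuctions
          .filter((a) => a.buyer === ig.owner && a.buyerPlacedBid)
          .map((a) => a.seller));

  const requests = [];
  for (const seller of sellers) {
    for (const ad of ig.ads) {
      // No K/V signals for this renderURL means the seller has not seen it.
      if (lookupKV(seller, ad.renderURL) === undefined) {
        requests.push({seller, renderURL: ad.renderURL});
      }
    }
  }
  return requests;
}
```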

rdgordon-index commented 1 year ago

and the renderURL need not utilize the buyer origin

As per the new guidance in https://github.com/WICG/turtledove/blob/main/FLEDGE.md#14-buyer-security-considerations :

the ads renderURLs should not be same-origin with the interest group’s owner

This confirms that there will be no a priori method to be able to associate a renderURL with a particular buyer (aka DSP), which is one of the challenges noted above.

pm-harshad-mane commented 1 year ago

This confirms that there will be no a priori method to be able to associate a renderURL with a particular buyer (aka DSP), which is one of the challenges noted above.

In this situation, DSPs should tell SSP partners which domain they will use in the renderingURL so that SSP can keep track of it on their KV server to recognize the DSP partner from the renderingURL.

rdgordon-index commented 1 year ago

DSPs should tell SSP partners which domain they will use in the renderingURL

Agreed -- but it's also not clear that there will only be a single such render domain per DSP.

JoelPM commented 1 year ago

[I read through all the comments and think I understand what's being discussed/proposed, but apologies in advance if I rehash something or miss a point already made.]

I think @michaelkleber is on the right track when he says that we're looking for someone to mediate between the SSP and the DSP. However, the challenge with having it be the browser was already pointed out by @rdgordon-index in the initial description, as I think this still results in a "significant volume of unregistered creative API calls from each device, for each such creative."

Could the K/V server be the point of coordination? It could provide an endpoint that can be queried for a list of renderURLs that have no data associated with them. It's effectively a list of cache misses. When the endpoint gets queried, it could take the extra step of filtering by checking which keys are still misses, though it doesn't have to.

Depending on how lookups get distributed geographically, it might help segment the data by region (assuming K/V servers are deployed in multiple regions and probably see different keys). This could help SSPs know which values need to be pushed to which K/V servers.
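As a rough sketch of the cache-miss idea, assuming an in-memory store standing in for one K/V server instance (the class and method names are hypothetical and are not part of any K/V server API):

```javascript
class ScoringSignalsStore {
  constructor() {
    this.values = new Map();  // renderURL -> signals
    this.misses = new Set();  // renderURLs looked up with no data
  }
  put(renderURL, value) {
    this.values.set(renderURL, value);
  }
  get(renderURL) {
    if (this.values.has(renderURL)) return this.values.get(renderURL);
    this.misses.add(renderURL);  // remember the miss for later review
    return undefined;
  }
  // Optional extra step: drop keys that have received data since the miss.
  listMisses() {
    for (const url of this.misses) {
      if (this.values.has(url)) this.misses.delete(url);
    }
    return [...this.misses];
  }
}
```

Running one such store per region would give the per-region miss lists described above.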

rdgordon-index commented 1 year ago

https://developers.google.com/display-video/protected-audience/ssp-guide#metadata_with_ad_bid -- some recent updates from DV3 regarding ad metadata

orrb1 commented 9 months ago

Hi all. We've been exploring this issue, and have prepared a document that details a proposed solution, including a chronicle of several options considered and their respective pros/cons. Please take a look: https://docs.google.com/document/d/1s0tTN25AiPwl3ocCFYOLqeKhetZCt_YFIYQEQ7wzHqI/edit?usp=sharing

Thanks.

orrb1 commented 9 months ago

Given the length of the document linked above, I believe it would be helpful to convey a high-level summary of that document here.

The design expressed in this document attempts to balance a few competing objectives:

  1. Ensure that all ads that a seller may be asked to score have been sent to that seller for creative scanning
  2. Don't overload sellers' servers with a firehose of ads to creative scan, and in particular avoid sending the same ad many more times than needed
  3. Minimize the privacy impact of sending ads for creative scanning

To this end, the design proposed has the following properties. The document explains each of these properties and their motivation in far greater detail.

The majority of the document focuses on the question of when the browser would send ads to sellers' creative scanning entrypoints. The design alternative the document recommends proposes sending the ads of an interest group anytime that interest group is joined or updated, except that the browser would also keep track of which ads it already sent to each seller, so that it could reduce the volume of traffic sent to sellers' creative scanning entrypoint by sending each ad to each seller only once. To protect privacy, this deduplication would be partitioned by the joining site of the interest group.
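A minimal sketch of that deduplication, assuming a per-browser history keyed by (joining site, seller, renderURL). The class and method names are illustrative; the design document is the authority on the actual data model.

```javascript
class CreativeScanningHistory {
  constructor() {
    this.sent = new Set();
  }
  static key(joiningSite, seller, renderURL) {
    // JSON keeps the three parts unambiguous even if they contain separators.
    return JSON.stringify([joiningSite, seller, renderURL]);
  }
  // Returns only the ads not yet sent to this seller from this joining site,
  // and records them as sent.
  adsToSend(joiningSite, seller, renderURLs) {
    const fresh = [];
    for (const url of renderURLs) {
      const k = CreativeScanningHistory.key(joiningSite, seller, url);
      if (!this.sent.has(k)) {
        this.sent.add(k);
        fresh.push(url);
      }
    }
    return fresh;
  }
  // Assumed purge hook, e.g. when the browser clears interest groups.
  clear() {
    this.sent.clear();
  }
}
```

Because the key includes the joining site, the same ad joined on two different sites is sent twice, which costs some traffic but avoids revealing that both joins happened in the same browser.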

Please see the document for more details, and provide your comments here on this GitHub issue. Thanks.

rdgordon-index commented 9 months ago

https://www.example-ssp.com/.well-known/protected-audience-creative-scanning?renderURL=<URL-encoded-renderURL>&metadata=<URL-encoded-metadata>

question: can we include interestGroupOwner as well?

rdgordon-index commented 9 months ago

Though this approach relies on buyers explicitly enumerating all of the sellers with whom they participate in auctions, other approaches for determining the list of target sellers - e.g. having the browser remember, at auction time, which buyers participated in that auction - cause a potential leak of cross-site identity via the seller domain. Mitigating this leak requires an allowlist of sellers, making these approaches redundant, and leaving us no better option than the explicit enumeration.

Can you elaborate on the "potential leak" here? A buyer submitting a bid is effectively "allowing" the seller to scan their creatives.

orrb1 commented 9 months ago

https://www.example-ssp.com/.well-known/protected-audience-creative-scanning?renderURL=<URL-encoded-renderURL>&metadata=<URL-encoded-metadata>

question: can we include interestGroupOwner as well?

Yes, that sounds like a good idea. I've modified the document to reflect it. I've also added a change log at the bottom of the document to record any changes made from when the document was first posted here.

Though this approach relies on buyers explicitly enumerating all of the sellers with whom they participate in auctions, other approaches for determining the list of target sellers - e.g. having the browser remember, at auction time, which buyers participated in that auction - cause a potential leak of cross-site identity via the seller domain. Mitigating this leak requires an allowlist of sellers, making these approaches redundant, and leaving us no better option than the explicit enumeration.

Can you elaborate on the "potential leak" here? A buyer submitting a bid is effectively "allowing" the seller to scan their creatives.

This is a good question. The privacy risk described here would not be part of normal operation, but a malicious party could cause a leak in the following way. An auction is run on the user's device for which the seller is userID_on_publisher.adtechB.com, which the browser would remember had participated in an auction with a given buyer. At a later time, that buyer joins the user to an interest group for which an ad's renderURL is adtechA.com/userID_on_advertiser. The browser would then send a creative scanning request https://userID_on_publisher.adtechB.com/.well-known/protected-audience-creative-scanning?renderURL=adtechA.com/userID_on_advertiser&.... Attestation is insufficient to protect against this because it's enforced at eTLD+1 so that userID_on_publisher.adtechB.com would be allowed to run an auction under the attestation for adtechB.com. This is the reason why the browser can't automatically remember the seller-buyer mapping, and the design instead relies on buyers explicitly enumerating the sellers for which their ads should be sent for creative scanning.
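The scenario can be illustrated with a small sketch. It assumes a naive "last two labels" stand-in for eTLD+1 (a real implementation would consult the Public Suffix List) and a hypothetical set of attested domains:

```javascript
// Naive registrable-domain approximation; NOT a real eTLD+1 computation.
function naiveRegistrableDomain(hostname) {
  return hostname.toLowerCase().split('.').slice(-2).join('.');
}

// Attestation is checked against the registrable domain only.
function isAttested(hostname, attestedDomains) {
  return attestedDomains.has(naiveRegistrableDomain(hostname));
}

// The request the browser would issue if it remembered sellers automatically.
function scanningRequestURL(sellerHost, renderURL) {
  return `https://${sellerHost}/.well-known/protected-audience-creative-scanning` +
      `?renderURL=${encodeURIComponent(renderURL)}`;
}
```

With `adtechb.com` attested, `isAttested('userID_on_publisher.adtechB.com', ...)` passes, and the resulting request URL carries the publisher-side ID in its hostname and the advertiser-side ID in its renderURL parameter.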

rdgordon-index commented 9 months ago

Attestation is insufficient to protect against this because it's enforced at eTLD+1

Technically true, but attestation also requires the ad tech vendor to indicate that they're not going to do this kind of thing -- and because it's on adtechB.com, that means that this ad tech would be in violation of their own attestation to the contrary.

dmdabbs commented 9 months ago

Thanks for the well-written and thought out proposal, @orrb1. I have comments to post and find myself wanting to comment in situ such as on a PR, versus copy/paste/formatting the context into comment(s) in this issue. Could the external doc be converted to a PR to, say, a "proposals" folder markdown doc?

dmdabbs commented 9 months ago

Buyers could specify the list of sellers to which they would want to send their ads for creative scanning by exposing an entrypoint at another well-known URI. The browser would issue a GET request to the buyer's server, e.g. https://www.example-dsp.com/.well-known/protected-audience-creative-scanning-buyer-config

Suggestion to please use a consistent root path component for all Protected Audience .well-known URIs as Attribution Reporting has. We find this helpful for request routing.

dmdabbs commented 9 months ago

Buyers' Config Publishing

Since Chrome proposes to commit to fetching and persisting the new creative scanning config and you intend to extend sellerCapabilities, WDYT of generalizing this endpoint to publish using the single config scheme and affording future needs?

For example, https://www.example-dsp.com/.well-known/protected-audience/buyer-config

{
   "sellerCapabilities": {
     "https://seller1.com": ["creative-scanning"],
     "https://seller2.com": ["latency-stats", "creative-scanning"],
     "https://seller3.com": ["latency-stats"],
     "*": ["interest-group-counts"]
   }
}

An interest group can override the settings on any IG join, otherwise these are used. Same caveat mentioned in the explainer applies, that creative scanning cannot have a catch-all. The nifty new scanning declarations are yet another thing to hang onto every IG registration that counts against the size constraints. In for a penny, in for a pound?
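A sketch of how such a config might be resolved, assuming capability lists are JSON arrays of strings and that an interest group's own sellerCapabilities, when present, replaces the buyer-level config wholesale (the function name and the replace-rather-than-merge behavior are assumptions):

```javascript
// Returns the seller origins that should receive ads for creative scanning.
function sellersForCreativeScanning(buyerConfig, igSellerCapabilities) {
  const caps = igSellerCapabilities || buyerConfig.sellerCapabilities || {};
  return Object.entries(caps)
      // Creative scanning cannot use the "*" catch-all, so skip it.
      .filter(([seller, list]) => seller !== '*' && list.includes('creative-scanning'))
      .map(([seller]) => seller);
}
```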

dmdabbs commented 9 months ago

Seller Configs

Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig.

For example, https://www.example-ssp.com/.well-known/protected-audience/seller-config

{
     "perBuyerCreativeSampling": {
       "https://www.example-dsp.com": { "sampling_rate": 1},  // Send all ads from this buyer
       "https://www.another-dsp.com": { "sampling_rate": 0},  // Don't send any ads from this buyer 
       "*": { "sampling_rate": 0.1}                           // Send 10% of ads from other buyers
     }     
     etc...
}

dmdabbs commented 9 months ago

Submitting Creatives

GET https://www.example-ssp.com/.well-known/protected-audience-creative-scanning?renderURL=<URL-encoded-renderURL>&interestGroupOwner=<URL-encoded-interest-group-owner>&metadata=<URL-encoded-metadata>

Chrome isn't consuming anything from the response, right? You can ditch the URL encoding by POSTing,

POST https://www.example-ssp.com/.well-known/protected-audience/creative-scanning

{
   "https://www.example-dsp.com": [
      {
        "renderURL": "https://some-adserver.com/...",   
        "metadata": {the renderURL's associated creativeScanningMetadata}
      }
   ]
}

Also free to send multiple creatives identified at the IG joining site as suggested in your preferred Option 2b.
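For illustration, a request in that shape could be assembled as below. The function name and the returned request descriptor are hypothetical; the body layout follows the example above, with creatives grouped by interest group owner.

```javascript
// interestGroups: [{owner, ads: [{renderURL, creativeScanningMetadata}]}]
function buildScanningRequest(scannerURL, interestGroups) {
  const body = {};
  for (const ig of interestGroups) {
    const list = body[ig.owner] || (body[ig.owner] = []);
    for (const ad of ig.ads) {
      list.push({
        renderURL: ad.renderURL,
        // Metadata is optional; null marks its absence explicitly.
        metadata: ad.creativeScanningMetadata ?? null,
      });
    }
  }
  return {
    url: scannerURL,
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify(body),  // JSON body, so no URL encoding is needed
  };
}
```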

rdgordon-index commented 9 months ago

question: when would creativeScanningHistory table be purged? Any relationship to how/when IGs are cleared?

dmdabbs commented 9 months ago

question: when would creativeScanningHistory table be purged? Any relationship to how/when IGs are cleared?

Not sure if this is where you were going @rdgordon-index, but statements like

since the creative scanning entrypoint would need to see an ad only once

had me wondering what frequency sellers get to re-review renderURLs. In today's workflows I'm familiar with, if our fetch url is active our partner re-verifies it.

rdgordon-index commented 9 months ago

had me wondering what frequency sellers get to re-review renderURLs

Somewhat depends on which Option is under consideration; indirectly, in Options 1 & 2, for example:

the browser would occasionally send an ad that does have an associated entry in the creativeScanningHistory

Which is some form of re-scanning, albeit indirectly -- I was asking about the explicit ability to do so.

dmdabbs commented 9 months ago

The maxTrustedBiddingSignalsURLLength currently being plumbed would be handy to specify as a 'global' IG/buyer config attribute (should the buyer-config accommodate knobs beyond seller scanning). Same if sellers will also get to limit scoring URL length.

orrb1 commented 9 months ago

Thank you both, Roni and David, for your thoughtful feedback. I'll try to answer each of your points below.

Technically true, but attestation also requires the ad tech vendor to indicate that they're not going to do this kind of thing -- and because it's on adtechB.com, that means that this ad tech would be in violation of their own attestation to the contrary.

Though this is true, the Protected Audience API has a precedent of enforcing with technical restrictions what can be enforced, and relying on policy where that isn't possible.

Could the external doc be converted to a PR to, say, a "proposals" folder markdown doc?

We considered this among other options for posting this design and getting feedback. The goal was specifically to encourage most of the conversation to stay in this thread so that anyone who's interested can stay involved.

Suggestion to please use a consistent root path component for all Protected Audience .well-known URIs as Attribution Reporting has.

That's a good idea. There's an existing prefix for permission delegation, as described in the explainer, which we can use here as well. I've updated the well-known URIs in this design to be:

Since Chrome proposes to commit to fetching and persisting the new creative scanning config and you intend to extend sellerCapabilities, WDYT of generalizing this endpoint to publish using the single config scheme and affording future needs?

The concern here would be in determining what to do if there's a network error while trying to fetch the buyer config. At interest group join time, we'd have only a partial interest group, and that group may have trouble participating in auctions on that device, for example, in auctions that have required seller capabilities. Providing everything inline protects against that. Having just the sellers for creative scanning in a buyer config that needs to be fetched is an acceptable risk, since, even if the buyer config fetch fails, the interest group can still participate in auctions, and presumably the buyer config fetch will succeed on another device, which will send that buyer's ads for creative scanning.

Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig.

The issue with combining perBuyerCreativeSampling together with other perBuyerXXX fields is that they're used at different times. In most of the design options listed in this doc, creative scanning happens at interest group join/update time, when there isn't an available auction config. The other perBuyerXXX fields are all used at auction time, when there is an available auction config. (There are a few options that happen at auction time, but those notably do not rely on perBuyerCreativeSampling, because the seller can dictate - via one of their return values from scoreAd() - which ads should be sent for creative scanning.)

Chrome isn't consuming anything from the response, right? You can ditch the URL encoding by POSTing,

Yes, there seem to be some compelling benefits to using a POST here. I've updated the document to use a POST instead of a GET for the creative scanning entrypoint.

Question: when would creativeScanningHistory table be purged? Any relationship to how/when IGs are cleared?

Entries in the creativeScanningHistory will be purged when interest groups are cleared.

What frequency do sellers get to re-review renderURLs. In today's workflows I'm familiar with, if our fetch url is active our partner re-verifies it.

Could you clarify this? I had envisioned the creative scanning problem as a "discovery" problem. Once the seller knows about an ad, is there any reason it couldn't reverify that ad anytime it wanted to?

From my perspective, an ad repeatedly sent to a seller's creative scanning entrypoint was a thing to be avoided because it contributed unnecessary load to the entrypoint. Still, in most of the options, an ad will likely be sent many times throughout its use. In options 3, 5, 6, and 7, a seller can explicitly request that an ad be sent to their creative scanning entrypoint at any time. In other options, e.g. options 2 and 2a, other devices would send that ad, so sellers would get an opportunity to re-verify anytime a new device joins that interest group.

The maxTrustedBiddingSignalsURLLength currently being plumbed would be handy to specify as a 'global' IG/buyer config attribute (should the buyer-config accommodate knobs beyond seller scanning). Same if sellers will also get to limit scoring URL length.

This seems like a new idea that's distinct from creative scanning. If you'd like to explore this further, could you please file a new issue for further discussion? Thanks.

dmdabbs commented 9 months ago

Providing everything inline protects against that.

Yes after posting I realised that. You want the IG in a ready-to-go state in the IG cache, sans any 'assembly.'

Could you clarify this? I had envisioned the creative scanning problem as a "discovery" problem. Once the seller knows about an ad, is there any reason it couldn't reverify that ad anytime it wanted to?

Yes. Good point. Up to sellers when to age off discovered renderURLs.

This seems like a new idea that's distinct from creative scanning. If you'd like to explore this further, could you please file a new issue for further discussion? Thanks.

Indeed it was. I'll post something separate from this thread. Thanks.

dmdabbs commented 9 months ago

@dmdabbs: Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig. @orrb1: The issue with combining perBuyerCreativeSampling together with other perBuyerXXX fields....

Re-reading your response on the train I see that "picking up the perBuyerXXX pattern" could have been clearer. I wasn't advocating picking up unrelated perBuyerXXX, only their slightly more concise representation ('pattern'). Perhaps a foolish consistency on my part:

This

{
  "perBuyerSamplingRates": [
    {"interest_group_owner": "https://www.example-dsp.com","sampling_rate": 1},
    {"interest_group_owner": "https://www.another-dsp.com","sampling_rate": 0}
  ],
  "defaultSamplingRate": 0.1
}

compared to

{
  "perBuyerSamplingRates": {
    "https://www.example-dsp.com": { "sampling_rate": 1},
    "https://www.another-dsp.com": { "sampling_rate": 0},
    "*": { "sampling_rate": 0.1}
  }
  etc...     
}

where the map pattern obviates the "interest_group_owner" and "defaultSamplingRate" labels. It's the OpenRTB background - looking for a concise representation to reduce network bytes. The 'etc...' was to accommodate future, appropriate attributes.
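A sketch of how the map form could be consumed, assuming an exact-origin match takes precedence over "*" and that a buyer with no matching entry is never sampled (both behaviors are assumptions, as are the function names):

```javascript
function samplingRateFor(config, buyerOrigin) {
  const rates = config.perBuyerSamplingRates || {};
  const hasExact = Object.prototype.hasOwnProperty.call(rates, buyerOrigin);
  const entry = hasExact ? rates[buyerOrigin] : rates['*'];
  return entry ? entry.sampling_rate : 0;
}

// random is injectable so the decision can be tested deterministically.
function shouldSendForScanning(config, buyerOrigin, random = Math.random) {
  return random() < samplingRateFor(config, buyerOrigin);
}
```

Since `random()` returns a value in [0, 1), a rate of 1 always sends and a rate of 0 never does.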

orrb1 commented 9 months ago

@dmdabbs: Same might apply on the seller side along with picking up the perBuyerXXX keyed dict pattern from auctionConfig. @orrb1: The issue with combining perBuyerCreativeSampling together with other perBuyerXXX fields....

Re-reading your response on the train I see that "picking up the perBuyerXXX pattern" could have been clearer. I wasn't advocating picking up unrelated perBuyerXXX, only their slightly more concise representation ('pattern'). Perhaps a foolish consistency on my part:

This

{
  "perBuyerSamplingRates": [
    {"interest_group_owner": "https://www.example-dsp.com","sampling_rate": 1},
    {"interest_group_owner": "https://www.another-dsp.com","sampling_rate": 0}
  ],
  "defaultSamplingRate": 0.1
}

compared to

{
  "perBuyerSamplingRates": {
    "https://www.example-dsp.com": { "sampling_rate": 1},
    "https://www.another-dsp.com": { "sampling_rate": 0},
    "*": { "sampling_rate": 0.1}
  }
  etc...     
}

where the map pattern obviates the "interest_group_owner" and "defaultSamplingRate" labels. It's the OpenRTB background - looking for a concise representation to reduce network bytes. The 'etc...' was to accommodate future, appropriate attributes.

Ah, sorry for the misunderstanding. It makes a lot of sense to use a format that's consistent with existing parameters. I've updated this in the document. Thanks.

rdgordon-index commented 9 months ago

A few additional comments in advance of the WICG meeting:

I'm aligned that Options 1, 3, 5 and 7 are less desirable; and 2 is preferable to 2b from a seller workload perspective.

orrb1 commented 8 months ago

Hi everyone,

Thank you for all of your feedback on the document and proposals. Based on that feedback, we've made several changes to the design reflected in the document and described below. We've also changed the structure of the document to reflect the current recommended design, while moving the other options explored into an "Alternatives Considered" section. Please continue to provide us with feedback as we continue to explore potential solutions for supporting creative scanning with the Protected Audience API.


From the notes:

(Patrick McCann) Would be better fit for purpose if the top level seller could identify the creative scanner instead of the component sellers scanning the ads?

In the current recommended design, the owner of the interest group can indicate which attested parties should be notified of new ads. This can absolutely include the top-level seller. We've updated the design so that this seller can explicitly indicate the creative scanner. They would do this using a new creativeScanningURL in the seller config exposed at their .well-known URI (e.g., https://www.example-ssp.com/.well-known/interest-group/creative-scanning-seller-config). (This replaces the previous behavior in which the seller would expose a second .well-known URI as the target of the creative scanning request.) From the updated document: "The seller’s creative scanning entrypoint indicated in the seller config does not need to be hosted at the seller’s origin, and the seller can choose to send ads for creative scanning directly to a third party vendor that specializes in ad quality."


Patrick McCann and Laurentiu Badea both asked about having the Trusted Scoring Signals Server keep track of which ads had no corresponding signals - a sign that these ads had not been previously scanned - and expose those via an endpoint. Laurentiu pointed to Joel's prior comment on this issue. From Joel's comment:

Could the K/V server be the point of coordination? It could provide an endpoint that can be queried for a list of renderURLs that have no data associated with them. It's effectively a list of cache misses. When the endpoint gets queried, it could take the extra step of filtering by checking which keys are still misses, though it doesn't have to.

We've added this idea as a new "Option 8" in the alternatives considered section of the document. Copying from the analysis provided there:

If all Trusted Scoring Signals Servers were running in TEEs, a design like this could work for creative scanning while still preserving privacy. In order to mitigate the privacy risk incurred by allowing for the exfiltration of ad URLs that could potentially be used to expose a user's cross-site identity, the Trusted Scoring Signals Server could aggregate "cache misses" and then, after a delay (e.g. once a day), expose only those that have been reported by at least k devices, enforcing a k-anonymity threshold for creative scanning that would help mitigate the privacy risk. However, for this to work, the Trusted Scoring Signals request would need to include an identifier for that device, which is a privacy risk while Trusted Scoring Signals Servers still run outside of TEEs. We’re continuing to explore whether this offers a feasible solution in the short-term.
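
As a rough illustration of the aggregation described above, the sketch below counts distinct devices per missing renderURL and exposes only those that reach the threshold k. The class and method names (CacheMissAggregator, recordMiss, drainEligible), the device identifiers, and the example URLs are all hypothetical; nothing here reflects an actual Trusted Scoring Signals Server interface.

```javascript
// Hypothetical aggregator for trusted scoring signals "cache misses".
// All names are illustrative; this is not an existing server API.
class CacheMissAggregator {
  constructor(k) {
    this.k = k;              // minimum number of distinct devices
    this.misses = new Map(); // renderURL -> Set of device identifiers
  }

  // Called when a queried renderURL has no associated scoring signals.
  recordMiss(renderURL, deviceId) {
    if (!this.misses.has(renderURL)) {
      this.misses.set(renderURL, new Set());
    }
    this.misses.get(renderURL).add(deviceId);
  }

  // Called after the delay (e.g. once a day). Returns only the renderURLs
  // reported by at least k distinct devices, and forgets those entries.
  drainEligible() {
    const eligible = [];
    for (const [renderURL, devices] of this.misses) {
      if (devices.size >= this.k) {
        eligible.push(renderURL);
        this.misses.delete(renderURL);
      }
    }
    return eligible;
  }
}

const aggregator = new CacheMissAggregator(2);
aggregator.recordMiss('https://cdn.example-dsp.com/ad1', 'device-a');
aggregator.recordMiss('https://cdn.example-dsp.com/ad1', 'device-b');
aggregator.recordMiss('https://cdn.example-dsp.com/ad2', 'device-a');
const exposed = aggregator.drainEligible(); // only ad1 has reached k = 2
```

Repeated reports from the same device do not raise the count, which is why the request would need to carry a device identifier in the first place.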


From the notes:

(David Dabbs) Component ads are a new thing, there will be some vendors that use this, and you would get some markup blob to scan but you would need to talk to the seller - effectively you are submitting markup

This is a fair point, as the browser currently fetches trusted scoring signals for component ads as part of the same request that fetches trusted scoring signals for ads. The design has been updated to reflect that the renderURL and creativeScanningMetadata for each component ad would be sent for creative scanning alongside the renderURL and creativeScanningMetadata for each ad.


From the notes:

Pat McCann: Can you describe more about 5 - is it more expensive with higher quality?

Option 5 is more expensive without any benefit in quality. This option explored the question of whether the trusted scoring signals server could be used to indicate whether an ad should be sent to the creative scanning entrypoint. The conclusion of that exploration was that using the trusted scoring signals server to, in effect, triage the ads and determine which should be sent to the creative scanning entrypoint was inefficient. Assuming that the creative scanning entrypoint would be less expensive than the trusted scoring signals server, making a request to that more expensive trusted scoring signals server only to determine whether or not to make a request to the less expensive creative scanning entrypoint would be inefficient.


From the notes:

Stan Belov: Was a discussion from the private aggregation api to use this for the trusted scoring signals - issue was that the map is mapped to a 128bit design which does not know about the creative ids etc. Have you thought about extending the private aggregation api?

The Private Aggregation API doesn't seem to be a good match for conveying arbitrary renderURLs. The aggregation key in a Private Aggregation API event is limited to 128 bits. As such, it could be used to convey the hash of a renderURL, but without knowing a priori what the set of all possible renderURLs could be, we couldn't convert that hash back to a renderURL.
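
To make the limitation concrete, here is a minimal sketch in which a 128-bit FNV-1a hash stands in for whatever hash might be used as the aggregation key (the choice of hash, the function names, and the example URLs are assumptions for illustration). A key can be mapped back to a renderURL only if that renderURL is already in a known candidate set.

```javascript
// 128-bit FNV-1a over the UTF-8 bytes of a string, returned as a BigInt.
// A stand-in for "some 128-bit hash of a renderURL"; not part of any API.
const FNV_OFFSET = 0x6c62272e07bb014262b821756295c58dn;
const FNV_PRIME = 0x0000000001000000000000000000013bn;
const MASK_128 = (1n << 128n) - 1n;

function hash128(text) {
  let h = FNV_OFFSET;
  for (const byte of new TextEncoder().encode(text)) {
    h ^= BigInt(byte);
    h = (h * FNV_PRIME) & MASK_128; // keep the value within 128 bits
  }
  return h;
}

// Recovering a renderURL from a key only works against known candidates.
function buildReverseIndex(knownRenderURLs) {
  const index = new Map();
  for (const url of knownRenderURLs) {
    index.set(hash128(url), url);
  }
  return index;
}

const known = buildReverseIndex(['https://cdn.example-dsp.com/ad1']);
const seenKey = hash128('https://cdn.example-dsp.com/ad1');
const novelKey = hash128('https://cdn.example-dsp.com/never-seen');

const recovered = known.get(seenKey);      // the previously known URL
const unrecoverable = known.get(novelKey); // no entry: the key alone is opaque
```

A new ad is exactly the case where the seller has no candidate to compare against, so the key for it cannot be resolved.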


From the notes:

David Dabbs: The simplest approach here is the industry solves this and buyers submit to sellers etc

The current proposed design provides a mechanism that could be replicated by buyers sending their creatives directly to creative scanners. Building support as part of the Protected Audience API aims to establish a set of protocols to make that process easier.


Roni Gordon: Option 4. Can you elaborate on the possibility of the browser maintaining a hash by seller + renderURL + k-anon status? Would this address the case of a new seller coming on board for an existing k-anon renderURL, or a long-lived creative that is below the threshold?

Though this would address the issues you described, the effect would be to make this option identical in its behavior to option 2. The reason for this is that, at an individual device, if the ad first arrives when it's already k-anonymous, the browser wouldn't know to which sellers the ad had been sent from other devices before it was k-anonymous. As a result, each browser would fall back to sending each ad to each seller for each new ad, and potentially a second time if first sent before it was k-anonymous.


Roni Gordon: Option 6. The trustedScoringSignalsURL endpoint already is not supposed to store any state -- and we've attested to that -- and in the future, the TEE will guarantee this. If so, can you elaborate on the privacy risk here? The scanning endpoint doesn't receive cross-site information in the QSPs -- so I'm curious about the nature of the leak

Though the TEE provides a guarantee that the trusted scoring signal server won't be able to exfiltrate any information by itself, the control it has over whether an ad is sent to creative scanning servers would provide it with a mechanism for exfiltrating a small amount of information. The trusted scoring signal server potentially has access to multiple sites’ worth of information - context from the publisher site and renderURLs from advertiser sites. If the trusted scoring signal server intentionally selected a subset of ads to be sent to the seller's creative scanning entrypoint, these could be used to reconstruct a user's cross-site identity.


Roni Gordon: Scanning Rate is difficult to manage based on number of devices and IGs, renderURLs - this is a moving target, especially given the rate and scale of joinAdInterestGroup calls today. Is this intended just to control the firehose? Is there a way to ensure that we see all renderURLs without constant tuning?

The sampling rate defined in the seller config is an optional configuration that sellers may use to tune the rate of traffic as they see fit. If no per-buyer sampling rates are provided, the default sampling rate assumed is 1.0, so that all ads are sent to the sellers' creative scanning entrypoint. To ensure that they see all renderURLs, a seller may choose to maintain a sampling rate of 1.0 and, as noted in the document, efficiently shed previously discovered renderURLs at their creative scanning entrypoint.
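
A minimal sketch of how such a config might be resolved, assuming the map-style perBuyerSamplingRates shape discussed earlier in this thread, a "*" wildcard entry, and a fallback of 1.0 when nothing matches. The exact schema, the creativeScanningURL value, and the function names are assumptions rather than anything specified.

```javascript
// Hypothetical seller config. Field names follow the thread's examples;
// the exact schema and the creativeScanningURL value are assumptions.
const sellerConfig = {
  creativeScanningURL: 'https://scanner.example/scan',
  perBuyerSamplingRates: {
    'https://www.example-dsp.com': { sampling_rate: 1 },
    'https://www.another-dsp.com': { sampling_rate: 0 },
    '*': { sampling_rate: 0.1 },
  },
};

// Own-property lookup, so keys such as "constructor" never match by accident.
function ownEntry(rates, key) {
  return Object.prototype.hasOwnProperty.call(rates, key) ? rates[key] : undefined;
}

// Resolve the rate for a buyer: exact match, then "*", then the default 1.0.
function samplingRateFor(config, buyerOrigin) {
  const rates = config.perBuyerSamplingRates || {};
  const entry = ownEntry(rates, buyerOrigin) || ownEntry(rates, '*');
  return entry ? entry.sampling_rate : 1.0;
}

// Decide whether to send one ad; `random` is injectable for testing.
function shouldSend(config, buyerOrigin, random = Math.random) {
  return random() < samplingRateFor(config, buyerOrigin);
}
```

With a rate of 1 every ad is sent, and with a rate of 0 none are, since the random draw is always in the range [0, 1).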

dmdabbs commented 8 months ago

Appreciate the follow-ups to address earlier threads, @orrb1, and the updated written spec proposal.

From your doc:

The browser would send an ad's renderURL for creative scanning.

A number of established features and emerging proposals concern renderURLs:

  1. The specified & implemented, but not yet required, creative size declarations factoring into k-anonymity
  2. The deprecatedRenderURLReplacements work that is underway
  3. this "renderURL scanning support" proposal
  4. Reducing interest group payload by compressing renderURLs #1076, @ardianp-google requested a few days ago
  5. Multi-bid support that is underway
  6. Video and native delivery approaches on which folks are currently iterating

Regardless of how these chips land, I presume that the constraint will remain that a bidder/buyer will not be permitted to submit novel "render URLs"; they must be recognizable as present in the IG on device.

On #1. the explainer says,

width: The creative's width. This size will be matched against the declaration in the interest group and substituted into any ad size macros present in the ad creative URL.

Does this mean that the "creative url" supplied to the seller will have AD_WIDTH & AD_HEIGHT replaced as Chrome does prior to navigating?

On #2, Same. Will these be substituted prior to sending to seller(s)?

On #4 Basically the same. Chrome should 'instantiate' the template to a string prior to supplying to seller, yes?

On #5 Does Chrome submit to sellers all the creatives submitted by the buyer or just the winner?

On #6 Today PA only does banners. Once you teach it about newfangled formats like video &c, the POST to the seller will need some signal indicating what media rendering use case the render URL entity is for.

buyers could provide a set of metadata fields, e.g. a domain and seat, that would be sent alongside the renderURL in support of creative scanning. A buyer would provide these using a new creativeScanningMetadata optional property

Some of these are in discussion for buyers to provide to sellers in the bid ad metadata attribute. Would be nice to avoid duplication.

Buyers would explicitly specify the list of sellers by serving a “creative scanning buyer config” at a well-known URI.

Chrome has or will mitigate attestation file availability by downloading these via some Chrome component. Wondering how to keep this file/fetch from experiencing similar issues. Can one assume that no buyer creatives will be shared if there is not a cached resource available? Also the fetch will be out of the critical path, yes?

Each seller would expose a “creative scanning seller config” at a well-known URI.

Same here.

To do so, the browser would collate all of the ads of an interest group sent to a given seller and send these as a POST request of the form

Is this answering Yes to the question above regarding multibid submissions?

'renderUrl': shoesAd1,

Suggest using realistic, illustrative URLs. | Nit (renderUrl->renderURL)

rdgordon-index commented 7 months ago

if the ad first arrives when it's already k-anonymous, the browser wouldn't know to which sellers the ad had been sent from other devices before it was k-anonymous

Can you clarify why the browser would need to know about what's happening on other devices in this case (for Option 4)? If the hash includes seller, it should already know what the 'new seller' is -- and the existing sellers are already locally stored in the cache.

rdgordon-index commented 7 months ago

If the trusted scoring signal server intentionally selected a subset of ads to be sent to the seller's creative scanning entrypoint

Can you elaborate on the nature of this "intention"? By definition, only ads that need to be re-scanned, or aren't already scanned, would be sent to the creative scanning endpoint -- so how is this any different?

rdgordon-index commented 7 months ago

To ensure that they see all renderURLs, a seller may choose to maintain a sampling rate of 1.0 and, as noted in the document, efficiently shed previously discovered renderURLs at their creative scanning entrypoint.

I don't think that's a viable solution -- that's an enormous amount of network traffic simply to discard it at the entrypoint.

orrb1 commented 6 months ago

@rdgordon-index - I have a couple of small follow-up questions regarding your initial comment on this issue. If given both the renderURL and the buyer origin, would it be possible to infer the other key signals needed for creative scanning? Specifically, could adomain be determined from either response headers returned from the ad server or by rendering the creative, and could seat be inferred using the renderURL, buyer origin, and adomain?

rdgordon-index commented 5 months ago

If given both the renderURL and the buyer origin, would it be possible to infer the other key signals needed for creative scanning?

Would definitely be valuable to include a link between renderURL and buyer origin, since they aren't supposed to be the same, as per https://github.com/WICG/turtledove/blob/main/FLEDGE.md#14-buyer-security-considerations, so this way, we would be able to definitively associate renderURLs with a particular buyer -- today, it's implicit, and works primarily because buyers are still using their origin as their renderURL parent domain (as that security consideration was a later addition to the markdown file).

Specifically, could adomain be determined from either response headers returned from the ad server or by rendering the creative, and could seat be inferred using the renderURL, buyer origin, and adomain?

For response headers -- are you thinking about something like https://developers.google.com/authorized-buyers/rtb/protected-audience-api#automatic_creative_scanning ?

'returned from the ad server' -- https://github.com/WICG/turtledove/issues/1028 talks about some of the challenges and assumptions as to whether or not the renderURL 'ad server' would have the full awareness of these parameters -- but assuming so, then yes, in principle, it could be scanned as part of the creative registration process. How would you solve for the AD_WIDTH and AD_HEIGHT macros that were added as part of https://github.com/WICG/turtledove/pull/417, since the K/V call doesn't have access to these?

or by rendering the creative

Typically this would involve support for some sort of 'creative audit' flags to ensure that the renderURL is able to be fetched by creative scanners -- there are complications for geography and client-side IP expectations, for instance, that often come into play. In practice this also means crawling the rendered creative to determine adomain (a.k.a. landing pages), which isn't always straightforward -- so a declarative approach, like response headers, is preferred IMO.

omriariav commented 2 months ago

@orrb1 @michaelkleber Taboola team (@vladimanaev and @razkliger) finished reviewing the proposal and here are our comments.

We graded the eight proposals on a scale of one to eight, where one is the worst fit for us and eight is the best fit for our needs.

As a reminder, in native we have endless opportunities for auctions: we need to run them (or be aware of them) at the same time, we need to match look and feel, and there is higher demand for ad quality functionality.

| Proposal | Comments | Grade |
| --- | --- | --- |
| Rely on the trusted scoring signals server to track and expose new ads | The trusted scoring signals server maintains a record of renderURLs queried at auction time that have no associated data. We can scan those items offline. | 8 |
| Reuse the existing call to the trusted scoring signals server by waiting until an auction to send an ad for creative scanning | For us, the best option is to decide this in scoreAd(). | 8 |
| Reuse the existing auction-time call to the trusted scoring signals server, but only send ads that are k-anonymous | Privacy is preserved, so it may be a better solution. But it can cause us to scan more. The delay of the data is also a problem. | 7 |
| Send all ads during interest group join/update | The scale and the required infra to support it are too high and costly for us. | 1 |
| Send only those ads not previously sent from this device | Even with client-side caching, we think the scale and the required infra to support it are too high and costly for us. | 2 |
| Allow the seller to convey which ads it's already seen | We don't think using the Bloom filter is feasible on the client side, and regardless we would need to query it, making the scale too big. | 4 |
| Use k-anonymity as a proxy for ads that a seller has not yet seen | The problem is the delay until we scan the item, and some items will keep being sent to us even if we scanned them in the past, as long as they stay under the threshold. | 5 |
| Call trusted scoring signals server during join/update | The seller can send signals if it "wants" to scan the creative. But it sounds like we need to scan each one. The scale will still be too big. | 3 |

We will be happy to engage and collaborate on this moving forward.

orrb1 commented 1 month ago

Thank you, everyone, for your patience as we've explored in depth how Protected Audience can support creative scanning while advancing privacy and conserving resources. Previously, we had looked at several browser-mediated options, as outlined in this doc, but found that each of these options was infeasible due to privacy and/or resource concerns. We decided instead on an approach that reuses preexisting PA infrastructure. Here's what we currently propose:

Today, sellers may choose to run a key/value service that allows the auction to retrieve real-time signals before the ad is scored by the seller. For the time being, these are run on untrusted servers, referred to as BYOS (Bring Your Own Server). At auction time, the browser issues a series of requests to these key/value services. Each request conveys one or more renderURLs in plain text, and the key/value service returns signals associated with each of those renderURLs. The key/value service request includes renderURLs for both ads and component ads. This exchange is described in more detail in section 3.1 of the explainer, in particular the paragraph beginning with, "Similarly, sellers may want to fetch information about a specific creative, e.g. the results of some out-of-band ad scanning system."

We know that some sellers to date have relied on forDebuggingOnly (fDO) APIs to discover renderURLs for creative scanning. This flow will become ineffective on devices on which fDO is downsampled. We propose that sellers instead use their BYOS real-time scoring signals key/value service as a source of renderURLs for creative scanning while these services are BYOS-hosted. Some have noted that the key/value service request lacks metadata associated with each renderURL that's needed for creative scanning. To accommodate this, we would add a new string-typed creativeScanningMetadata field to the ads structure within each interest group.


const myGroup = {
  'owner': 'https://www.example-dsp.com',
  'name': 'womens-running-shoes',
  'ads': [
    {
      'renderURL': shoesAd1,
      'sizeGroup': 'group1',
      'metadata': { ... },
      // Proposed new string-typed field, sent to the seller's key/value service.
      'creativeScanningMetadata': "..."
    },
  ],
  'adComponents': [
    {
      'renderURL': runningShoes1,
      'sizeGroup': 'group2',
      'metadata': { ... },
      'creativeScanningMetadata': "..."
    },
  ],
  'adSizes': {'size1': {width: '100', height: '100'}},
  'sizeGroups': {'group1': ['size1']},
};
const joinPromise = navigator.joinAdInterestGroup(myGroup);

If the seller would like the creativeScanningMetadata sent alongside the renderURLs in their BYOS-hosted key/value service requests, they can indicate this using a new boolean-valued sendCreativeScanningMetadata field in the auction config. Auctions configured this way would include the URL-encoded creativeScanningMetadata in the BYOS-hosted key/value service request URL using a new adCreativeScanningMetadata query parameter.
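
For illustration, an auction config opting in might look like the sketch below. The seller, URL, and buyer values are placeholders, and sendCreativeScanningMetadata is the field proposed in this comment, not an existing auction config field.

```javascript
// Sketch of an auction config that opts in to the proposed behavior.
// All origins and URLs are placeholders.
const auctionConfig = {
  seller: 'https://www.example-ssp.com',
  decisionLogicURL: 'https://www.example-ssp.com/decision-logic.js',
  trustedScoringSignalsURL: 'https://www.example-ssp.com/scoring-signals',
  interestGroupBuyers: ['https://www.example-dsp.com'],
  // Proposed in this thread; not an existing auction config field.
  sendCreativeScanningMetadata: true,
};

// In a browser that supports the API, the auction would then be started with:
//   const result = await navigator.runAdAuction(auctionConfig);
```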

Similarly, some have noted that the ad size is also necessary in order to scan a creative. Ad size is returned by generateBid() alongside the ad's renderURL. Auctions configured with sendCreativeScanningMetadata would also include the ad size in the BYOS-hosted key/value service request URL using a new adSizes query parameter. Each ad's self-declared width and height will be sent, if the bid included those, formatted as the URL-encoded width, followed by a comma, followed by the URL-encoded height. For ads that don't declare width and height, their ad size in the adSizes query parameter will be a single comma — so that this query parameter is easier to parse — with no other information.
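
The formatting rule above can be sketched as follows. The function names and the sample sizes are assumptions for illustration; only the "encoded width, comma, encoded height" shape and the lone comma for undeclared sizes come from the description.

```javascript
// Format one ad's size entry: encoded width, a comma, encoded height.
// An ad with no declared size contributes a lone comma.
function formatAdSize(size) {
  if (!size || size.width === undefined || size.height === undefined) {
    return ',';
  }
  return `${encodeURIComponent(size.width)},${encodeURIComponent(size.height)}`;
}

// Join the per-ad entries into the value of the adSizes query parameter.
function formatAdSizes(sizes) {
  return sizes.map(formatAdSize).join(',');
}

const adSizes = formatAdSizes([
  { width: '300px', height: '250px' },
  undefined, // this ad declared no size
  { width: '100sw', height: '50sh' },
]);
// adSizes === '300px,250px,,,100sw,50sh'
```

Because every ad contributes exactly two comma-separated slots, a parser can always read the list in pairs, which is the reason for the single comma in the undeclared case.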

The key/value service request includes renderURLs for both ads and component ads. For component ads, we would also add a new creativeScanningMetadata field to the adComponents structure within each interest group. Auctions configured with sendCreativeScanningMetadata would include, for component ads, the URL-encoded creativeScanningMetadata in the key/value service request using a new adComponentCreativeScanningMetadata query parameter, and the formatted ad size using the adComponentSizes query parameter.

The adCreativeScanningMetadata, adComponentCreativeScanningMetadata, adSizes, and adComponentSizes query parameters would all be comma-separated lists of their respective metadata. Each entry in the adCreativeScanningMetadata and adSizes parameters would correspond to an entry in renderUrls, and each entry in the adComponentCreativeScanningMetadata and adComponentSizes parameters would correspond to an entry in adComponentRenderUrls.

In total, the URL for the browser's request to the BYOS-hosted scoring signals key/value service for an auction configured with sendCreativeScanningMetadata would have six sets of keys: renderUrls=url1,url2,... and adComponentRenderUrls=url1,url2,..., as currently described in the Protected Audience explainer, and also adCreativeScanningMetadata=metadata1,metadata2,..., adComponentCreativeScanningMetadata=metadata1,metadata2,..., adSizes=width1,height1,width2,height2,..., and adComponentSizes=width1,height1,width2,height2,....
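
On the seller side, a key/value service could join these lists back together roughly as sketched below. The request URL, hostnames, and function names are hypothetical, and the sketch assumes that an ad with no metadata contributes an empty entry, which the proposal does not spell out. Only the ad parameters are handled; the component-ad parameters would follow the same pattern.

```javascript
// Collect raw (still percent-encoded) query parameters from a request URL.
function rawQueryParams(requestUrl) {
  const query = requestUrl.split('?')[1] || '';
  const params = new Map();
  for (const pair of query.split('&')) {
    const eq = pair.indexOf('=');
    if (eq > 0) params.set(pair.slice(0, eq), pair.slice(eq + 1));
  }
  return params;
}

// Split on commas first, then decode each entry. Decoding first would turn
// an encoded comma (%2C) inside a URL into a bogus list separator.
function parseList(rawValue) {
  return rawValue === undefined ? [] : rawValue.split(',').map(decodeURIComponent);
}

// Join renderUrls with their metadata and sizes, entry by entry.
function parseAdsForScanning(requestUrl) {
  const params = rawQueryParams(requestUrl);
  const urls = parseList(params.get('renderUrls'));
  const metadata = parseList(params.get('adCreativeScanningMetadata'));
  const sizes = parseList(params.get('adSizes')); // width1,height1,width2,...
  return urls.map((renderURL, i) => ({
    renderURL,
    creativeScanningMetadata: metadata[i],
    width: sizes[2 * i] || undefined,      // empty slot means no declared size
    height: sizes[2 * i + 1] || undefined,
  }));
}

const request =
  'https://kv.example-ssp.com/getvalues?hostname=publisher.example' +
  '&renderUrls=https%3A%2F%2Fcdn.example-dsp.com%2Fad1%3Fa%3D1%2C2,' +
  'https%3A%2F%2Fcdn.example-dsp.com%2Fad2' +
  '&adCreativeScanningMetadata=seat%3D42,' +
  '&adSizes=300px,250px,,';
const ads = parseAdsForScanning(request);
```

Here the first ad carries metadata and a size, while the second has an empty metadata entry and a lone comma for its size, so both of its size fields come back undefined.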

In the future the key/value service will be required to run in a trusted execution environment (TEE) to ensure that the user's data is kept private. We've been working on a long-term design that provides an aggregated stream of renderURLs and their associated metadata and sizes for creative scanning. Unlike the BYOS-based solution, which exposes each renderURL without any indication of its desirability, for this long term solution we're looking at ways to emit only the most valuable renderURLs as defined by the seller, providing a more focused stream of ads for scanning. Each renderURL will be sent for creative scanning only after that renderURL has met a privacy bar, for example, having been observed on multiple devices. As such, it's recommended that sellers using the BYOS key/value service approach described above similarly scan renderURLs only after they've been seen multiple times to ease future transitions into the long term privacy advancing state. We will provide a timeline to transition to this privacy improving approach in a future update.

In the meantime, please provide us with feedback on whether the BYOS key/value service-based solution described above would support your creative scanning needs. Thanks so much.

rdgordon-index commented 1 month ago

please provide us with feedback on whether the BYOS key/value service-based solution described above would support your creative scanning needs

Thanks for the additional details regarding the KV-initiated creative registration proposal.

Each entry in the adCreativeScanningMetadata and adSizes parameters would correspond to an entry in renderUrls

Earlier, we talked about the challenge of not knowing which DSP's buyer origin corresponds to the renderURL -- will buyer origin be sent for each renderURL as well?

Regarding adCreativeScanningMetadata -- on other GH issues and WICG discussions, there were some concerns from buyers that some of this metadata is seller-specific -- seat being a good example -- I'm curious how you imagine that use case being accommodated via this proposal, if adCreativeScanningMetadata is at the IG-level (and hence not seller-specific)?

rdgordon-index commented 1 month ago

Thinking about the seller-specific metadata concern above -- is there any reason why creativeScanningMetadata can't be derived from generateBid, rather than being associated with the IG, given that it's not subject to k-anon, and therefore doesn't need to be known ahead of auction time? After all, the call to the seller's K/V requires the renderURL from generateBid anyway. Thoughts?

eysegal commented 3 weeks ago

Thanks @orrb1 Will the creativeScanningMetadata have a specific structure? Is it subject to k-anonymity as well?