WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Interest Group Filtering Ask: Add Reserved "openrtb" for Structured Matching, Allow Modified IG for Auction #673

Open thegreatfatzby opened 1 year ago

thegreatfatzby commented 1 year ago

Background

One way we're thinking of leveraging IGs is to put all of a user's targeting segments into a single IG (as discussed elsewhere), along with the creative URLs that are targetable by the segments listed for that user, and to have that list of creatives, and maybe some other userBiddingInfo, updated periodically by the dailyUpdateURL.

One concern we have is the network cost of Interest Group blobs going over the wire (into our DCs, between our Seller Ad Service and the various B&A TEEs, and even to 3P DSPs) and the compute that will accompany it.

Additionally relevant parts of the spec I think I'm understanding correctly:

Filtering Matches Ask

The ask, roughly, would be to support filtering based on the OpenRTB spec, which would allow substantial flexibility and extensibility based on current industry practice. It would also be a great nudge towards interoperability, both between current tech and Coming Sandbox Tech, as well as between Ad Techs when Sandbox Tech is out. I say "based on OpenRTB" because it would be fine to disallow certain things, or at least make it clear they won't be available (precise lat/lon for instance).

Sellers would have something like:

auctionConfig.auctionSignals.openrtb.request = {...}
auctionConfig.auctionSignals.openrtb.versioning = {some something that can be used to indicate compatibility}

And IG owners would specify a priorityVector that includes OpenRTB request elements to filter on:

priorityVector.auctionSignals.openrtb.request = {size, mediatype, coarse geo, etc}
priorityVector.auctionSignals.openrtb.versioning = {openrtb >= 2.4 etc}

I'd think this would allow for substantial filtering of Interest Groups on the client side, even in the case of B & A based auctions. I can try to get numbers but I believe the "fast path" of our bidder does a ton of filtering quickly using contextual and coarse things like that. Were this implemented in C++ in Chromium on the client I'd think it would be fast and eliminate a lot of targeting.

Ideally this wouldn't need to be flat, it could be the full object as used today, to accommodate this then being used in the auction (obvs if it needs flattening that can be handled, I would think it would be easier for debugging, but not the most important part).

How to do the filtering would probably need to get a little trickier than the current "dot product" based formula, as you might want to support something like exact matches on width and height, but maybe "value in (list)" for media type, or >= openrtb2.4 (bad example). However, I'd think that's not too bad; something must exist for that somewhere.
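To make the "exact match", "value in (list)" and ">=" cases concrete, here is a minimal sketch of what such a matcher might look like. The rule format ({ path, op, value }), the operator names, and the simplified OpenRTB-like request shape are all assumptions made up for illustration; nothing like this exists in the PaAPI spec today.

```javascript
// Hypothetical rule format { path, op, value }; not part of PaAPI or OpenRTB.

// Walk a dotted path such as "imp.banner.w"; undefined if any step is missing.
function getPath(obj, path) {
  return path.split(".").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    obj
  );
}

// The three comparison styles mentioned above.
const OPERATORS = {
  eq: (actual, expected) => actual === expected,
  in: (actual, expected) => expected.includes(actual),
  gte: (actual, expected) => actual >= expected,
};

// An interest group stays in the auction only if every rule matches.
function matchesAll(request, rules) {
  return rules.every(({ path, op, value }) => {
    const actual = getPath(request, path);
    return actual !== undefined && OPERATORS[op](actual, value);
  });
}

// Simplified, OpenRTB-like request (real OpenRTB uses an array for imp).
const sampleRequest = {
  imp: { banner: { w: 300, h: 250 } },
  device: { geo: { country: "USA" } },
};

const sampleRules = [
  { path: "imp.banner.w", op: "eq", value: 300 },
  { path: "imp.banner.h", op: "eq", value: 250 },
  { path: "device.geo.country", op: "in", value: ["USA", "CAN"] },
];

console.log(matchesAll(sampleRequest, sampleRules)); // true
```

A missing field is treated as a non-match here; whether "absent" should instead mean "don't care" is exactly the kind of design choice that would need discussion.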

IG Modification Changes

The second piece of this would be to allow the filtering function to not eliminate the IG but modify it strictly for that auction request by removing elements of userBiddingSignals, creatives, etc. The mutation could be limited to removal so you'd always get a subset of the IG, so I don't think there'd be any information leak vectors, at least direct ones (I don't know about indirect, will have to think more).
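A rough sketch of the "removal only" idea, assuming a hypothetical narrowInterestGroup helper and made-up ad metadata (w, h). The per-auction copy can only ever contain ads that were already in the stored IG, and the stored IG itself is left untouched.

```javascript
// Hypothetical helper: returns a per-auction copy of the interest group
// whose ads are a subset of the original. Nothing can be added or rewritten.
function narrowInterestGroup(interestGroup, keepAd) {
  return { ...interestGroup, ads: interestGroup.ads.filter(keepAd) };
}

// Illustrative interest group; URLs and metadata fields are invented.
const sampleGroup = {
  name: "shoes",
  ads: [
    { renderURL: "https://dsp.example/a300", metadata: { w: 300, h: 250 } },
    { renderURL: "https://dsp.example/a728", metadata: { w: 728, h: 90 } },
  ],
};

// Keep only the creatives that fit a 300x250 slot for this one auction.
const narrowed = narrowInterestGroup(
  sampleGroup,
  (ad) => ad.metadata.w === 300 && ad.metadata.h === 250
);

console.log(narrowed.ads.length);    // 1
console.log(sampleGroup.ads.length); // 2 (stored IG is unchanged)
```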

Conclusion

Again I'm happy to try to get numbers but I think giving a bit more "ad tech level" filtering as a 1st'ish Class Filtering Citizen would reduce IG usage in an auction a lot, be fast, and then combined with the IG Modification (and others discussed around multiple bids from an IG) allow more flexibility in the use of the PaAPI Auctions.

michaelkleber commented 1 year ago

You're absolutely right that a way to statically declare filtering rules could help by more efficiently letting some interest groups opt out of a particular auction.

If I understand correctly, though, all the things you're trying to achieve are already possible today, by writing your own filtering logic in generateBid(), right? The only benefit here is letting the browser execute some filtering rules instead of executing them yourself?
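For reference, a minimal sketch of what that do-it-yourself filtering could look like inside generateBid(). The auctionSignals.openrtb shape and the ad metadata fields are assumptions borrowed from the proposal above, and the fixed bid value is a placeholder rather than real bidding logic.

```javascript
// Sketch of buyer-side filtering inside generateBid(). The openrtb shape under
// auctionSignals and the w/h ad metadata are hypothetical.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  const slot = auctionSignals?.openrtb?.request?.imp?.banner;

  // Find a creative whose declared size exactly matches the slot.
  const ad = slot && interestGroup.ads.find(
    (candidate) =>
      candidate.metadata.w === slot.w && candidate.metadata.h === slot.h
  );

  if (!ad) {
    // Per the explainer, a zero or negative bid means this interest group
    // does not participate in the auction.
    return { bid: 0 };
  }

  // Placeholder bid; real logic would price the impression.
  return { bid: 1.0, render: ad.renderURL, ad: ad.metadata };
}
```

Note that by the time this runs, the worklet has already been started and the interest group has already been loaded (or, with B&A, shipped to the server), which is the cost a declarative pre-filter would avoid.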

If that's right, then this is the kind of feature that we should probably wait and design later, once people actually start using the API and figuring out what types of filtering approaches are actually the most useful in practice. The priorityVector approach was our first attempt to build something that was flexible and offered this kind of functionality, so please do see how well it meets your goals. But let's hold off on designing another one without some real-world experience to draw from.
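As a quick way to try priorityVector against this use case: the explainer describes the priority as a sparse dot product of the interest group's priorityVector with the auction's priority signals, and an interest group whose result is negative is dropped from the auction. The sketch below reimplements that arithmetic outside the browser; the key names ("required", "size.300x250") are an invented encoding, not anything reserved by the API.

```javascript
// Sparse dot product: keys missing from the signals contribute zero.
function sparseDot(priorityVector, prioritySignals) {
  let sum = 0;
  for (const [key, weight] of Object.entries(priorityVector)) {
    sum += weight * (prioritySignals[key] ?? 0);
  }
  return sum;
}

// A negative priority removes the interest group from the auction.
function passesPriorityFilter(priorityVector, prioritySignals) {
  return sparseDot(priorityVector, prioritySignals) >= 0;
}

// Invented encoding of "only bid on 300x250": start at -1 via a key the
// seller always sets to 1, and cancel it out when the wanted size is present.
const sizeVector = { required: -1, "size.300x250": 1 };

console.log(passesPriorityFilter(sizeVector, { required: 1, "size.300x250": 1 })); // true
console.log(passesPriorityFilter(sizeVector, { required: 1, "size.728x90": 1 }));  // false
```

This handles "one of these values" reasonably well, but each additional condition that must also hold needs its own trick, which is where structured matching would be more natural.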

thegreatfatzby commented 1 year ago

It's certainly true that you can decide not to bid in the generateBid function. However, in the case of TEE-based auctions, filtering in the browser before generateBid ever runs in the TEE would shrink the request on every link in the chain: browser -> Seller Ad Service -> Seller Front End -> Buyer Front End -> Bidding Service. That saving would be significant, and the scenario is plausible when many interest groups pass the filtering that is possible today. Unfiltered interest groups also incur at least some compute on those systems, even if not all of them have to decrypt the payload. Given high-volume, low-latency requirements this could be quite significant.

michaelkleber commented 1 year ago

Ah yes, you are 100% correct that in the B&A context, declarative filtering has big benefits for communications costs, not just computational ones. And our priorityVector mechanism was not built with that design goal in mind.

Check out Bidding and Auction services payload optimization, over on the https://github.com/privacysandbox/fledge-docs GitHub repo where a lot of the B&A design work has happened. In particular @chatterjee-priyanka has put a lot of thought into minimizing the blob size, and how it interacts with the different stages of auction configuration.

thegreatfatzby commented 1 year ago

I will review thoroughly and get back, I guess just one question for now: can you give me your thoughts, even knee-jerk-I-reserve-the-right-to-retract-at-any-time thoughts on:

  1. The interaction between Fledge/PaAPI and B&A from...I guess any perspective, but in particular w/r/t choices in one that impact the other, mostly in the PaAPI --> BA direction, but I suppose both. How are you guys thinking about that? Are you trying to keep BA a separate layer above PaAPI and not have dependencies, or "dependencies", from one to the other?
  2. In particular w/r/t priorityVector, it not being designed for that...I guess I'd like to hear you elaborate on what is implied to you when you say that. I suppose this kind of reduces to (1), but applying filters client side prior to sending the request would seem to have significant operational and resource savings, and I'm not sure if you're saying the design constrains the ability to do that, or just that there might need to be some modification.

I'll hold off on more questions till I've had a chance to digest the link, thanks.

michaelkleber commented 1 year ago

  2. We wrote priorityVector when B&A didn't exist yet, so we didn't build a plan for how to filter on-device with less info vs on-server with more info. Note that the B&A flow envisions creating the blob of data for the B&A server before it runs the contextual auction, which avoids serialized round trips (yay), but means that there is much less information available on-device of the kind that might feed into your filtering stage.

Maybe that answer to (2) means the answer to (1) is now obvious, but in any case:

  1. It would be great if B&A were a completely drop-in replacement for on-device, but as you pointed out, the abstraction layer is leaky because of the network costs (both bytes and latency). So realistically there will be dependencies, and we are spending a bunch of time thinking about them right now. Your question about filtering fits very well into that.

thegreatfatzby commented 1 year ago

Couple more thoughts:

Client Side Filtering for On Device Auctions

Another failure to communicate on my part is w/r/t the On Device Auctions; I think I had said something about caching but didn't really make a point. If IGs are only being updated at some cadence, and if they could express some set of targeting in some uniform way, then the device would be able to maintain an index of IGs by those parameters. With a similar structure on auctionConfig.auctionSignals allowing extraction of the signals to match against, that would enable very fast filtering at runAuction time and would reduce the number of worklets that need to be spun up, invocations, etc. I think this would get significant in the case of auctions running for multiple slots on a page.
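Roughly what I have in mind for the index, as a sketch only. The targeting field on the IG and the flat signals object are hypothetical; the point is that the index is built once when IGs are joined or updated, and each auction is then a handful of map lookups and set intersections.

```javascript
// Build an inverted index from "param=value" to the names of the interest
// groups that declared that value. The `targeting` field is hypothetical.
function buildIndex(interestGroups) {
  const index = new Map();
  for (const ig of interestGroups) {
    for (const [param, values] of Object.entries(ig.targeting)) {
      for (const value of values) {
        const key = `${param}=${value}`;
        if (!index.has(key)) index.set(key, new Set());
        index.get(key).add(ig.name);
      }
    }
  }
  return index;
}

// Intersect the posting sets for every signal in the auction.
// Simplification: an IG must declare each signalled param to be a candidate.
function candidates(index, signals) {
  let result = null;
  for (const [param, value] of Object.entries(signals)) {
    const matches = index.get(`${param}=${value}`) ?? new Set();
    result = result === null
      ? new Set(matches)
      : new Set([...result].filter((name) => matches.has(name)));
  }
  return result ?? new Set();
}

const storedGroups = [
  { name: "A", targeting: { size: ["300x250"], mediatype: ["banner"] } },
  { name: "B", targeting: { size: ["728x90"], mediatype: ["banner", "video"] } },
];
const igIndex = buildIndex(storedGroups);

console.log([...candidates(igIndex, { size: "300x250", mediatype: "banner" })]); // [ 'A' ]
```

Only the surviving IGs would then need a worklet (or a slot in the B&A payload).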

Additionally, pondering more, one advantage to having an enhanced "pre auction IG filtering" even in client side auctions would be to reduce the calls to the KV server, since I believe that takes place prior to generateBid, so even in the case of On Device auction it would result in network resource savings. So I guess this is a case where I agree that an "equivalent logical solution" exists, but the physics savings would be substantial on device or TEE.

Current best guess is we will use TEEs initially, but being able to run the auction(s) on the client seems compelling to me personally, I suspect would be to the business as well, and I'd love to see us being able to go that way and see what disruptions occur in the long run. I'm getting over my skis here a bit, but in theory it could help reduce transaction costs, especially if the entire auction was run client side but even if it contributed upfront logic, and I'd think in the long run that results in wins for consumers via more publisher content, etc etc.

BA vs On Device Design

Yeah, this seems really fun. I would imagine you guys are talking a lot about how to layer the code and concepts**, and it's always tricky. One thing I've been thinking about w/r/t optimizing our approach to PaAPI and any other browser design, is that optimizations that cross layers of abstraction often yield the most benefit, so allowing "high level business/product" level choices to interact with a "mid level filtering" abstraction that eliminates IGs entirely I can see eliminating a lot. If there are ways we can contribute to that discussion let us know.

However, w/r/t Current Proposed Optimizations

To the extent userBiddingSignals and creative_ids don't have to be sent over the wire to the ad system, and can instead be looked up at "server auction time", that is (a) interesting w/r/t the initial framing but (b) a pretty huge savings, and it limits the value of filtering IGs as a whole client side. So I'd like to continue this discussion, but yes that is quite helpful.

** Question One thing I was wondering before, didn't ask, but will now speaking of software layers: why put the PaAPI stuff in the main Chromium codebase at all? Why not keep that clean and make it a separate layer that is default installed or something.

JensenPaul commented 1 year ago

> why put the PaAPI stuff in the main Chromium codebase at all? Why not keep that clean and make it a separate layer that is default installed or something.

The Chromium code base has a very modular/layered design. Much of the Protected Audience implementation lives in a separate service within the "content" layer, located in the Chromium codebase in the content/services/auction_worklet directory, so in some ways it is "a separate layer that is default installed". As for why it's a part of the Chromium codebase, it's partially because the Protected Audience implementation existed years before B&A, but if I had to do it again I'd probably choose the Chromium codebase to base it off of for a few reasons:

  1. Chromium has awesome support for launching and controlling processes, including support for highly resource constrained devices where we have highly tuned heuristics to allocate processes to give the users the best balance of responsiveness and protection.
  2. Chromium has awesome support for sandboxing processes, that's been battle-hardened by more than a decade of attacks.
  3. Chromium has awesome IPC support (i.e. Mojo)
  4. Chromium has support for Windows, Android, Linux, Mac, ChromeOS etc
  5. Chromium is already tightly integrated with the V8 JavaScript engine