FLEDGE API Support for Interest Group and Ad Filtering

Starting worklets and setting up V8 contexts for generateBid and scoreAd adds significant latency. At the same time, a significant amount of DSP logic in generateBid and SSP logic in scoreAd enforces eligibility conditions: ensuring that ads meet publisher and policy requirements for the page, and that the publisher page meets the ads' requirements. As a result, the API overhead is frequently paid only to drop the interest group from the auction.

If the browser could provide an API for directly filtering out ads and interest groups without incurring the expensive overhead, we could realize significant latency improvements. A limited API that does not execute arbitrary JS code should not require the same sandboxing and separate contexts. Note that suggestions in #302 have some overlap since they would provide a limited way of filtering interest groups from the trusted server response.

A fairly powerful way to specify the kinds of eligibility conditions mentioned above is in terms of logical operations on sets of tokens. These could be expressed in a small DSL or directly in terms of a JSON tree representation, e.g. this might represent a publisher requirement not to have ads with 'shoes' or 'sports' tokens:

filteringTree = {
  'operator': 'ANDNOT',
  'tokens': ['shoes', 'sports'],
  'nodes': [...]
}

Each ad would come with a classification into a set of relevant tokens, against which the tree could be evaluated to determine its eligibility. Similarly, each ad may carry its own tree, to be applied to the classification of the page. In practice, we expect ad techs to use opaque tokens (possibly numbers) to avoid unnecessarily leaking sensitive data. We suggest 'AND', 'OR', 'ANDNOT', and 'ORNOT' as the possible operators. The nodes field could contain subtrees following the same schema. We expect typical usage to be on the scale of 10s of tokens per ad.
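A minimal sketch of how such a tree might be evaluated against a set of tokens, in Python for illustration. The exact operator semantics are an assumption, since the proposal leaves them open: AND and OR combine the presence of each listed token and the result of each subtree in nodes; ANDNOT requires every one of those values to be false (so the publisher tree above rejects any ad tagged 'shoes' or 'sports'); ORNOT requires at least one of them to be false. The function name is hypothetical.

```python
from typing import Any, Iterable, Mapping


def evaluate(tree: Mapping[str, Any], tokens: Iterable[Any]) -> bool:
    """Evaluate a boolean filtering tree against a collection of tokens.

    Assumed semantics: every listed token contributes "is it present?",
    and every subtree in 'nodes' contributes its own result. AND/OR
    combine those values directly; ANDNOT/ORNOT combine their negations.
    """
    present = set(tokens)
    values = [token in present for token in tree.get('tokens', [])]
    values += [evaluate(node, present) for node in tree.get('nodes', [])]
    op = tree['operator']
    if op == 'AND':
        return all(values)
    if op == 'OR':
        return any(values)
    if op == 'ANDNOT':
        return not any(values)  # every child must be false
    if op == 'ORNOT':
        return not all(values)  # at least one child must be false
    raise ValueError(f'unknown operator: {op!r}')
```

Under these semantics, the buyer example tree shown further down accepts a page tagged [2, 7, 9] through token 7 alone, and rejects a page tagged only [2, 9] because token 4 is missing from the AND branch.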

We propose that the API provide a way to set a filtering condition for each ad, in the trusted server response and/or in the interest group object, which will be applied to the perBuyerEligibilityTokens provided in the auctionConfig. Ads that are filtered out will be (temporarily) removed from the interestGroup passed to generateBid at runAdAuction time, and generateBid will not be run for any interest group that has no eligible ads.
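A rough sketch of that buyer-side step, assuming conditions are looked up by renderUrl (one possible encoding, as in the trusted server response discussed below) and that an ad without a condition stays eligible. is_eligible stands in for whatever tree evaluator the browser would use; all names here are hypothetical.

```python
from typing import Any, Callable, List, Mapping, Optional, Sequence

Ad = Mapping[str, Any]
Condition = Mapping[str, Any]


def filter_interest_group_ads(
    ads: Sequence[Ad],
    conditions_by_render_url: Mapping[str, Condition],
    page_tokens: Sequence[Any],
    is_eligible: Callable[[Condition, Sequence[Any]], bool],
) -> Optional[List[Ad]]:
    """Return the ads that pass their eligibility condition, or None if
    none do, meaning generateBid() would not be called for this group."""
    eligible = []
    for ad in ads:
        condition = conditions_by_render_url.get(ad['renderUrl'])
        # Assumption: an ad without a condition is always eligible.
        if condition is None or is_eligible(condition, page_tokens):
            eligible.append(ad)
    return eligible or None
```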

Similarly, we suggest that SSPs have the same capability. The auctionConfig would have a sellerEligibilityCondition that would be applied to tokens provided by the trusted scoring server.

Buyer Filtering Example

Let's illustrate how we expect this to work with an example. Here is a possible filtering tree for one ad:

           OR
        /      \
      AND       [7]
     /   \
    OR    [4]
   /  \
ORNOT  [2, 3]
 |
[1]

which can be serialized to JSON as:

{ 'operator': 'OR',
  'tokens': [7],
  'nodes': [ {
    'operator': 'AND',
    'tokens': [4],
    'nodes': [ {
      'operator': 'OR',
      'tokens': [2, 3],
      'nodes': [ {
        'operator': 'ORNOT',
        'tokens': [1]
      }]
    }]
  }]
}

This tree would be returned in the trusted server response. One possible API: the trusted server response includes a special field that maps each renderUrl (as a way to identify the ad) to its filtering tree:

{
  // …
  'buyerEligibilityConditions': {
    'https://cdn.com/render_url_of_bidder1': {...},
    'https://cdn.com/render_url_of_bidder2': {...},
    // …
  }
}

If the responses for different trusted bidding keys contain conflicting conditions for the same renderUrl, then the browser is free to select any one.

Then in the call to runAdAuction, the buyer can provide tokens describing the page to be passed in the auctionConfig. For example, suppose the publisher page is in the US, discusses politics, and has to do with cars; a DSP might encode these observations via tokens 2, 7, and 9:

auctionConfig = {
  // …
  'perBuyerEligibilityTokens': {
    'https://www.example-dsp.com': [2, 7, 9],
    // …
  }
}

In the above example, the ad would be eligible since token 7 is present, even though the left part of the tree does not evaluate to true since 4 is not present.

Seller Filtering Example

Seller filtering is similar, but the conditions are provided in the auctionConfig, and the classification of renderUrls into tokens is provided by the trusted seller server. Publisher requirements are passed in a tree in the auctionConfig:

auctionConfig = {
  // …
  'sellerEligibilityConditions': {
    'operator': 'ANDNOT',
    'tokens': [3, 5, 14, 19]
  }
}

Here, we have a simpler tree, since we gave a more complicated example above.

Then, the trustedScoringSignals could include a specific field with SSP tokens describing the renderUrl:

{
  'renderUrl': {
    'https://cdn.com/render_url_of_bidder': {
      // …
      'eligibilityTokens': [14, 15, 20]
    }
  }
}

In this case the ad would be filtered out since token 14 is present.
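A comparable sketch for the seller side, assuming the trustedScoringSignals shape shown above and reading the seller condition as "none of tokens 3, 5, 14, or 19 may be present", which is the reading under which token 14 filters the ad out. Bids whose renderUrl has no eligibilityTokens entry are assumed to pass through to scoreAd(); all names are hypothetical.

```python
from typing import Any, Callable, List, Mapping, Sequence


def seller_filter_bids(
    render_urls: Sequence[str],
    seller_condition: Mapping[str, Any],
    scoring_signals: Mapping[str, Any],
    is_eligible: Callable[[Mapping[str, Any], Sequence[Any]], bool],
) -> List[str]:
    """Return the render URLs whose bids should still be passed to scoreAd()."""
    per_url = scoring_signals.get('renderUrl', {})
    kept = []
    for url in render_urls:
        tokens = per_url.get(url, {}).get('eligibilityTokens')
        # Assumption: without tokens for this URL, the bid is not filtered.
        if tokens is None or is_eligible(seller_condition, tokens):
            kept.append(url)
    return kept
```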

@bmilekic you might see if Jérémie is interested in weighing in here, as there is a proposal that might involve designing a language (instead of the JSON proposed above) for bidding logic. I couldn't help but think of his work on Bonsai (custom decision trees for real-time bidding) at AppNexus (ref).

Getting the buyer's logic from the trusted bidder signals fetch as opposed to having it as part of the interest group raises some issues:

1) If we want to be able to skip creating a process for the buyer's origin when the filter rejects it, we have to download the bidder signals (which can be quite large) into the main browser process. We really don't want to OOM the browser process, so downloading a response of unbounded size there and caching it is not great. Doing it for a dozen buyers at once is even more problematic.

2) What do we do about fetching the JS? Currently, we fetch the JS and bidding signals in parallel. If we instead wait until after we've applied the filter and created the process to run the buyer's worklets, auctions would run strictly slower whenever no buyers are filtered out entirely. Even when some buyers are, auctions could still be slower if not enough of them are.

If the decision logic were part of the IG itself, that would fix those issues, though it would make filtering less powerful (e.g., it couldn't filter based on running out of budget, unless that were learned in the context of the page and passed in as a parameter to the filter from there, as opposed to modifying the filter rules based on remaining budget).

Edit: And learning the remaining budget in the context of the page probably doesn't work anyway (it potentially leaks too much data, and there are too many potential budgets to learn for sharing them there to be reasonable).

Here's an alternative proposal. Feedback would be much appreciated. Apologies for any formatting issues; this is copied from a Google doc.

Alternative proposal

TLDR: Do basically the same thing, but:

Use arithmetic operations instead of boolean operations, and use the result to set priorities.
- This does make expressing boolean operations a bit awkward (particularly since "0" means don't filter - that's due to the current default priority being 0).
Use lists instead of dictionaries. This allows arguments to be ordered, so subtraction and division can be expressed. It also potentially shrinks the trees, since no labels are needed.
Interest groups specify their priority tree as part of the InterestGroup itself, as part of trusted bidding signals fetches, or both, though only the former sets priority (for now).
Sellers still use per-render-URL filters fetched with seller signals, though these also produce numeric output.

Details

New data types

There are two new data types: priority trees and priority tree inputs. A priority tree is applied to a priority tree input, which results in a number or an error. In the case of an error, the priority tree has no effect (could throw out the bid instead?).

Priority tree inputs

The input to a priority tree is a dictionary mapping arbitrary strings to numeric values:

{ key1: value1, key2: value2, … }

Priority trees

The format of a priority tree is a JSON value, with the possibilities being:

A number means that value.
A string means look up the value for that key in the priorityTreeInputs structure. Missing keys are treated as having values of 0.
A list of the form [operation, arg1, arg2, …]

In the case of a list, the operation is one of a fixed set of strings, and is used to determine how many other arguments are allowed, and how they are interpreted. Additional or insufficient arguments are considered errors, and will result in ignoring the output of evaluating the tree:

add indicates that the arguments are separate priority trees, which are all evaluated independently and then added together. There must be at least two arguments, and there is no limit to the number of arguments.
mul is like add, but with multiplication.
sub is like add, but subtracts all subsequent args from the first argument.
div is like sub, but divides the first argument by all subsequent arguments.
neg takes a single argument, and negates it.
priority is the priority value of the interest group (which can be modified by previous generateBid() calls), and is only available to bidder filters. priority takes no arguments. Note that this is specified as ["priority"], where the operation itself is "priority", as opposed to "priority" being an argument to an operation.
age is how long the user has been in the interest group, in milliseconds.
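A minimal sketch of an evaluator for these trees, in Python for illustration. Assumptions not in the proposal: evaluation errors are reported by returning None (the caller then ignores the tree), division by zero counts as an error, age is written as ["age"] like ["priority"], and priority is passed as None when the tree runs as a seller filter. All names are hypothetical.

```python
from typing import Any, Mapping, Optional


class PriorityTreeError(Exception):
    """Raised for a malformed tree or an invalid operation."""


def _evaluate(tree: Any, inputs: Mapping[str, float],
              priority: Optional[float], age_ms: float) -> float:
    if isinstance(tree, bool):  # JSON true/false are not numbers
        raise PriorityTreeError('booleans are not allowed')
    if isinstance(tree, (int, float)):
        return float(tree)
    if isinstance(tree, str):
        return float(inputs.get(tree, 0))  # missing keys count as 0
    if not isinstance(tree, list) or not tree:
        raise PriorityTreeError(f'invalid node: {tree!r}')
    op, args = tree[0], tree[1:]
    if op in ('priority', 'age'):
        if args:
            raise PriorityTreeError(f'{op} takes no arguments')
        if op == 'age':
            return float(age_ms)
        if priority is None:  # assumption: None means "seller filter"
            raise PriorityTreeError('priority is only available to bidders')
        return float(priority)
    values = [_evaluate(arg, inputs, priority, age_ms) for arg in args]
    if op == 'neg':
        if len(values) != 1:
            raise PriorityTreeError('neg takes exactly one argument')
        return -values[0]
    if op not in ('add', 'mul', 'sub', 'div'):
        raise PriorityTreeError(f'unknown operation: {op!r}')
    if len(values) < 2:
        raise PriorityTreeError(f'{op} needs at least two arguments')
    result = values[0]
    for value in values[1:]:
        if op == 'add':
            result += value
        elif op == 'mul':
            result *= value
        elif op == 'sub':
            result -= value
        elif value == 0:
            raise PriorityTreeError('division by zero')
        else:
            result /= value
    return result


def evaluate_priority_tree(tree: Any, inputs: Mapping[str, float],
                           priority: Optional[float] = None,
                           age_ms: float = 0.0) -> Optional[float]:
    """Return the tree's value, or None on error (the tree is then ignored)."""
    try:
        return _evaluate(tree, inputs, priority, age_ms)
    except PriorityTreeError:
        return None
```

For example, ["add", "cars", ["mul", "politics", 3]] with inputs {"cars": 2, "politics": 1} evaluates to 5.0, while ["neg", "cars", "politics"] is an error because neg takes exactly one argument.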

Buyer priority trees and input

For buyers, the priority tree input is specified as a new AuctionConfig field. They are specified on a per-buyer basis:

perBuyerPriorityTreeInputs = { "buyer1": {...}, "buyer2": {...}, "*": {...} }

The special “*” value applies to all buyers. If both “Buyer” and “*” keys are present, and Buyer’s priority tree references a key not found in Buyer’s specific priority tree input, then “*” will be checked for a matching key.

Interest group priority trees are provided as part of an interest group itself, using the new “priorityTree” field, so they can be evaluated before requesting any resources over a network. If the result of evaluating an interest group’s priority tree is negative, the interest group is not given a chance to bid in an auction. Otherwise, the result of evaluating the priority tree is used as the interest group’s priority when running the auction.

Priority trees may also be fetched as part of fetching trusted bidding signals, but in that case they are only used to skip bidding if the result is negative, not to set the priority (TODO: we should make this adjust the priority as well, though that requires some major refactoring). Having trees in both locations lets consumers choose between the two options, trading performance against flexibility (e.g., fetching trees as part of the trusted bidding signals means any filtering decisions must wait until the fetch completes, but allows more flexibility to update the trees). It also allows each tree to be used for different things (e.g., the fetched JSON “tree” could just be -1 if an ad campaign has run out of budget, while the tree in the IG could be based on more static preferences about the content of the publisher page).
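A small sketch of how the two tree results could be combined under these rules. The inputs are tree results already computed by some evaluator, with None meaning "no tree" or "evaluation error" (an error leaves the tree without effect, per the data type definition above); the function names are hypothetical.

```python
from typing import Optional


def priority_before_fetch(stored_priority: float,
                          ig_tree_result: Optional[float]) -> Optional[float]:
    """Priority for the auction, or None if the group is dropped up front."""
    if ig_tree_result is None:  # no tree, or an evaluation error
        return stored_priority
    if ig_tree_result < 0:
        return None  # negative: the group does not bid
    return ig_tree_result  # otherwise the result becomes the priority


def keep_after_fetch(fetched_tree_result: Optional[float]) -> bool:
    """Trees from trusted bidding signals can only skip bidding, not reprioritize."""
    return fetched_tree_result is None or fetched_tree_result >= 0
```

In the budget example, a fetched tree of -1 makes keep_after_fetch return False, so generateBid() is skipped even though the IG's own tree had already set a priority.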

In order for JSON fetches to provide this data, bidding fetches need to be updated. An additional parameter is added to the JSON fetches “&interestGroups=groupName1,groupName2,...” for all the interest groups the fetch is for, and the response is now of the format:

{ keys: ..., priorityTrees: { groupName1: priorityTree, groupName2: priorityTree, … } }

In addition, the server must send a “fledge-bidding-signals-format-version: 2” header, for the response to be interpreted as using the new format, though eventually support for the old format will be removed.
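A minimal sketch of the request side of this change, assuming the existing trusted bidding signals URL (including its query string) has already been built, and assuming each group name is percent-encoded so that a comma inside a name cannot be confused with the separator. The encoding choice and the function name are assumptions.

```python
from typing import Sequence
from urllib.parse import quote


def add_interest_groups_param(signals_url: str, group_names: Sequence[str]) -> str:
    """Append the proposed &interestGroups= parameter to an existing
    trusted bidding signals URL that already has a query string."""
    encoded = ','.join(quote(name, safe='') for name in group_names)
    return f'{signals_url}&interestGroups={encoded}'
```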

Seller priority trees and input

Sellers may specify their priority trees as part of their auctionConfig, via the new field:

sellerPriorityTree = priorityTree

They specify their priority tree input via new fields in their trusted selling signals responses:

{ renderUrlPriorityTreeInputs: { renderUrl1: priorityTreeInput, renderUrl2: priorityTreeInput, … }, componentRenderUrlPriorityTreeInputs: { componentRenderUrl1: priorityTreeInput, componentRenderUrl2: priorityTreeInput, … } }

There is no change to the requested URL, nor introduction of a versioning scheme, since the seller signals JSON format already uses a top-level dictionary that can be expanded by adding new keys.

For a given bid, the AuctionConfig’s priority tree is run against all matching URLs (the renderUrl and any componentRenderUrls), and if any result is negative, the bid is rejected. If every URL either evaluates to a non-negative value or has no URL-specific priority tree input (or there’s no priority tree specified in the auction config), the bid is passed to the seller’s scoreAd() method. The magnitude of the output of evaluating the priority tree has no impact.
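A sketch of this per-bid rule, taking a priority tree evaluator as a parameter and treating an evaluation error (None) as having no effect, as with priority tree errors in general. The response field names come from the format above; the function name is hypothetical.

```python
from typing import Any, Callable, Mapping, Optional, Sequence

Inputs = Mapping[str, float]


def seller_accepts_bid(
    render_url: str,
    component_render_urls: Sequence[str],
    seller_tree: Optional[Any],
    signals: Mapping[str, Mapping[str, Inputs]],
    evaluate: Callable[[Any, Inputs], Optional[float]],
) -> bool:
    """True if the bid should be passed on to scoreAd()."""
    if seller_tree is None:
        return True  # no sellerPriorityTree in the auction config
    render_inputs = signals.get('renderUrlPriorityTreeInputs', {})
    component_inputs = signals.get('componentRenderUrlPriorityTreeInputs', {})
    checks = [(render_url, render_inputs)]
    checks += [(url, component_inputs) for url in component_render_urls]
    for url, table in checks:
        inputs = table.get(url)
        if inputs is None:
            continue  # no URL-specific input, so this URL cannot reject
        result = evaluate(seller_tree, inputs)
        if result is not None and result < 0:
            return False  # any negative result rejects; magnitude is ignored
    return True
```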

@jonasz: My last post is a proposal both to provide a filtering API and to allow IGs to adjust priority based on a sparse dot product, which I believe you said you were interested in. The API is a little clunky, to accommodate both needs. Feedback would be welcome. We don't want to start implementation until we know whether it meets folks' needs, and until people have had a chance to provide alternatives.

Hi Matt,

Thanks for the proposal, this definitely sounds useful. I think the ability to adjust the priority in this way will open up a great area for optimization.

Some thoughts / comments:

1. Being able to adjust the priority based on the response from the trusted server is useful. (Even if it only happens longer term.)
   - Use case: adjust the priority according to the campaign's config. (If the advertiser is willing to pay more for a click, the priority would rise; if the budget is reached, the priority would drop to zero; etc.)
2. I think more types of operations could be useful, some that come to my mind: max, if, exp.
3. As to tree inputs, I think in addition to perBuyerPriorityTreeInputs it'd be useful to have IG.treeInputs.
   - It'd be great to be able to overwrite both IG.priorityTree and IG.treeInputs from within generateBid.
   - // In this setup the IG.priority field becomes redundant.
4. What may need clarification - it seems in your proposal there'd be a tree per IG, and @stguav originally proposed that filtering happen per ad, not per IG.
   - FWIW, from our perspective, IG-granularity for filtering is sufficient.
5. The dynamic age parameter in the tree is a great idea!
   - Especially in tandem with points 2. and 3. - this'd allow us to calculate, within generateBid, the priority as a function of time. ("If age < X then priority is p1, else if age < Y then ...".)

Best regards, Jonasz

Hi Jonasz,

Thanks so much for the suggestions! Here are some responses to your ideas:

Being able to adjust the priority based on the response from the trusted server is useful. (Even if it only happens longer term.)

Good to hear you're interested in this! Implementing this will likely be a major investment (and won't have quite the same performance gains once it's done), so it's nice to get some feedback on the idea before we start implementing.

* Use case: adjust the priority according to the campaign's config. (If the advertiser is willing to pay more for a click, the priority would rise; if the budget is reached, the priority would drop to zero; etc.)

Is this sort of slow change already supported by the existing interest group priority and setPriority() features?

I think more types of operations could be useful, some that come to my mind: max, if, exp.

I wasn't sure how much interest there would be in boolean and conditional support, in particular, but if folks are interested, it should not be difficult to add these operations.

As to tree inputs, I think in addition to perBuyerPriorityTreeInputs it'd be useful to have IG.treeInputs.

It'd be great to be able to overwrite both IG.priorityTree and IG.treeInputs from within generateBid. // In this setup the IG.priority field becomes redundant.

This seems like it could be really useful. This would let the filter have access to information potentially from multiple sites (the site calling joinAdInterestGroup(), and the site(s) modifying the tree or inputs). As long as that extra information isn’t accessible to Javascript, only determines whether or not generateBid() is called, and multiple Javascript calls do not share a Javascript context, this is probably OK, though we do need to be very careful when allowing interest groups to combine data from multiple sites.

What may need clarification - it seems in your proposal there'd be a tree per IG, and @stguav originally proposed that filtering happen per ad, not per IG. * FWIW, from our perspective, IG-granularity for filtering is sufficient.

With my proposal, each IG would have a filter/priority that comes from the bidder/IG before calling generateBid(). If there is any bidder filtering to apply to ads, the assumption is that the bidder would do that in generateBid() itself.

Then the seller gets to apply filters to the render URL, but only after the bidder generates the bid. We can't send the ad URLs to the seller before then, because it's likely way too much data to get a full set of scoring signals for, and we can't send an IG's ads to the seller without the IG telling us it's OK to do so. IGs currently have no way to express that they're OK with a particular seller except by generating a bid. Giving IGs a way to declare up front which sellers they're OK with getting this information would solve the latter problem (and is something we’re thinking about), but it would not solve the potential data size problem if we sent all ads from all IGs to the seller and got all signals immediately upon auction start.

It’s possible that, if we have IG opt-in to sharing data, we could potentially send the names of all IGs participating in the auction to the seller up front, but that would require either entirely reworking the JSON format, or two JSON requests to the seller.

The dynamic age parameter in the tree is a great idea!

Paul Jensen deserves full credit for this addition

Use case: adjust the priority according to the campaign's config. (If the advertiser is willing to pay more for a click, the priority would rise; if the budget is reached, the priority would drop to zero; etc.)

Is this sort of slow change already supported by the existing interest group priority and setPriority() features?

Not really - note that the campaign config may actually change quite rapidly, and these changes could sometimes increase and sometimes decrease the priority. Once we decrease the priority via setPriority, though, we may not get a chance to increase it back - as the priority may be too low now, and we will never see generateBid called for this IG again.

As to tree inputs, I think in addition to perBuyerPriorityTreeInputs it'd be useful to have IG.treeInputs.

It'd be great to be able to overwrite both IG.priorityTree and IG.treeInputs from within generateBid. // In this setup the IG.priority field becomes redundant.

This seems like it could be really useful. This would let the filter have access to information potentially from multiple sites (the site calling joinAdInterestGroup(), and the site(s) modifying the tree or inputs). As long as that extra information isn’t accessible to Javascript, only determines whether or not generateBid() is called, and multiple Javascript calls do not share a Javascript context, this is probably OK, though we do need to be very careful when allowing interest groups to combine data from multiple sites.

I see. Just to confirm - in my view it should be totally fine if the priority-related fields are write-only from the perspective of generateBid.

I see. Just to confirm - in my view it should be totally fine if the priority-related fields are write-only from the perspective of generateBid.

Yes, or at least it currently seems to us that adding more write-only fields which are only accessible to filters / reprioritization logic should be fine. The resulting priority also shouldn't be exposed to generateBid(), which is already the case for setPriority(), though that's not spelled out in the explainer, currently, which is something we need to fix.

We’ve been having second thoughts about adding a new language to the web platform, and are now wondering if a sparse vector multiply more along the lines of RTB House’s proposal might be good enough for most consumers to work with. It would satisfy issue #302 also. If this is insufficient, we’re thinking that the best script-based option is to use Javascript scripts instead, which we can run in a single frozen global context along the lines of issue #310, so will hopefully be fairly fast to execute.

This proposal is going to focus on an added buy-side API, which we could potentially extend to sell-side, if that turns out to be useful. Any filtering out of bids on the buy-side will also reduce the number of scoreAd() invocations, so a buy-side API alone can reduce both buy-side and sell-side latency and compute resource usage. We’re hoping we can get Javascript per-function-call overhead down enough that if a seller needs filters, it’s performant for them to be embedded in the seller script and the JSON data it takes as input. Since buyers may take some time to generate each bid, and don’t know all the interest groups a user is in, buy-side filtering/reprioritization makes sense, even with greatly reduced overhead for each generateBid() call.

Details

The new filter works by taking the dot product of two sparse vectors, each represented as a JSON dictionary (e.g. { “cars”:1, “politics”:0, “42”:-10 }). If the result is less than or equal to 0, the interest group is dropped from the auction. If it’s greater than 0, it replaces the interest group’s priority.
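The filtering rule itself is small; here is a Python sketch using the example vector from the text, with hypothetical function names. Keys absent from either dictionary contribute 0, which is what makes the representation sparse.

```python
from typing import Mapping, Optional

Vector = Mapping[str, float]


def sparse_dot(a: Vector, b: Vector) -> float:
    """Dot product of two sparse vectors stored as dicts; absent keys are 0."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dictionary
    return sum(value * b[key] for key, value in a.items() if key in b)


def reprioritize(priority_vector: Vector, priority_signals: Vector) -> Optional[float]:
    """None if the group is dropped (dot product <= 0), else the new priority."""
    result = sparse_dot(priority_vector, priority_signals)
    return result if result > 0 else None
```

Iterating over the smaller dictionary keeps the cost proportional to the shorter vector, which helps if a buyer sends many signals but each group's vector is small.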

Filtering / reprioritization can be done either at auction start or when receiving JSON from the real-time trusted bidding signals server, or both. If it’s only done at auction start for all interest groups owned by a particular buyer, then perBuyerGroupLimits will be applied at the start of the auction, resulting in fetching less JSON data. If it’s done on receiving JSON as well/instead, then perBuyerGroupLimits will only be enforced once all JSON for a buyer is fetched, and the final priority is known for all interest groups a bidder owns.
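A sketch of the group-limit step, assuming the limit keeps the highest-priority surviving groups. Ties are broken here by Python's stable sort (original order wins), which is an arbitrary choice; the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def apply_group_limit(groups: Sequence[Tuple[str, Optional[float]]],
                      limit: int) -> List[str]:
    """groups: (name, priority) pairs, where a priority of None means the
    group was already filtered out. Keep at most `limit` groups, highest
    priority first."""
    survivors = [(name, p) for name, p in groups if p is not None]
    survivors.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in survivors[:limit]]
```

When filtering happens only at auction start, this step can run before any trusted signals are requested, so dropped groups never add keys to the fetch.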

The auctionConfig has a new field:

perBuyerPrioritySignals : { "https://buyer1.com": {...}, "https://buyer2.com": {...}, "*": {...} }

Where each entry is a per-buyer dictionary of keys to JSON numbers used in the sparse vector multiplication. The “*” field is used for all buyers, with identically keyed buyer-specific fields taking precedence. Keys starting with “browserSignals.” are reserved for values provided by the browser, so may not be set in an auctionConfig.
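A sketch of resolving the signals for one buyer: the "*" entry supplies defaults, the buyer's own entry wins on identical keys, and reserved browserSignals. keys are rejected. Raising an error for reserved keys (rather than silently dropping them) is an assumption, as is the function name; only the two entries used for this buyer are checked.

```python
from typing import Dict, Mapping

RESERVED_PREFIX = 'browserSignals.'


def buyer_priority_signals(
    per_buyer_priority_signals: Mapping[str, Mapping[str, float]],
    buyer: str,
) -> Dict[str, float]:
    """Merge the '*' entry with the buyer's own entry (buyer-specific keys
    win), rejecting keys reserved for browser-provided values."""
    merged: Dict[str, float] = {}
    for scope in ('*', buyer):  # later scopes overwrite earlier ones
        for key, value in per_buyer_priority_signals.get(scope, {}).items():
            if key.startswith(RESERVED_PREFIX):
                raise ValueError(f'reserved key in auctionConfig: {key}')
            merged[key] = value
    return merged
```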

There are also new optional fields in interest group definitions. They are:

useBiddingSignalsPrioritization : [true | false], priorityVector : {...}, prioritySignalsOverrides : {...}

If useBiddingSignalsPrioritization is true, the trusted bidding signals received from the server may (but are not required to) include a priorityVector for each interest group. That vector is also multiplied by the perBuyerPrioritySignals, and the result becomes the final priority, taking precedence over any priority calculated from the priorityVector specified in the interest group. (If that earlier multiplication resulted in a value <= 0, the interest group was already filtered out of the auction.)

If a priorityVector is provided in the interest group, it is multiplied by the perBuyerPrioritySignals for the auction (a sparse vector multiplication) to calculate the new priority at the start of the auction, and if the value is less than or equal to 0, the interest group does not participate in the auction.

Values in prioritySignalsOverrides take precedence over values in perBuyerPrioritySignals in all vector multiplications. In addition, values in perBuyerPrioritySignals can be overridden for all future auctions involving a particular interest group by calling setPerBuyerPrioritySignals(key, value) from generateBid(). Neither the original nor the overridden values of perBuyerPrioritySignals will be provided in the interestGroup object passed to generateBid(). Unlike values in the auctionConfig, these override values can replace the otherwise reserved values starting with “browserSignals.”.

In order for JSON fetches to provide priorityVectors, the format of trusted bidding signals fetches needs to be updated. If useBiddingSignalsPrioritization is set, an additional parameter is added to the JSON fetches “&interestGroups=groupName1,groupName2,...” for all the interest groups the fetch is for, and the response is now of the format:

{ keys : <key-value dictionary used for trustedBiddingSignals>, perGroupData : { groupName1 : { priorityVector : {...} }, groupName2 : { priorityVector : {...} }, … } }

In addition, the server must send a “X-fledge-bidding-signals-format-version: 2” header, for the response to be interpreted as using the new format, though eventually support for the old format will be removed. The new format is supported even when useBiddingSignalsPrioritization is not set, and the interest groups are not passed in the query param.
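A sketch of how a browser might read such a response, assuming header names have already been lowercased (HTTP header names are case-insensitive) and that a group entry without a priorityVector is simply skipped. The function name is hypothetical.

```python
from typing import Any, Dict, Mapping, Tuple


def parse_bidding_signals(
    headers: Mapping[str, str], body: Mapping[str, Any]
) -> Tuple[Mapping[str, Any], Dict[str, Mapping[str, float]]]:
    """Return (trusted bidding signals keys, priorityVector per group name)."""
    if headers.get('x-fledge-bidding-signals-format-version') != '2':
        return body, {}  # old format: the body is the key/value dictionary
    vectors = {
        name: data['priorityVector']
        for name, data in body.get('perGroupData', {}).items()
        if 'priorityVector' in data
    }
    return body.get('keys', {}), vectors
```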

If a priorityVector is not present for a group in the response, the original priority is used (the priority from the interest group’s priorityVector multiplication, if present, or the priority from the interest group itself, if not).
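Putting the two multiplications together, a sketch of the priority calculation for one interest group. For brevity, signals is assumed to be the fully assembled priority signals for this buyer, and browser-provided values are not recomputed between the two phases; function names are hypothetical.

```python
from typing import Mapping, Optional

Vector = Mapping[str, float]


def sparse_dot(a: Vector, b: Vector) -> float:
    """Sparse dot product; keys missing from either side contribute 0."""
    return sum(value * b[key] for key, value in a.items() if key in b)


def final_priority(
    ig_priority: float,
    ig_vector: Optional[Vector],
    fetched_vector: Optional[Vector],
    signals: Vector,
) -> Optional[float]:
    """Return the group's final priority, or None if it is filtered out."""
    priority = ig_priority
    # Phase 1, at auction start: the interest group's own priorityVector.
    if ig_vector is not None:
        result = sparse_dot(ig_vector, signals)
        if result <= 0:
            return None  # dropped before any trusted signals are fetched
        priority = result
    # Phase 2, after the trusted bidding signals fetch (only when
    # useBiddingSignalsPrioritization is set and the server sent a vector).
    if fetched_vector is not None:
        result = sparse_dot(fetched_vector, signals)
        if result <= 0:
            return None
        priority = result
    return priority
```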

For all sparse multiplications, the browser appends a number of values to the perBuyerPrioritySignals. These are:

browserSignals.one: Always one. Useful for adding a fixed value to the calculation.
browserSignals.priority: Result of earlier priority calculation. This may be the interest group’s priority, the priority set by the last generateBid() invocation of the interest group calling setPriority(), or the output of the interest group’s priorityVector and the perBuyerPrioritySignals in the auction config.
browserSignals.age: How long ago the user was added to the interest group, in milliseconds.
Suggestions welcome - variants on age might be useful? (log age, 1/age, etc)
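A sketch of assembling the vector that each priorityVector is multiplied by. Letting the interest group's overrides replace browserSignals.* values is an assumption based on the override rules above; names are hypothetical.

```python
from typing import Dict, Mapping


def assemble_priority_signals(
    auction_signals: Mapping[str, float],
    overrides: Mapping[str, float],
    priority: float,
    age_ms: float,
) -> Dict[str, float]:
    """Build the vector an interest group's priorityVector is multiplied by.

    auction_signals: perBuyerPrioritySignals already resolved for this buyer.
    overrides: the interest group's prioritySignalsOverrides.
    """
    signals = dict(auction_signals)
    # Browser-provided values (the auctionConfig may not set these keys).
    signals['browserSignals.one'] = 1
    signals['browserSignals.priority'] = priority
    signals['browserSignals.age'] = age_ms
    # Assumption: interest-group overrides win over everything, including
    # the browserSignals.* values.
    signals.update(overrides)
    return signals
```

For example, with no overrides of those keys, a priorityVector of {"browserSignals.priority": 1, "browserSignals.one": 0.5} evaluates to the previous priority plus 0.5.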

When fetching interest group updates, the interest group’s new priorityVector, if present, replaces the old one. However, the interest group’s new prioritySignalsOverrides is merged with the old one (including values updated by bidding scripts), with the values in the fetched update taking precedence over the old values. Re-joining an interest group replaces all fields unconditionally, including prioritySignalsOverrides.
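A sketch of the update rule, using the prioritySignalsOverrides field name from the interest group definition above and handling only the two priority-related fields; the function name is hypothetical.

```python
from typing import Any, Dict, Mapping


def apply_interest_group_update(group: Mapping[str, Any],
                                update: Mapping[str, Any]) -> Dict[str, Any]:
    """Apply a fetched update: priorityVector is replaced, while
    prioritySignalsOverrides is merged with fetched values winning.
    Other fields are left untouched in this sketch."""
    updated = dict(group)
    if 'priorityVector' in update:
        updated['priorityVector'] = update['priorityVector']
    if 'prioritySignalsOverrides' in update:
        merged = dict(group.get('prioritySignalsOverrides', {}))
        merged.update(update['prioritySignalsOverrides'])
        updated['prioritySignalsOverrides'] = merged
    return updated
```

Re-joining, by contrast, replaces the stored group wholesale, so no merge is involved.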

If this is insufficient, we’re thinking that the best script-based option is to use Javascript scripts instead, which we can run in a single frozen global context along the lines of issue https://github.com/WICG/turtledove/issues/310, so will hopefully be fairly fast to execute.

Of course a JS function would be more flexible than the sparse multiplication, and I was wondering, what would be the drawbacks of the JS-based approach? Would it be much more complex implementation-wise?

There are a couple concerns:

1) We can't run filters until we start a separate process to run the JS in, even if we don't need to download anything (it's slower). This particularly affects performance in cases where all IGs of a buyer would be filtered out (which seems most likely in multi-DSP auctions).

2) We need to create a JS context in that process before we can run those filters (it's slower). While we currently always create an extra JS context for decoding JSON (though only after doing priority-based filtering), we may be able to get rid of that down the line, so this also removes a potential avenue for performance improvement.

3) We need to call into that JS context (which... is also slower).

So the concerns are basically around performance, rather than implementation (adding two sets of cross-process scoring calls - one before downloading JSON, and one after, is also more complicated to implement than just the after-JSON ones, but that's not the real concern here). V8 is not really designed or optimized for scripts that are loaded, run once, and then immediately discarded - I assume this is the case for Javascript engines in general, though that's not an area I have any expertise in.

Thanks, this makes sense.

The sparse dot product sounds like a promising direction; we would use it if it were supported. (I think it is likely that during development we would also come across some iterative improvement ideas, like additional operations, special variables, etc.)

Adding a new language for filtering would further increase the technical complexity of FLEDGE.

The easiest approach would be to allow custom implementations in some way, give buyers and sellers access to all the relevant information, and let each implementer choose its own implementation.

Maybe this could now be solved server-side, via the proposal to allow user-defined functions in the trusted server?

Doing this server-side would have many advantages, because servers could be scaled according to the computations being done. It may require the ability to inject more signals into the trusted server, as discussed in the meeting on 31/08/2022.

Today's Intent to Ship references this issue:

We’re addressing some remaining TODOs and specifying some recently added non-breaking features (e.g. #304, #305, #310, #166).

What are you specifying?

WICG / turtledove