Open stguav opened 2 years ago
@bmilekic you might see if Jérémie is interested to weigh in here as there is a proposal that might involve designing a language (instead of JSON proposed above) for bidding logic. I couldn't help but think of his work on Bonsai (Custom decision trees for real time bidding) at AppNexus (ref).
Getting the buyer's logic from the trusted bidder signals fetch as opposed to having it as part of the interest group raises some issues:
1) If we want to be able to skip creation of a process for the buyer's origin, by not creating it on rejection, we have to download the bidder signals (which can be quite large) into the main browser process. We really don't want to OOM the browser process, so downloading a response of indefinite size there and caching it is not great. Doing it for a dozen buyers at once is even more problematic.
2) What do we do about fetching the JS? Currently, we fetch the JS and buy signals in parallel. If we instead wait until after we've applied the filter, and created the process to run the buyer's worklets, auctions would run strictly slower, in the case no buyers are filtered out entirely. Even in the case some buyers are, auctions could still be slower, if not enough of them are.
If the decision logic was part of the IG itself, that would fix those issues, though it would make filtering less powerful (e.g., couldn't filter out based on running out of budget, unless that were learned in the context of the page, and passed in as a parameter to the filter from there, as opposed to modifying the filter rules based on remaining budget).
Edit: And learning remaining budget in the context of the page probably doesn't work (potentially leaks too much data, and too many potential budgets to learn to make sharing them there reasonable, anyways)
Here's an alternative proposal. Feedback would be much appreciated. Apologies for any formatting issues, this is copied from a Google doc.
Alternative proposal
TLDR: Do basically the same thing, but:
Details
New data types
There are two new data types: priority trees and priority tree inputs. A priority tree is applied to a priority tree input, which results in a number or an error. In the case of an error, the priority tree has no effect (could throw out the bid instead?).
Priority tree inputs
The input to a priority tree is a dictionary mapping arbitrary strings to numeric values:
{ key1: value1, key2: value2, … }
Priority trees
The format of a priority tree is a JSON value, with the possibilities being:
In the case of a list, the operation is one of a fixed set of strings, and is used to determine how many other arguments are allowed, and how they are interpreted. Additional or insufficient arguments are considered errors, and will result in ignoring the output of evaluating the tree:
Buyer priority trees and input
For buyers, the priority tree input is specified as a new AuctionConfig field. They are specified on a per-buyer basis:
perBuyerPriorityTreeInputs = { “buyer1”: {...}, “buyer2”: {...}, “*”: {...} }
The special “*” value applies to all buyers. If there are both “Buyer” and “*” keys present, and Buyer’s priority tree references a key not found in Buyer’s specific priority tree input, then “*” will be checked for a matching key.
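The wildcard fallback described above can be sketched roughly as follows. The helper name lookupTreeInput is hypothetical and not part of the proposal; the sketch only assumes that a buyer-specific value wins and that “*” is consulted when the buyer's own input lacks the key.

```javascript
// Hypothetical helper (not part of the proposal): resolve one key of a
// buyer's priority tree input. The buyer-specific value wins; "*" is only
// consulted when the buyer's own input does not contain the key.
function lookupTreeInput(perBuyerPriorityTreeInputs, buyer, key) {
  const hasKey = (obj) =>
    obj !== undefined && Object.prototype.hasOwnProperty.call(obj, key);

  const own = perBuyerPriorityTreeInputs[buyer];
  if (hasKey(own)) return own[key];

  const shared = perBuyerPriorityTreeInputs["*"];
  if (hasKey(shared)) return shared[key];

  // Neither input has the key. How the tree treats this is left open here.
  return undefined;
}
```

For example, with inputs { "buyer1": { a: 1 }, "*": { a: 5, b: 2 } }, buyer1 would see a = 1 and b = 2.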
Interest group priority trees are provided as part of an interest group itself, using the new “priorityTree” field, so they can be evaluated before requesting any resources over a network. If the result of evaluating an interest group’s priority tree is negative, the interest group is not given a chance to bid in an auction. Otherwise, the result of evaluating the priority tree is used as the interest group’s priority when running the auction.
Priority trees may also be fetched as part of fetching trusted bidding signals, but in that case, they are only used to skip bidding if the result is negative, instead of setting the priority (TODO: We should make this adjust the priority as well, though that does require some major refactoring). Having them in both locations lets consumers pick between the two options, which results in a performance/flexibility tradeoff (e.g. fetching trees as part of the trusted bidding signals means any filtering decisions must be delayed until the fetch completes, but allows more flexibility to update the trees). It also allows each tree to be used for different things (e.g., the fetched JSON “tree” could just be -1 if an ad campaign has run out of budget, while the tree in the IG could be based on more static preferences based on content of the publisher page).
In order for JSON fetches to provide this data, bidding fetches need to be updated. An additional parameter is added to the JSON fetches “&interestGroups=groupName1,groupName2,...” for all the interest groups the fetch is for, and the response is now of the format:
{ keys: ..., priorityTrees: { groupName1: priorityTree, groupName2: priorityTree, … } }
In addition, the server must send a “fledge-bidding-signals-format-version: 2” header, for the response to be interpreted as using the new format, though eventually support for the old format will be removed.
Seller priority trees and input
Sellers may specify their priority trees as part of their auctionConfig, via the new field:
sellerPriorityTree = priorityTree
They specify their priority tree input via new fields in their trusted selling signals responses:
{ renderUrlPriorityTreeInputs: { renderUrl1: priorityTreeInput, renderUrl2: priorityTreeInput, … }, componentRenderUrlPriorityTreeInputs: { componentRenderUrl1: priorityTreeInput, componentRenderUrl2: priorityTreeInput, … } }
There is no change to the requested URL, nor introduction of a versioning scheme, since the seller signals JSON format already uses a top-level dictionary that can be expanded by adding new keys.
For a given bid, the AuctionConfig’s priority tree is run against all matching URLs (the renderUrl and componentRenderUrls), and if any of them is negative, the bid is rejected. If all are either positive, or have no URL-specific priority tree input (or there’s no priority tree specified in the auction config), the bid is passed to the seller’s scoreAd() method. The magnitude of the output of evaluating the priority tree has no impact.
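A minimal sketch of the seller-side check described above. The tree evaluator itself is not defined here, so evaluateTree is a stand-in passed as a parameter; it is assumed to return a number, or null on an evaluation error (treated as "no effect"). The function name sellerAcceptsBid is hypothetical.

```javascript
// Sketch only: decides whether a bid is passed on to scoreAd().
// evaluateTree(tree, input) is a placeholder for the real evaluator and is
// assumed to return a number, or null when evaluation fails.
function sellerAcceptsBid(evaluateTree, sellerPriorityTree, signals,
                          renderUrl, componentRenderUrls = []) {
  // No tree in the auction config: nothing to filter on.
  if (sellerPriorityTree === undefined) return true;

  const renderInputs = signals.renderUrlPriorityTreeInputs || {};
  const componentInputs = signals.componentRenderUrlPriorityTreeInputs || {};
  const candidates = [
    renderInputs[renderUrl],
    ...componentRenderUrls.map((url) => componentInputs[url]),
  ];

  for (const input of candidates) {
    if (input === undefined) continue; // no URL-specific input: not a rejection
    const result = evaluateTree(sellerPriorityTree, input);
    if (result === null) continue;     // assumption: an error has no effect
    if (result < 0) return false;      // any negative result rejects the bid
  }
  return true; // the magnitude of non-negative results is ignored
}
```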
@jonasz: My last post is a proposal to both provide a filtering API, and allow IGs to adjust priority based on a sparse dot product, as I believe you said you were interested in. The API is a little clunky, to accommodate both needs. Feedback would be welcome. Don't want to start implementation until we know if it meets folks' needs, and give people a chance to provide alternatives.
Hi Matt,
Thanks for the proposal, this definitely sounds useful. I think the ability to adjust the priority in this way will open up a great area for optimization.
Some thoughts / comments:
- … max, if, exp.
- As to tree inputs, I think in addition to perBuyerPriorityTreeInputs it'd be useful to have IG.treeInputs. It'd be great to be able to overwrite both IG.priorityTree and IG.treeInputs from within generateBid. // In this setup the IG.priority field becomes redundant.
- The dynamic age parameter in the tree is a great idea!
- … generateBid, the priority as a function of time. ("If age < X then priority is p1, else if age < Y then ...".)
Best regards, Jonasz
Hi Jonasz,
Thanks so much for the suggestions! Here are some responses to your ideas:
- Being able to adjust the priority based on the response from the trusted server is useful. (Even if it only happens longer term.)
Good to hear you're interested in this! Implementing this will likely be a major investment (and not have quite the same performance gains when it's done), so nice to get some feedback on the idea before we start implementing.
- Use case: adjust the priority according to the campaign's config. (If the advertiser is willing to pay more for a click, the priority would rise; if the budget is reached, the priority would drop to zero; etc.)
Is this sort of slow change already supported by the existing interest group priority and setPriority() features?
- … max, if, exp.
I wasn't sure how much interest there would be in boolean and conditional support, in particular, but if folks are interested, it should not be difficult to add these operations.
- As to tree inputs, I think in addition to perBuyerPriorityTreeInputs it'd be useful to have IG.treeInputs. It'd be great to be able to overwrite both IG.priorityTree and IG.treeInputs from within generateBid. // In this setup the IG.priority field becomes redundant.
This seems like it could be really useful. This would let the filter have access to information potentially from multiple sites (the site calling joinAdInterestGroup(), and the site(s) modifying the tree or inputs). As long as that extra information isn’t accessible to Javascript, only determines whether or not generateBid() is called, and multiple Javascript calls do not share a Javascript context, this is probably OK, though we do need to be very careful when allowing interest groups to combine data from multiple sites.
- What may need clarification - it seems in your proposal there'd be a tree per IG, and @stguav originally proposed that filtering happen per ad, not per IG. * FWIW, from our perspective, IG-granularity for filtering is sufficient.
With my proposal, each IG would have a filter/priority that comes from the bidder/IG before calling generateBid(). If there is any bidder filtering to apply to ads, the assumption is that the bidder would do that in generateBid() itself.
Then the seller gets to apply filters to the render URL, but only after the bidder generates the bid. We can't send the ad URLs to the seller before then because it's likely way too much data to get a full set of scoring signals for, and we can't send an IG's ads to the seller without the IG telling us it's OK to do so. IG's currently have no way to express they're OK with a particular seller except by generating a bid. Giving IGs a way to declare what sellers they're OK with getting this information up front would solve the latter problem (and is something we’re thinking about), but it would not solve the potential data size problem if we sent all ads from all IGs to the seller and got all signals immediately upon auction start.
It’s possible that, if we have IG opt-in to sharing data, we could potentially send the names of all IGs participating in the auction to the seller up front, but that would require either entirely reworking the JSON format, or two JSON requests to the seller.
- The dynamic age parameter in the tree is a great idea!
Paul Jensen deserves full credit for this addition.
- Use case: adjust the priority according to the campaign's config. (If the advertiser is willing to pay more for a click, the priority would rise; if the budget is reached, the priority would drop to zero; etc.)
Is this sort of slow change already supported by the existing interest group priority and setPriority() features?
Not really - note that the campaign config may actually change quite rapidly, and these changes could sometimes increase and sometimes decrease the priority. Once we decrease the priority via setPriority, though, we may not get a chance to increase it back - as the priority may be too low now, and we will never see generateBid called for this IG again.
- As to tree inputs, I think in addition to perBuyerPriorityTreeInputs it'd be useful to have IG.treeInputs. It'd be great to be able to overwrite both IG.priorityTree and IG.treeInputs from within generateBid. // In this setup the IG.priority field becomes redundant.
This seems like it could be really useful. This would let the filter have access to information potentially from multiple sites (the site calling joinAdInterestGroup(), and the site(s) modifying the tree or inputs). As long as that extra information isn’t accessible to Javascript, only determines whether or not generateBid() is called, and multiple Javascript calls do not share a Javascript context, this is probably OK, though we do need to be very careful when allowing interest groups to combine data from multiple sites.
I see. Just to confirm - in my view it should be totally fine if the priority-related fields are write-only from the perspective of generateBid.
I see. Just to confirm - in my view it should be totally fine if the priority-related fields are write-only from the perspective of generateBid.
Yes, or at least it currently seems to us that adding more write-only fields which are only accessible to filters / reprioritization logic should be fine. The resulting priority also shouldn't be exposed to generateBid(), which is already the case for setPriority(), though that's not spelled out in the explainer, currently, which is something we need to fix.
We’ve been having second thoughts about adding a new language to the web platform, and are now wondering if a sparse vector multiply more along the lines of RTB House’s proposal might be good enough for most consumers to work with. It would satisfy issue #302 also. If this is insufficient, we’re thinking that the best script-based option is to use Javascript scripts instead, which we can run in a single frozen global context along the lines of issue #310, so will hopefully be fairly fast to execute.
This proposal is going to focus on an added buy-side API, which we could potentially extend to sell-side, if that turns out to be useful. Any filtering out of bids on the buy-side will also reduce the number of scoreAd() invocations, so a buy-side API alone can reduce both buy-side and sell-side latency and compute resource usage. We’re hoping we can get Javascript per-function-call overhead down enough that if a seller needs filters, it’s performant for them to be embedded in the seller script and the JSON data it takes as input. Since buyers may take some time to generate each bid, and don’t know all the interest groups a user is in, buy-side filtering/reprioritization makes sense, even with greatly reduced overhead for each generateBid() call.
The new filter works by taking the dot product of two sparse vectors, represented as JSON dictionaries (e.g. { “cars”:1, “politics”:0, “42”:-10 }), together. If the result is less than or equal to 0, the interest group is dropped from the auction. If it’s greater than 0, it replaces the interest group’s priority.
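As a rough illustration of the filter described above, the sketch below multiplies two sparse vectors stored as plain objects and applies the "less than or equal to 0 drops the interest group" rule from this paragraph. The function names sparseDotProduct and prioritize are hypothetical, and returning null for a dropped group is just a convention chosen for this sketch.

```javascript
// Dot product of two sparse vectors represented as { key: number } objects.
// Keys missing from either side contribute nothing to the sum.
function sparseDotProduct(a, b) {
  let sum = 0;
  for (const key of Object.keys(a)) {
    if (Object.prototype.hasOwnProperty.call(b, key)) {
      sum += a[key] * b[key];
    }
  }
  return sum;
}

// Returns the new priority, or null when the interest group is dropped.
// Assumption: results <= 0 drop the group, as stated in the text above.
function prioritize(priorityVector, prioritySignals) {
  const result = sparseDotProduct(priorityVector, prioritySignals);
  return result > 0 ? result : null;
}
```

With the vector { "cars": 1, "politics": 0, "42": -10 } and signals { "cars": 3 }, the group would keep bidding with priority 3; adding "42": 1 to the signals would push the result negative and drop it.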
Filtering / reprioritization can be done either at auction start or when receiving JSON from the real-time trusted bidding signals server, or both. If it’s only done at auction start for all interest groups owned by a particular buyer, then perBuyerGroupLimits will be applied at the start of the auction, resulting in fetching less JSON data. If it’s done on receiving JSON as well/instead, then perBuyerGroupLimits will only be enforced once all JSON for a buyer is fetched, and the final priority is known for all interest groups a bidder owns.
The auctionConfig has a new field:
perBuyerPrioritySignals : { “https://buyer1.com” : {...}, “https://buyer2.com” : {...}, “*”: {...} }
Where each entry is a per-buyer dictionary of keys to JSON numbers used in the sparse vector multiplication. The “*” field is used for all buyers, with identically keyed buyer-specific fields taking precedence. Keys starting with “browserSignals.” are reserved for values provided by the browser, so may not be set in an auctionConfig.
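One way to read the merging rule above is sketched below. The helper name resolvePrioritySignals is hypothetical, and throwing on a reserved key is an assumption made for the sketch; the text only says such keys may not be set in an auctionConfig, not how a violation is handled.

```javascript
const RESERVED_PREFIX = "browserSignals.";

// Sketch: combine the "*" entry with a buyer-specific entry. Spreading the
// buyer-specific entry last makes its identically keyed values win.
function resolvePrioritySignals(perBuyerPrioritySignals, buyerOrigin) {
  const merged = {
    ...(perBuyerPrioritySignals["*"] || {}),
    ...(perBuyerPrioritySignals[buyerOrigin] || {}),
  };
  for (const key of Object.keys(merged)) {
    if (key.startsWith(RESERVED_PREFIX)) {
      // Assumption: reserved keys are rejected outright.
      throw new TypeError(`reserved key in auctionConfig: ${key}`);
    }
  }
  return merged;
}
```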
There are also new optional fields in interest group definitions. They are:
useBiddingSignalsPrioritization : [true | false], priorityVector : {...}, prioritySignalsOverrides : {...}
If useBiddingSignalsPrioritization is true, then the trusted bidder signals received from the server may, but are not required to, also include a priorityVector for each interest group. That vector is also multiplied by the perBuyerPrioritySignals to obtain the final priority, which takes precedence over a priority calculated by the priorityVector specified in the interest group, if that earlier priorityVector multiplication didn’t result in a value <= 0 (in which case the interest group was already filtered out of the auction).
If a priorityVector is provided, then it is multiplied by the perBuyerPrioritySignals for the auction by a sparse vector multiplication to calculate the new priority at the start of the auction, and if the value is less than 0, the interest group does not participate in the auction.
Values in prioritySignalsOverrides take precedence over values in perBuyerPrioritySignals in all vector multiplications. In addition, values in perBuyerPrioritySignals for all future auctions for a particular interest group can be overridden in generateBid() by calling setPerBuyerPrioritySignals(key, value). Neither the original nor overridden values of perBuyerPrioritySignals will be provided in the interestGroup object passed to generateBid(). Values set in perBuyerPrioritySignals can override the otherwise reserved values starting with “browserSignals.”.
In order for JSON fetches to provide priorityVectors, the format of trusted bidding signals fetches needs to be updated. If useBiddingSignalsPrioritization is set, an additional parameter is added to the JSON fetches “&interestGroups=groupName1,groupName2,...” for all the interest groups the fetch is for, and the response is now of the format:
{ keys : <key-value dictionary used for trustedBiddingSignals>, perGroupData : { groupName1 : { priorityVector: {...} }, groupName2 : { priorityVector : {...} }, … } }
In addition, the server must send a “X-fledge-bidding-signals-format-version: 2” header, for the response to be interpreted as using the new format, though eventually support for the old format will be removed. The new format is supported even when useBiddingSignalsPrioritization is not set, and the interest groups are not passed in the query param.
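The request and response handling described above might look something like the sketch below. Both function names are hypothetical. The sketch assumes that group names are percent-encoded before being joined with commas, and that an old-format response is a top-level dictionary holding the key-value data directly; neither detail is spelled out in this thread.

```javascript
// Sketch: build the extra query parameter listing the interest groups the
// fetch is for. Percent-encoding each name is an assumption.
function buildInterestGroupsParam(groupNames) {
  return "&interestGroups=" + groupNames.map((n) => encodeURIComponent(n)).join(",");
}

// Sketch: interpret a parsed JSON body based on the value of the
// X-fledge-bidding-signals-format-version header (null when absent).
function parseBiddingSignals(formatVersionHeader, body) {
  if (formatVersionHeader === "2") {
    return {
      keys: body.keys || {},
      perGroupData: body.perGroupData || {},
    };
  }
  // Assumption: without the header, the whole body is the key-value data.
  return { keys: body, perGroupData: {} };
}
```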
If a priorityVector is not present for a group, the original priority is used (the priority from the interest group’s priorityVector multiplication, if present, or the priority from the interest group itself, if not).
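Putting the two stages together, the precedence of the three possible priorities could be sketched as follows. The function names are hypothetical; signals is assumed to already combine perBuyerPrioritySignals with any overrides, a missing interest group priority is assumed to default to 0, and the cutoff follows the "<= 0" wording used earlier in this comment.

```javascript
// Dot product of two sparse { key: number } objects.
function dotProduct(vector, signals) {
  let sum = 0;
  for (const key of Object.keys(vector)) {
    if (Object.prototype.hasOwnProperty.call(signals, key)) {
      sum += vector[key] * signals[key];
    }
  }
  return sum;
}

// Returns the final priority, or null if the group is filtered out.
function finalPriority(interestGroup, signals, fetchedPriorityVector) {
  // Start from the interest group's own priority (assumed default: 0).
  let priority = interestGroup.priority ?? 0;

  // Stage 1: the priorityVector stored in the interest group (auction start).
  if (interestGroup.priorityVector) {
    const p = dotProduct(interestGroup.priorityVector, signals);
    if (p <= 0) return null;
    priority = p;
  }

  // Stage 2: a priorityVector from the trusted bidding signals response,
  // only consulted when the group opted in.
  if (interestGroup.useBiddingSignalsPrioritization && fetchedPriorityVector) {
    const p = dotProduct(fetchedPriorityVector, signals);
    if (p <= 0) return null;
    priority = p;
  }
  return priority;
}
```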
For all sparse multiplications, the browser appends a number of values to the perBuyerPrioritySignals. These are:
… generateBid() invocation of the interest group calling setPriority(), or the output of the interest group’s priorityVector and the perBuyerPrioritySignals in the auction config.
When fetching interest group updates, the interest group’s new priorityVector, if present, replaces the old one. However, the interest group’s new perBuyerPrioritySignalsOverrides is merged with the old one (including updated values from bidding scripts), with the values in the fetched update taking precedence over the old values. Re-joining an interest group will replace all fields unconditionally, including perBuyerPrioritySignalsOverrides.
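The update and re-join semantics above can be sketched as two small functions. The names applyUpdate and rejoin are hypothetical, and the sketch assumes the overrides live in the prioritySignalsOverrides field listed earlier among the new interest group fields.

```javascript
// Sketch: apply a fetched interest group update to the stored group.
function applyUpdate(stored, update) {
  const next = { ...stored };
  // A new priorityVector, if present, replaces the old one wholesale.
  if (update.priorityVector !== undefined) {
    next.priorityVector = update.priorityVector;
  }
  // Overrides are merged key by key; values from the update win over old
  // values (including values previously set from bidding scripts).
  if (update.prioritySignalsOverrides !== undefined) {
    next.prioritySignalsOverrides = {
      ...(stored.prioritySignalsOverrides || {}),
      ...update.prioritySignalsOverrides,
    };
  }
  return next;
}

// Sketch: re-joining replaces every field, so old overrides do not survive.
function rejoin(stored, fresh) {
  return { ...fresh };
}
```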
If this is insufficient, we’re thinking that the best script-based option is to use Javascript scripts instead, which we can run in a single frozen global context along the lines of issue https://github.com/WICG/turtledove/issues/310, so will hopefully be fairly fast to execute.
Of course a JS function would be more flexible than the sparse multiplication, and I was wondering, what would be the drawbacks of the JS-based approach? Would it be much more complex implementation-wise?
There are a couple concerns:
1) We can't run filters until we start a separate process to run the JS in, even if we don't need to download anything (it's slower). This particularly affects performance in cases where all IGs of a buyer would be filtered out (which seems most likely in multi-DSP auctions)
2) We need to create a JS context in that process before we can run those filters (it's slower). While we currently always create an extra JS context for decoding JSON (though only after doing priority-based filtering), we may be able to get rid of that down the line, so this also removes a potential avenue for performance improvement.
3) We need to call into that JS context (Which...is also slower).
So the concerns are basically around performance, rather than implementation (adding two sets of cross-process scoring calls - one before downloading JSON, and one after, is also more complicated to implement than just the after-JSON ones, but that's not the real concern here). V8 is not really designed or optimized for scripts that are loaded, run once, and then immediately discarded - I assume this is the case for Javascript engines in general, though that's not an area I have any expertise in.
Thanks, this makes sense.
The sparse dot product sounds like a promising direction, we would use it if it was supported. (I think it is likely that during the development we would also come across some iterative improvement ideas, like additional operations, special variables, etc.)
Adding a new language for filtering will further increase the technical complexity of Fledge.
The easiest approach would be to allow custom implementations in some way and give buyers and sellers access to all the information, so that each implementer could choose its own implementation.
Maybe this could be solved server side now, with the proposal to allow user-defined functions in the trusted server?
Doing this server side would have many advantages, because one could scale the servers depending on the computations being done. It would potentially require being able to inject more signals into the trusted server, as discussed in the last meeting on 31/08/2022.
There is significant latency impact from the overhead for starting worklets and setting up v8 contexts for generateBid and scoreAd. On the other hand, a significant amount of DSP logic in generateBid and SSP logic in scoreAd is in enforcing various eligibility conditions: ensuring that ads meet publisher and policy requirements for the page, and that the publisher page meets ads' requirements. This means that frequently the API overhead is paid only to drop the interest group from the auction.
If the browser could provide an API for directly filtering out ads and interest groups without incurring the expensive overhead, we could realize significant latency improvements. A limited API that does not execute arbitrary JS code should not require the same sandboxing and separate contexts. Note that suggestions in #302 have some overlap since they would provide a limited way of filtering interest groups from the trusted server response.
A fairly powerful way to specify the kinds of eligibility conditions mentioned above is in terms of logical operations on sets of tokens. These could be expressed in a small DSL or directly in terms of a JSON tree representation, e.g. this might represent a publisher requirement not to have ads with 'shoes' or 'sports' tokens:
Each ad would come with some classification into sets of relevant tokens, for which the tree could be evaluated to determine its eligibility. Similarly each ad may have some tree to be applied to the classification of the page. In practice, we expect ad techs to use opaque tokens (possibly numbers) to avoid unnecessary leaking of sensitive data. We suggest having 'AND', 'OR', 'ANDNOT', and 'ORNOT' as possible operators. The nodes field could contain subtrees following the same schema. We expect typical usage to be on the scale of 10s of tokens per ad.
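To make the idea of evaluating such a tree against a set of tokens concrete, here is a small sketch. The exact schema is not reproduced in this text, so everything except the nodes field and the four operator names is an assumption: the op and tokens field names are hypothetical, and 'ANDNOT' / 'ORNOT' are read here as the negation of AND / OR over all operands, which is only one possible interpretation.

```javascript
// Sketch: evaluate an eligibility tree against the tokens that are present.
// Assumed node shape: { op, tokens?: [...], nodes?: [...subtrees] }.
function isEligible(tree, tokens) {
  const present = tokens instanceof Set ? tokens : new Set(tokens);
  // Operands are the listed tokens (true when present) plus the subtrees.
  const operands = [
    ...(tree.tokens || []).map((t) => present.has(t)),
    ...(tree.nodes || []).map((n) => isEligible(n, present)),
  ];
  switch (tree.op) {
    case "AND":
      return operands.every(Boolean);
    case "OR":
      return operands.some(Boolean);
    case "ANDNOT": // assumption: NOT (all operands true)
      return !operands.every(Boolean);
    case "ORNOT": // assumption: NOT (any operand true)
      return !operands.some(Boolean);
    default:
      throw new Error(`unknown operator: ${tree.op}`);
  }
}
```

Under these assumptions, a publisher requirement such as { op: "ORNOT", tokens: ["shoes", "sports"] } would reject any ad classified with either token and accept ads with neither.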
We propose that the API provides a way of setting a filtering condition for each ad on the trusted server response and/or in the interest group object that will be applied to perBuyerEligibilityTokens provided in the auctionConfig. The ads that are filtered out will be (temporarily) removed from the interestGroup input to generateBid at runAdAuction time. generateBid will not be run for any interest group that has no eligible ads.
Similarly, we suggest that SSPs have the same capability. The auctionConfig would have a sellerEligibilityCondition that would be applied to tokens provided by the trusted scoring server.
Buyer Filtering Example
Let's illustrate how we expect this to work with an example. Here is a possible filtering tree for one ad:
which can be serialized to JSON as:
This tree would be returned from the trusted server response. A possible API: the filtering tree is returned from the trusted server with a special field that gives a map from renderUrl (as a way to identify the ad) to filtering tree:
If the responses for different trusted bidding keys contain conflicting conditions for the same renderUrl, then the browser is free to select any one.
Then in the call to runAdAuction, the buyer can provide tokens describing the page to be passed in the auctionConfig. For example, suppose the publisher page is in the US, discusses politics, and has to do with cars; a DSP might encode these observations via tokens 2, 7, and 9:
In the above example, the ad would be eligible since token 7 is present, even though the left part of the tree does not evaluate to true since 4 is not present.
Seller Filtering Example
Seller filtering is similar, but the conditions are provided in the auctionConfig and the classification of renderUrls into tokens is provided by the trusted seller server. Publisher requirements are passed in a tree in auctionConfig:
Here, we have a simpler tree, since we gave a more complicated example above.
Then, the trustedScoringSignals could include a specific field with SSP tokens describing the renderUrl:
In this case the ad would be filtered out since token 14 is present.