ccharnay67 opened 8 months ago
Hi @ccharnay67, thanks for raising this! We are evaluating the idea, but wanted to ask whether supporting only logarithmic scaling (with pre-specified parameters) would be sufficient for this use case.
Adding support for a generic mapping is challenging. It would likely require spinning up a new JavaScript environment for each bidder at the end of the auction, which could have a significant performance impact. However, we could instead consider extending the existing linear scale/offset approach to support additional transformations (e.g. logarithmic). This does risk increasing complexity, but should have minimal performance impact. This could maybe look something like your proposal, but with `postprocess` taking an enum instead of a generic mapping function (e.g. `"linear"` (default) or `"log_2"`). We might also need to add a mechanism to clamp the result of this to a reasonable range.
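For illustration only (none of these names are final, and the clamp field is just a placeholder), that might look something like:

```js
privateAggregation.contributeToHistogramOnEvent(
  "reserved.win",
  {
    bucket: {
      baseValue: "script-run-time",
      postprocess: "log_2",  // hypothetical enum: "linear" (default) or "log_2"
      scale: 1.0,
      offset: 500n,
      maxResult: 1024n       // hypothetical clamp on the final bucket
    },
    value: 1
  });
```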
You mention that there might be other non-linear scalings that could be useful here. If you could provide any more detail, that would be very helpful for understanding the requirements.
Hi @alexmturner, thanks for your answer!
The logarithmic scaling could be enough for our use case. Clamping is a necessity in my opinion: for timing metrics, we do not know how high the values we get could be, which makes partitioning the bucket space a bit hazardous, as there is always a risk of bucket overlap.
In terms of other non-linear scalings, I can imagine a use case where the winning-bid or highest-scoring-other-bid signals require bucketization following a distribution that is neither linear nor logarithmic. We ourselves sometimes use a [1, 2, 5, 10, 20, 50, 100, ...]-style bucketization because it is convenient.
As a middle ground, do you think it would be possible to have an array of thresholds as a parameter, to give the user the flexibility to bucketize browser-defined signals? In the example we gave, we would pass [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024], an array of 11 thresholds which we would expect to correspond to 12 buckets, with boundaries defined by the thresholds. As a generalization, a list of N thresholds would give N+1 buckets. We do not have a strong opinion on whether a value equal to a threshold should fall in the bucket immediately below or immediately above.
```js
function generateBid(...) {
  privateAggregation.contributeToHistogramOnEvent(
    "reserved.win",
    {
      bucket: {
        baseValue: "script-run-time",
        offset: 500n,
        // Proposed new field: N thresholds defining N+1 buckets.
        thresholds: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
      },
      value: 1
    });
  return bid;
}
```
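To make the intended semantics concrete, here is a sketch of the threshold-to-bucket mapping we have in mind (picking one of the two possible conventions for values equal to a threshold):

```js
// Sketch only: N thresholds define N+1 buckets. Here a value equal to a
// threshold falls into the upper bucket; the opposite convention would
// also work for us.
function bucketIndex(value, thresholds) {
  let index = 0;
  for (const threshold of thresholds) {
    if (value < threshold) break;
    index++;
  }
  return index; // between 0 and thresholds.length inclusive
}

// bucketIndex(0.5, [1, 2, 4]) === 0  -> bucket [0, 1[
// bucketIndex(3,   [1, 2, 4]) === 2  -> bucket [2, 4[
// bucketIndex(10,  [1, 2, 4]) === 3  -> bucket [4, +inf[
```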
Hello @alexmturner,
We're coming back to you on this thread because the proposed feature of letting the buyer define the thresholds themselves could be very interesting in the context of bid shading (https://github.com/WICG/turtledove/issues/930).
Can you share your thoughts on this proposal?
Hello,
From the documentation on extended PA reporting, when using browser-defined signals to calculate the bucket or the value for reporting, the only post-processing we can apply is a scale and an offset.
We have a use case, for timing signals like script-run-time, where we would like to use a non-linearly bucketized timing as the bucket. At the moment, we can get buckets of width 5ms by using a scale of 0.2, but we cannot create a bucket for [0-1ms[, another for [1-2ms[, then [2-4ms[, [4-8ms[, …
With the current state of the API, the only way we found to do this is to reserve thousands of buckets, one per millisecond of timing up to a certain value, and to do the logarithmic bucketization on our end. This is very inconvenient, as it forces us to block a much bigger part of the keyspace than needed, only because we cannot post-process the Chrome-internal metrics.
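Concretely, the workaround looks something like this (illustrative only), with the logarithmic regrouping then done on our side after aggregation:

```js
// Workaround today: one bucket per millisecond of script run time,
// which reserves on the order of a thousand keys just for this metric.
privateAggregation.contributeToHistogramOnEvent(
  "reserved.win",
  {
    bucket: {
      baseValue: "script-run-time",
      // Scale 1 (the default) keeps one bucket per ms; the offset moves
      // the range into the part of the keyspace reserved for this metric.
      offset: 500n
    },
    value: 1
  });

// After collecting the aggregated reports, we regroup these per-ms
// buckets into logarithmic bins ([0-1ms[, [1-2ms[, [2-4ms[, ...) ourselves.
```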
We are considering logarithmic buckets in this example because it makes sense for timings, but it would be good to be able to provide a function as a post-processing callback, which would take the value returned by the signal as input and return the actual bucket/value we want. This could still be combined with scale and offset as follows: `bucket = postprocess(inputSignal) * scale + offset`. This definition avoids backwards-compatibility issues, as the default scale is 1.0 and the default offset is 0. Below is an example of what it could look like.
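(The base-2 logarithm below is purely illustrative; the point is that the buyer supplies the function.)

```js
privateAggregation.contributeToHistogramOnEvent(
  "reserved.win",
  {
    bucket: {
      baseValue: "script-run-time",
      offset: 500n,
      // Proposed addition: a buyer-supplied post-processing callback,
      // applied to the signal value before scale and offset.
      postprocess: (runTimeMs) => Math.floor(Math.log2(Math.max(runTimeMs, 1)))
    },
    value: 1
  });
```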
What do you think?