WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Accessing a timestamp from bidding worklet #1106

Open alexeiverbny opened 5 months ago

alexeiverbny commented 5 months ago

Hello,

We would like to have access to a current timestamp inside the bidding worklet. One possible approach would be to make the Date object available inside the worklet but maybe there are other ways that I am not aware of. The reason for this is that we need a way to verify that perBuyerSignals is not being cached by the browser. Current data suggests that as much as 20% of our impressions had cached perBuyerSignals at bid time. If we cannot verify that perBuyerSignals is fresh at bid time then we cannot have confidence that the information in perBuyerSignals is accurate for the current bidding opportunity. While I cannot say exactly how this will impact our bid values, it is highly likely that this will depress our bid prices, especially on higher quality publishers.

Thank you, Alex

michaelkleber commented 5 months ago

We discussed a bunch of options in #431. The way that seems the most immune to any kind of corruption is to include a key called "time" in every Interest Group, and have your KV server return the current time in your trusted bidding signals. You can also include a "time" value in your perBuyerSignals, and you can compare the two to see if either one looks out of date.
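
A minimal sketch of that comparison, not a definitive implementation. It assumes both "time" values are epoch milliseconds, that the interest group lists "time" among its trusted bidding signals keys, and that a 10-second tolerance is acceptable; `MAX_SKEW_MS`, `signalsLookFresh`, and the placeholder bid value are hypothetical.

```javascript
// Hypothetical tolerance: how far apart the two timestamps may be.
const MAX_SKEW_MS = 10_000;

// Returns true only when both timestamps are present and close to each other.
function signalsLookFresh(perBuyerSignals, trustedBiddingSignals, maxSkewMs = MAX_SKEW_MS) {
  const contextualTime = perBuyerSignals?.time; // assumed: written by the buyer's server
  const kvTime = trustedBiddingSignals?.time;   // assumed: returned for the "time" key
  if (!Number.isFinite(contextualTime) || !Number.isFinite(kvTime)) {
    return false; // missing or malformed timestamps are treated as stale
  }
  return Math.abs(kvTime - contextualTime) <= maxSkewMs;
}

function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  if (!signalsLookFresh(perBuyerSignals, trustedBiddingSignals)) {
    return { bid: 0 }; // a bid of zero keeps this interest group out of the auction
  }
  // Placeholder bid; real bidding logic would go here.
  return { bid: 1, render: interestGroup.ads[0].renderURL };
}
```

This only shows that the two values disagree. It cannot tell which one is out of date, and it passes if both are equally stale.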

alexeiverbny commented 5 months ago

Ah I will take a look at the issue and consider this approach. Thank you

alexeiverbny commented 5 months ago

Does Google plan on supporting a time key in their KV server once it is ready, without the user needing to push time values?

michaelkleber commented 5 months ago

Ah, interesting question! I don't recall any discussion of our KV server implementation handling special keys like that. Hey @peiwenhu has this come up before?

I believe it could be implemented using UDFs even if it's not built in.

alexeiverbny commented 5 months ago

Thanks. I am interested in what Peiwen has to say but it does seem like the UDF route would work for us.

michaelkleber commented 5 months ago

Hi @alexeiverbny, one more question for you based on a conversation in a recent WICG call.

Is there any chance that what you're seeing is the result of a component auction configuration that is being set once and then used multiple times? I wonder if there's a risk of this happening when the publisher page refreshes an ad slot.

I found a reference to this in the original GAM testing plans, with a paragraph about "update their previously provided auction configuration", and also a hint at a mechanism in the GPT documentation, where it says "If this value is set to null, any existing configuration for the specified configKey will be deleted."

None of this is specifically browser stuff; this feels more in the territory of the interaction between GAM and Prebid, which I'm not very knowledgeable about. Maybe @patmmccann can offer us some insight on how refreshes are supposed to work.

In any case, Alex, I do think the timestamp stuff that we discussed will help you avoid bidding in this situation. But surely it would be preferable for everyone if the situation never came up in the first place!

alexeiverbny commented 5 months ago

Hi @michaelkleber,

Are you hypothesizing that auctionConfig is not being set to NULL when a publisher page refreshes? I think that is a possible explanation. We dug into one particular example where we had stale perBuyerSignals and noticed that it was a site with frequent ad refreshes. auctionConfig not refreshing seems like a possible explanation.

The "timestamp from kv-server" will help us no bid in this situation, but yes, I agree that it would be preferable if this never came up. Both buyers and sellers would benefit from a refreshed config during ad refreshes.

Thanks, Alex

alexeiverbny commented 5 months ago

IIUC, this PR makes it so configs are not re-used by default: https://github.com/prebid/Prebid.js/pull/10930.

However, we do still see stale perBuyerSignals

michaelkleber commented 5 months ago

From reading that PR, it does sound like the old behavior was for configs to get reused across ad slot refreshes and the new behavior was to not reuse. But I don't know how to watch such behavior changes rolling out — seems like you would need to know what version of prebid each publisher has on their page?

So maybe what you're seeing is just a temporary thing that will be solved once all sites are using a later version of prebid. I'm sorry I can't really help with this, other than making some guesses about what's going on.

laurb9 commented 5 months ago

https://github.com/prebid/Prebid.js/pull/10930 was included in Prebid.js release 8.37.0 and after. It will take time for publishers to update their sites.

alexeiverbny commented 5 months ago

Thanks, @laurb9 .

peiwenhu commented 5 months ago

Hey sorry I'm a bit late to this discussion. Returning time or any special key has not been discussed before on our side.

There are 2 concerns:

  1. Our current understanding is that we cannot trust the absolute time a server gets when running inside a TEE. There has not been much demand for this, though, so we haven't looked deeply into it. If that really is the case, it could add too much complexity to the server. If it's done by the UDF, and is therefore outside the server framework's responsibility, it sounds more feasible.
  2. The Chrome team is rethinking caching of the TEE KV responses, and the current plan is to cache the entire KV response as one entry, much like today's HTTP caching. In that case, if you want an uncached time, I think you may need to forgo caching altogether for those requests, or some extra care needs to be taken to separate the time from the rest of the response. cc @MattMenke2 @JensenPaul

MattMenke2 commented 5 months ago

I think the time is one thing that we could safely exclude from the cache key, if we opted to include it in requests.

That aside, if the TEE wanted the time, couldn't the (untrusted) server that wraps the TEE provide the time? Having a putative web standard provide the time from the client, just because a particular implementation of a TEE can't provide an accurate time seems a bit strange to me.

alexeiverbny commented 5 months ago

I am not familiar with how today's HTTP caching works. With the current plan, how will we be able to get an uncached response inside generate_bid() every time? This is important for us not just for time, but for all keys that we return from the kv store

peiwenhu commented 5 months ago

> I think the time is one thing that we could safely exclude from the cache key, if we opted to include it in requests.

So IIUC the cache would be keyed by all keys in the request except time (plus other dimensions)? And at request time the browser gets 1 cached entry for all other keys and makes 1 request for the time key to the KV and merges the cached entry and the fresh response into one response?

> That aside, if the TEE wanted the time, couldn't the (untrusted) server that wraps the TEE provide the time? Having a putative web standard provide the time from the client, just because a particular implementation of a TEE can't provide an accurate time seems a bit strange to me.

What you have in mind here is what was discussed above: using user-defined functions to provide the time. Yes, that might work. I was just explaining that the other alternatives would be hard: the trusted part of the server logic can't provide the time because the time cannot be trusted, and the server (or our team) doesn't want to provide something untrusted as part of its APIs, since recipients may think it's trusted. In this case the recipient (generateBid) isn't inside the trust boundary, so it's not a big deal, but tomorrow the time may be used by something within the trust boundary, and at that point it could be confusing.

> I am not familiar with how today's HTTP caching works. With the current plan, how will we be able to get an uncached response inside generate_bid() every time? This is important for us not just for time, but for all keys that we return from the kv store

I think today you need to set the Cache-Control header to tell the browser the TTL of your key-value data. For the TEE KV server, the server would also need to set the TTL somehow in the response.

alexeiverbny commented 5 months ago

Thanks Peiwen. I am reading up a bit on cache control headers https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control. So would we be able to set Cache-Control: no-cache in the response header of the TEE KV?

peiwenhu commented 5 months ago

> Thanks Peiwen. I am reading up a bit on cache control headers https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control. So would we be able to set Cache-Control: no-cache in the response header of the TEE KV?

I think the cache control headers are what affect the BYOS KV server, if that's what you're using today.

The TEE KV server won't use this exact header, but it will provide a way.

MattMenke2 commented 5 months ago

Note that the KV server uses an extra layer of encryption for the request/response bodies, so it can't use HTTP caching semantics. BYOS uses query params and only uses HTTPS, so it can rely on the standard HTTP caching semantics.
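
For the BYOS case, a rough sketch of a response builder that sets the header discussed above and stamps the response with the server's clock. `buildSignalsResponse` and its `store` argument are hypothetical, the body is assumed to use a top-level `keys` object, and any other headers the API requires are left out.

```javascript
// Builds a BYOS-style trusted bidding signals response.
// "Cache-Control: no-cache" asks caches to revalidate with the server before
// reusing a stored copy; the "time" key carries the server clock in epoch ms.
function buildSignalsResponse(requestedKeys, store, nowMs = Date.now()) {
  const keys = {};
  for (const key of requestedKeys) {
    if (key === "time") {
      keys.time = nowMs; // server-side clock, never read from the store
    } else if (Object.prototype.hasOwnProperty.call(store, key)) {
      keys[key] = store[key]; // unknown keys are simply omitted
    }
  }
  return {
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": "no-cache",
    },
    body: JSON.stringify({ keys }),
  };
}
```

The returned headers and body can be written out by whatever HTTP framework the server uses. `no-store` would be the stricter choice if the response should never be stored at all.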

laurb9 commented 5 months ago

Having the browser add the auction start timestamp with an acceptable resolution to browserSignals is more straightforward in my opinion. It sidesteps the Date shimming issue, works with no KV and exposes the same information as the KV reflecting client timestamps back to the client would, with more privacy control even.

bartek-siudeja commented 5 months ago

There is already a join recency value in generateBid, in milliseconds, it seems, so "hiding the timestamp" is rather strange. Anyone can also store a timestamp inside the IG just before calling the join API, and then sum that join timestamp and the join recency.

> Set browserSignals["[recency](https://wicg.github.io/turtledove/#dom-biddingbrowsersignals-recency)"] to the [current wall time](https://w3c.github.io/hr-time/#dfn-current-wall-time) minus ig’s [join time](https://wicg.github.io/turtledove/#interest-group-join-time), in milliseconds.

It is a very ugly hack, and perhaps the worst way possible to get the current time. Very bug-prone. But if someone is willing to do whatever it takes to violate privacy, they will probably do this. Everyone else should not need to debug a sum of timestamp and recency, especially since logging anything from generateBid is rather tricky. Having the KV store bounce the time back is actually even more complicated than this hack.
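
The hack can be sketched roughly as follows. It assumes the buyer wrote an epoch-millisecond timestamp into the interest group's `userBiddingSignals` when joining; the field name `joinedAtMs` and the helper `estimateNowMs` are hypothetical.

```javascript
// Estimates "now" as (timestamp stored at join time) + (recency from browserSignals).
// Returns null when either input is missing so the caller can decline to bid.
function estimateNowMs(interestGroup, browserSignals) {
  const joinedAtMs = interestGroup?.userBiddingSignals?.joinedAtMs; // hypothetical field
  const recencyMs = browserSignals?.recency; // ms since join, per the spec text quoted above
  if (!Number.isFinite(joinedAtMs) || !Number.isFinite(recencyMs)) {
    return null;
  }
  return joinedAtMs + recencyMs;
}
```

The estimate inherits any rounding the browser applies to recency, as well as any error in the clock that produced the stored timestamp.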

MattMenke2 commented 5 months ago

It's not access to any timestamp that we need to block, but access to a continuous high precision timer, to protect against side channels based on CPU usage. It's a high precision continuously updating time that we're concerned about, not knowing the general time an auction occurs.

Recency is rounded to 100 milliseconds (though not sure how much that one matters, even with group-by-origin mode, since it's one fixed time, calculated well in advance of when the scripts are run).

bartek-siudeja commented 5 months ago

This is exactly the point. Providing a timestamp in seconds in browser signals, maybe even with random 500ms noise, would probably satisfy everyone "needing a reasonable timestamp" in generateBid. The same can be achieved using a dependency on KV, or using a sum with recency. But why make it so complicated for users?

michaelkleber commented 5 months ago

Please read through the discussion in #431 if you haven't already. Observe that your suggestion "Providing timestamp in seconds in browser signals" would satisfy one of the three uses mentioned there, and would fail at the other two. This is why, when we discussed it last year, we left it in the hands of the party that was actually planning to use the information.

droundy commented 5 months ago

@michaelkleber Can you clarify the three uses you see in #431? Are you referring to the comment by @pehuen-rodriguez in which he discusses day of week?

The issue with "just passing the timestamp into the bidder" is that we would like to use the timestamp to check the staleness of values that are passed to the bidder. Ideally, we'd like to check the staleness of both the per-buyer-signals and the trusted bidding signals against both each other and the current time in order to make replay attacks just a little bit more challenging.

bartek-siudeja commented 5 months ago

I think timestamp in seconds actually satisfies all the cases in this discussion. Or at least I am not sure which one would fail. Day in user zone is maybe not easy to get, but storing offset from UTC in IG content, from ig join call time, is even easier than storing timestamp inside IG.

The goal is not so much to have super tricky bidding logic. A much bigger concern is contextual perBuyerSignals being stale for some reason (maybe a bug somewhere, including in our own systems). The same goes for KV store responses, maybe because of caching. It is much safer to no-bid by default, e.g. if KV store data is 10 seconds old, than to later try to resolve billing issues due to accidental spend. There is almost no way to control spend as is, and "spend by default in a cached world" is a recipe for disaster, I believe.

michaelkleber commented 5 months ago

The PA auction could include a browser-provided timestamp, e.g. browserSignals could include a field like 'roundedDateNow': 1712622180000, current time (in milliseconds since the epoch) rounded to one minute.

  1. This would let you detect staleness of your signals, though you would get false positives on machines with incorrect clocks. (That is much less common today than 10 or even 5 years ago.)

  2. It would not make the functionality of the JavaScript Date object available — we don't have any way to do that and avoid the high-granularity timers that are a side-channel risk.

  3. It would not directly support "date-based decisions on the bidding function", e.g. time-of-day or day-of-week restrictions on ad buying. Those kinds of useful time conversions are also part of the Date object. Ad techs who wanted that kind of functionality would need to pass in some information about the user's timezone, and do date math themselves.

If problem 1 turns out to be something that will be fixed by Prebid.js release 8.37.0 (see https://github.com/WICG/turtledove/issues/1106#issuecomment-2038621742) and problem 3 is something buyers are likely to need to solve anyway, then roundedDateNow does not seem like a very useful feature. But if problem 1 is a substantial enough issue even aside from old Prebid versions that it warrants a new signal, then we can indeed provide it.

I just don't want to over-promise something that will disappoint most of the people who try to use it.
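
A small sketch of how such a signal might be produced and consumed, under stated assumptions: `roundedDateNow` is only a proposed field, "rounded to one minute" is read here as round-to-nearest, and `roundToMinute` and `isStale` are hypothetical helpers.

```javascript
const ONE_MINUTE_MS = 60_000;

// Rounds an epoch-millisecond timestamp to the nearest whole minute.
function roundToMinute(nowMs) {
  return Math.round(nowMs / ONE_MINUTE_MS) * ONE_MINUTE_MS;
}

// Rounding to the nearest minute can shift the timestamp by up to 30 s either way,
// so a signal is declared stale only when its apparent age exceeds
// maxAgeMs plus that rounding uncertainty.
function isStale(signalTimeMs, roundedNowMs, maxAgeMs) {
  const roundingErrorMs = ONE_MINUTE_MS / 2;
  return roundedNowMs - signalTimeMs > maxAgeMs + roundingErrorMs;
}
```

With one-minute rounding, a 10-second freshness budget effectively becomes a 40-second one, which illustrates the granularity trade-off.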

bartek-siudeja commented 5 months ago

  1. I am not sure I understand why "rounded to one minute". Is one-second resolution too granular? Does it cross into privacy-risk territory? IG join recency (or KV) can probably be used to generate equally accurate timestamps (in a very roundabout way). We could also try to detect really bad clocks by taking the browser-provided timestamp and comparing it to IG join recency plus our server time just before the join. Or even store both server and JS join times. Overall, 1 minute of staleness seems like an eternity for essentially uncontrolled bidding using possibly fraudulent (definitely untrusted and/or cached) signals.
  2. (and 3) I don't think anyone is asking for Date object functionality. Just a UTC timestamp (since the epoch). We are bidding using Wasm, but even if we were not, it is not so hard to "copy" a part of the date conversion code, especially "approximately correctly". There is no super secret sauce in the JS Date object. Every language has one, and it is only for convenience that this code is tied to checking the "now" time. As for the time zone, we can store that in the IG, essentially assuming the browser stays in one zone, for example. Or maybe we can even update the zone using IG update, maybe using geo/IP as a proxy. Or we could decide to operate on UTC time everywhere.

I don't think problem 1 is even about the prebid issue. What if there is another bug somewhere (maybe in one of the exchanges, or in our own code, wherever) 5 months from now. Or if there is some bad actor actually trying to replay/manipulate some data for 1 minute. Our goal is to have some way to quickly reject suspicious traffic, before we actually spend any money. Even if this means false positives and some lost traffic.
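
To illustrate how little date math is needed, here is a rough sketch that derives the local day of week and hour from a UTC epoch timestamp without using Date. The `utcOffsetMinutes` value is assumed to be stored by the buyer (for example in the IG) as minutes east of UTC; that convention and the helper name `localDayAndHour` are hypothetical.

```javascript
const MS_PER_MINUTE = 60_000;
const MS_PER_HOUR = 3_600_000;
const MS_PER_DAY = 86_400_000;

// Converts a UTC epoch timestamp plus a fixed offset into local day of week and hour.
// dayOfWeek uses 0 = Sunday ... 6 = Saturday.
function localDayAndHour(epochMs, utcOffsetMinutes) {
  const localMs = epochMs + utcOffsetMinutes * MS_PER_MINUTE;
  const days = Math.floor(localMs / MS_PER_DAY);   // whole days since 1970-01-01
  const msIntoDay = localMs - days * MS_PER_DAY;   // always in [0, MS_PER_DAY)
  // 1970-01-01 was a Thursday, which is 4 when Sunday is 0.
  const dayOfWeek = (((days + 4) % 7) + 7) % 7;
  const hour = Math.floor(msIntoDay / MS_PER_HOUR);
  return { dayOfWeek, hour };
}
```

A fixed offset ignores daylight saving changes, so this is only "approximately correct" in the sense described above.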

rdgordon-index commented 4 months ago

> The PA auction could include a browser-provided timestamp

I think this is the intention of the ask -- rather than a full-fledged date object.

> The way that seems the most immune to any kind of corruption is to include a key called "time" in every Interest Group, and have your KV server return the current time in your trusted bidding signals

While this might be viable for buyers, sellers have no such key to add -- so we're still very much dependent on the browser and/or API surface to provide some indication of "when" the on-device auction is taking place compared to when the auctionConfig was generated.