ccharnay67 opened this issue 9 months ago
@ccharnay67 so far I've generally seen it not cache at all if I don't give a cache directive, and it seems to respect this one pretty reliably:
```js
res.setHeader('Cache-Control', 'public, max-age=300');
```
Maybe worth a shot.
@thegreatfatzby Thanks for your answer. Previously we didn't set the 'Cache-Control' header at all, and we were observing trends that made us think caching was happening; that's why we tried explicitly setting it to 'no-store', but we're still seeing the same trend.
Currently, bidder scripts obey standard HTTP caching semantics, and are always fetched from the bidder's 1P network partition (which means we basically have a single global cache - that's a leak we'll need to fix, but it should increase the cache hit rate when HTTP caching semantics allow it, since normally each top-level site has an entirely different network cache partition).
It should fully respect the standard "Cache-Control" directives. If that doesn't seem to be happening, we'd be happy to look at chrome://net-export logs of this happening. Instructions: https://www.chromium.org/for-testers/providing-network-details/ (note that you need to start logging in another tab before starting an auction). Logs don't include cookies, credentials, or response bodies, but they do include requests, responses, URLs, IPs, and HTTP headers (with cookies and HTTP auth redacted).
Thank you for your answer @MattMenke2.
As I said, what we observe thanks to forDebuggingOnly is that in some cases the bidding script used in a given bid was fetched from our endpoint quite a long time before the call to our key-value server. In about 9% of our forDebuggingOnly hits it's more than a minute old, and in around 1% it's over an hour old, and we are really not sure why this is happening.
It is difficult to reproduce locally, and an issue with caching sounded like the most obvious suspect. However, if you have other possible explanations, please don't hesitate to share them. In the meantime, as you suggested, I will try to see if I can get lucky reproducing the issue locally.
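For illustration, here is a simplified sketch of how we measure this (the constants and URLs are placeholders, not our production values; worklets have no usable wall clock as far as we can tell, so we compare a timestamp stamped into the script at serve time against one returned by our contextual call in perBuyerSignals):

```js
// Illustrative sketch only, not our production code.
// SCRIPT_SERVED_AT is a hypothetical epoch-ms constant stamped into the
// script body by our endpoint when it serves the script.
const SCRIPT_SERVED_AT = 1718000000000;

function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  // contextualTimestamp is a hypothetical field our contextual endpoint
  // returns, which the seller passes through as perBuyerSignals.
  const scriptAgeMs = perBuyerSignals.contextualTimestamp - SCRIPT_SERVED_AT;
  const debugUrl = 'https://bidder.example/fdo?scriptAgeMs=' + scriptAgeMs;
  forDebuggingOnly.reportAdAuctionWin(debugUrl + '&outcome=win');
  forDebuggingOnly.reportAdAuctionLoss(debugUrl + '&outcome=loss');
  return {bid: 0.05, render: interestGroup.ads[0].renderURL};
}
```

A large positive scriptAgeMs on a report tells us the script the browser ran was fetched long before the contextual request for that same auction.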
If a device goes into suspend mode in the middle of an auction, an hour-old script is possible, even with correct caching headers. 1% sounds like a lot, though I have no idea about Android suspend/resume usage patterns (or those on desktop, for that matter).
@ccharnay67 Do you have a sense of whether these very-old bidding JS are happening more on mobile vs desktop? Also, when you see a report of very-old bidding JS, can you tell whether that same bidding script was also involved in an earlier auction? (That is, is this always reuse of your old JS, or does it happen on first use as well?)
Hello @michaelkleber, as for your questions:
We are now thinking the suggestion from @MattMenke2 is correct: it may be related to the suspend/restart mechanism.
As far as we understand, the call to fetch the bidding script, the contextual call to fetch the perBuyerSignals, and the call to fetch the trusted bidding signals all run in parallel - the perBuyerSignals as a JS promise, the other two issued by Chrome internally. Does that mean one could resolve early (fetching the bidding script) while the others are suspended and restarted later? Possibly if the user changes tab in the middle of the auction, which is more likely to happen on desktop? What happens when the perBuyerSignals promise gets suspended and then restarted? Does it restart the whole auction from the beginning?
More generally, do you have documentation on what parts of the auction run in parallel, what waits for what, and what happens when some parts of the auction are suspended and restarted, whether that's the perBuyerSignals promise or the fetching of the script or trustedBiddingSignals?
If the signals fetch was suspended, we reckon it would be good to have a way to know that at bidding time, since it could affect our decision to bid or not. Could passing this information as an input to generateBid be considered?
https://github.com/WICG/turtledove/pull/906 goes into some of the details of how auctions are run.
Worklets may be reused across auctions in the same tab, if a worklet process for the same bidder script is still alive when a new auction starts (or if two auctions are run at once - in the latter case they'll often share bidding signals fetches as well, though that's not guaranteed, due to raciness).
We do not tear down any FLEDGE objects during suspend mode, so if we've downloaded, say, the bidder scripts, signals, and seller script, but are in the middle of running generateBid(), then generateBid() will complete as normal, and we'll then fetch seller signals. Entering suspend mode does potentially cause any live network requests to fail - in that case, we do whatever those requests failing outside of suspend mode would cause to happen (scripts failing to load cause all the relevant scoreAd() / generateBid() calls to effectively fail; if signals fail, we currently just run the scripts with null signal fields).
Changing tabs does not pause running auctions; only suspending all of Chrome will do that. This is how Chrome behaves on desktop in general (well, there may be more recent fancy stuff to deprioritize background tabs, but none of that affects FLEDGE, since it's not running in a renderer - apart, perhaps, from resolving JS promises).
We never restart auctions, nor do we have a notion of suspending them.
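To make the signals-failure case above concrete, here's a minimal generateBid() sketch (illustrative only - this is a bidder-side choice, not something the API mandates) that detects a failed trusted-signals fetch and declines to bid:

```js
// Illustrative sketch: per the behaviour described above, a failed
// trusted-signals fetch (e.g. a request killed by entering suspend mode)
// still runs generateBid(), just with null signal fields.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  if (trustedBiddingSignals === null) {
    // Signals fetch failed; a bid of 0 means "don't bid".
    return {bid: 0};
  }
  // ...normal bidding logic would go here...
  return {bid: 0.05, render: interestGroup.ads[0].renderURL};
}
```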
Thanks for the details,
So, if I understand correctly the part on executor reuse, a given executor would fetch the buyer bidding script from the javascript URL only once?
I understand all executors share the same global HTTP cache, so if the bidding script is cached from a single URL, all executors would reuse the same script. The 'no-store' cache-control directive should prevent our script from being reused across executors.
But is there a case where an executor would survive long enough to run several auctions an hour or a day later, without fetching the script again? E.g. could the executor survive a tab suspension and be used again much later when the tab restarts, without getting the script from the javascript URL again?
> Thanks for the details,
>
> So, if I understand correctly the part on executor reuse, a given executor would fetch the buyer bidding script from the javascript URL only once?
Correct - as long as an executor is in memory, it won't fetch a script again. Note that executors are unloaded the instant they're no longer needed for an auction, and auctions are relatively fast.
> I understand all executors share the same global HTTP cache, so if the bidding script is cached from a single URL, all executors would reuse the same script. The 'no-store' cache-control directive should prevent our script from being reused across executors.
The "global" HTTP cache is sharded by top-level schemeful-site and that of the frame as well. Bidding scripts currently bypass that to use the bidding script site for both fields (which is something that needs to be changed, at some point). That aside, all executors would re-fetch the script from the shared network partition on creation, obeying all HTTP-cache directives, so yes, no-store would prevent reuse across executors (of which there's only one per frame/buyer origin/buyer script URL/bidding signals URL at a time). Note that navigating a frame creates a logically different "frame" for the purposes of this logic. I'm not sure whether reloading a frame would be considered the same frame or not.
> But is there a case where an executor would survive long enough to run several auctions an hour or a day later, without fetching the script again? E.g. could the executor survive a tab suspension and be used again much later when the tab restarts, without getting the script from the javascript URL again?
Executors are unloaded the instant no auction needs them. If you enter suspend mode for a week while there's a running auction, then leave suspend mode and instantly run another auction, you could theoretically end up reusing an executor. It's unlikely a single frame will continuously run auctions to keep an executor alive, in the general case. Even continuously running auctions serially, but only one at a time, won't keep the bidder executor alive (it's possible the seller worklet will be kept alive, depending on a race between running the seller reporting script for one auction and starting the bidder logic for the next). And again, that's only within one frame.
Hello @MattMenke2, thanks for your answers.
For our release cycle, we need our perBuyerSignals, trustedBiddingSignals, and bidding script to be aligned. We want to be fully in control of the caching: if we release a new script, we need the old scripts stored client-side to expire straight away, which is why we set the caching directive to no-store. We chose not to support backwards compatibility of our script for the moment, because we always release the three elements simultaneously. If some scripts, even a small number, are lying around for over a week, as we see in some cases, that is an issue for us.
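For illustration, the kind of alignment check we can do at bidding time today looks like this (a sketch with hypothetical names; it assumes we echo a version in the contextual response and under a 'version' key in our trusted bidding signals):

```js
// Illustrative sketch, not our production code. SCRIPT_VERSION is stamped
// into the script at release time; 'version' is assumed to be one of our
// trustedBiddingSignalsKeys and a field of our contextual response.
const SCRIPT_VERSION = '2024-06-10';

function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  if (!perBuyerSignals || perBuyerSignals.version !== SCRIPT_VERSION ||
      !trustedBiddingSignals ||
      trustedBiddingSignals.version !== SCRIPT_VERSION) {
    return {bid: 0};  // stale script or stale signals: don't bid
  }
  // ...version-aligned bidding logic...
  return {bid: 0.05, render: interestGroup.ads[0].renderURL};
}
```

This lets us detect the misalignment and sit out the auction, but it doesn't stop the stale script from being loaded in the first place.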
> If you enter suspend mode for a week while there's a running auction, then leave suspend mode and instantly run another auction, you could theoretically end up reusing an executor. It's unlikely a single frame will continuously run auctions to keep an executor alive, in the general case.
So, from what you say, it could be happening. Would there be a way to ensure bidding scripts are never reused across auctions, even inside an executor? For instance, by always fetching the script in runAdAuction?
There's currently no way to do that (I guess you could technically put your script in your trusted signals file, and eval it, but that seems not a great option)
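For completeness, a sketch of that workaround (illustrative; 'logicSource' is a hypothetical key in the trusted signals response, and this inherits all the usual downsides of eval):

```js
// Illustrative sketch of the eval workaround mentioned above: ship the
// real bidding logic as a string in the trusted signals payload, so it is
// always as fresh as the key-value response.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  // 'logicSource' is a hypothetical key holding a function expression.
  const freshBid = eval('(' + trustedBiddingSignals.logicSource + ')');
  return freshBid(interestGroup, auctionSignals, perBuyerSignals,
                  browserSignals);
}
```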
Hello,
We observe that, in some cases, the bidding script used to run the Protected Audience auction is much older than the contextual bid request, or the call to the key-value server. Our estimate is that, for between 0.5% and 1% of our records, the bidding script was generated by a call to the bidding URL more than an hour prior to the bid request. In some rare cases, the script is older than a day.
The script is not changed often, but when it is, the older version seems to hang around for quite a while, and this can cause issues for us.
We thought there could be some caching mechanism in action, so we set the Cache-Control HTTP header of the response from the bidding URL to "no-store", which we thought would prevent script caching from happening, but we still observe the phenomenon in the same proportions.
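(For reference, an illustrative Node-style snippet of the directive we set on the bidding-script response:)

```js
// Illustrative: the response serving our bidding script sets
res.setHeader('Cache-Control', 'no-store');
```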
Is this expected behaviour? Do you have any input or feedback on this?
Thank you.