WICG / turtledove

TURTLEDOVE
https://wicg.github.io/turtledove/

Explainer for PA per-participant metrics #1272

Open morlovich opened 2 months ago

morlovich commented 2 months ago

Please note that most of this hasn't landed yet (which does mean that feedback is more actionable).

fhoering commented 3 weeks ago

@morlovich @alexmturner Thanks for these new metrics; they would allow better debugging.

Having more metrics raises the importance of ticket https://github.com/WICG/turtledove/issues/1084, because currently, for timings, a large key space must be allocated and overlap of buckets is not handled. That issue talks about timings, but the same problem could arise for the percentage metrics added here, depending on the precision with which they are reported (int vs. float with n digits).

morlovich commented 3 weeks ago

A lot of these are bounded (...though some by configuration that may be under a different party's control). cumulative-buyer-time is bounded by the configured limit plus 1000 (if there is no limit configured, it's 0). The percentages are bounded by... 110.

Hmm, I guess network times are technically not bounded, and maybe I should add that. The fetch itself does have a 30 second timeout, but there is some sloppiness in measurement. Similarly for script-run-time --- the actual execution has a timeout, but the measured time may slightly exceed it.

You do also have 128-bits of address space, however, so you can just give 2^64 metrics 2^64 space each. The values are initially measured as floats; then, after your scale and offset are applied, the result is truncated to an integer.

fhoering commented 3 weeks ago

The percentages are bounded by... 110.

What are the unit and precision of the percentages (int vs. float)? Technically, if values are close, 99.5% and 99.9% can make a difference.

The fetch itself does have a 30 second timeout,

Which network fetch exactly? It looks low to me in general. Do you have the Chromium source code for that?

Hmm, I guess network times are technically not bounded, and maybe I should add that.

The client should be able to decide how to bound and bucketize them based on the use case; it should not be hardcoded.
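A client-side sketch of what that could look like, assuming the raw time arrives in milliseconds and both the cap and the bucket width are chosen by the caller (the function and parameter names here are illustrative, not part of any API):

```javascript
// Sketch: client-chosen bounding and bucketizing of an otherwise
// unbounded timing metric. capMs and bucketWidthMs are use-case
// decisions made by the client, not anything specified by the API.
function bucketizeMs(ms, capMs, bucketWidthMs) {
  const clamped = Math.min(Math.max(ms, 0), capMs); // bound the value
  return Math.floor(clamped / bucketWidthMs);       // map to a bucket index
}

// e.g. with a 30 s cap and 100 ms buckets:
bucketizeMs(1234, 30000, 100);  // bucket 12
bucketizeMs(99999, 30000, 100); // clamped to the cap: bucket 300
```

The point is just that the cap and the resolution stay under the reporter's control rather than being fixed by the platform.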

You do also have 128-bits of address space, however, so you can just give 2^64 metrics 2^64 space each.

My understanding is that this is not how it works. For timers, if I allocate 2^64 keys, I cannot use that space for something else. But I'll let @alexmturner confirm, or explain why it would work.

morlovich commented 3 weeks ago

The percentages are bounded by... 110.

What are the unit and precision of the percentages (int vs. float)? Technically, if values are close, 99.5% and 99.9% can make a difference.

They start as a double. Then scale and offset are applied, and the result is converted to an int. So if you want more digits of precision, you can use a scale: without applying any, you just end up with 99 and 99, but if you apply a scale of 10 you will end up with... 995 and either 999 or 998. Not sure exactly how the decimal ends up in binary.
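The described pipeline (double, then scale and offset, then truncation to an int) can be sketched as follows. The function name is illustrative, and using `Math.trunc` as the truncation step is an assumption about the implementation, not something confirmed above:

```javascript
// Sketch of the described conversion: a metric starts as a double,
// scale and offset are applied, then the result is truncated to an int.
// (Math.trunc as the exact truncation behavior is an assumption.)
function toHistogramValue(measured, scale = 1, offset = 0) {
  return Math.trunc(measured * scale + offset);
}

toHistogramValue(99.5);     // 99 -- indistinguishable from 99.9
toHistogramValue(99.9);     // 99
toHistogramValue(99.5, 10); // 995 (99.5 is exact in binary, so this is stable)
toHistogramValue(99.9, 10); // 999 or 998, depending on how 99.9 rounds in binary
```

This is why the reportable precision is effectively a client choice via `scale`, as long as the scaled values still fit the bounded range.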

The fetch itself does have a 30 second timeout,

Which network fetch exactly? It looks low to me in general. Do you have the Chromium source code for that?

Scripts/wasm and trusted signals (might not be the case for the latter with the V2 protocol, not sure).

https://source.chromium.org/chromium/chromium/src/+/main:content/services/auction_worklet/public/cpp/auction_downloader.cc;drc=be1dfd15d36d914a9feb33677a0c836c7922c689;l=253

You do also have 128-bits of address space, however, so you can just give 2^64 metrics 2^64 space each.

My understanding is that this is not how it works. For timers, if I allocate 2^64 keys, I cannot use that space for something else. But I'll let @alexmturner confirm, or explain why it would work.

Well, I mean you make the first histogram have bucket offset 0, the second have offset 2^64 (0x10000000000000000), the third 2 * 2^64, etc. (however one specifies hex for BigInts in JS, anyway).
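A sketch of that layout using JS BigInts, where each metric gets a contiguous sub-range of 2^64 buckets inside the 128-bit key space (the names here are illustrative):

```javascript
// Sketch: carve the 128-bit bucket space into sub-ranges of 2^64 buckets
// each. Metric i's buckets start at offset i * 2^64, so histograms never
// overlap as long as each localBucket stays below 2^64.
const METRIC_SPAN = 1n << 64n; // 2^64 buckets per metric

function bucketFor(metricIndex, localBucket) {
  return BigInt(metricIndex) * METRIC_SPAN + BigInt(localBucket);
}

bucketFor(0, 7); // 7n -- first histogram, offset 0
bucketFor(1, 0); // 18446744073709551616n (0x10000000000000000n)
```

This is the same allocation trade-off raised above: the space given to one metric's sub-range is indeed reserved, but with 2^128 total buckets there is room for 2^64 such metrics.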