WICG / trust-token-api

Trust Token API
https://wicg.github.io/trust-token-api/

Attribution of PST error messages #299

Open abhisagw opened 2 months ago

abhisagw commented 2 months ago

We are observing token issuance failures on Chrome clients with the following error message (we only have access to error strings from JavaScript error logs, not the details shown in the PST tab of the issuance call in the Network tab):

OperationError: Error executing Trust Tokens operation

Based on above, we have the following queries:

  1. In which scenarios is the above error message triggered? We have seen it when the server returns an invalid token in its response, or when the issuer key commitment is not registered, but it would be helpful to have an exhaustive list of scenarios in which this error can be thrown.
  2. Can there be scenarios where key commitment distribution across clients doesn't succeed? Are there any metrics that your team tracks for successful updates of key commitments across supported Chrome clients?
  3. Is there a way to get more details from the exception thrown for such OperationErrors? The exception string contains only the above message, which is too generic for root-causing the issue. (Is there a way to capture the error message shown in the Network tab, which is slightly more descriptive?)

Note: We have successfully registered PST public keys through this Google Chrome repository and haven't rotated them since registration.

dvorak42 commented 2 months ago
  1. Generally this happens when the server returns a value that the client is unable to parse. It may also happen when the rate limits on how often redemptions can occur on a page are hit, though I don't believe that affects issuances. Missing keys will throw a "DOMException: No keys currently available for PST issuer. Issuer may need to register their key commitments." However, if you've manually provided keys that don't parse correctly or don't match what the server is using, that would also show up as a server error.

  2. There can be a delay of up to 4 hours for clients that haven't been used in a while and haven't updated their configurations. You can check that the latest keys are available by going to "chrome://components/" and checking the "Trust Token Key Commitments" field. As of April 26, 2024, the most up-to-date version is "2024.3.25.1". (If you're manually testing, you can also hit "Check for update" to force a refresh.)

  3. For some error cases, you can record a NetLog (https://www.chromium.org/for-testers/providing-network-details/) while performing the operations you're trying to test and then use https://netlog-viewer.appspot.com/#import to load it. It's a bit difficult to use for external developers, but for PST issues, if you load a log, hit "Events", and search for "TRUST_TOKEN_OPERATION", that should show the associated events and may give slightly more information about why certain operations are failing.
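The "Events" search described in item 3 can also be approximated offline against an exported NetLog file. The following is only a rough sketch: it assumes the export is JSON with a `constants.logEventTypes` name-to-number map and an `events` array whose entries carry a numeric `type`, and the interface names and `findTrustTokenEvents` are hypothetical helpers, not part of any Chrome tooling.

```typescript
// Assumed minimal shape of a NetLog JSON export; real files contain more fields.
interface NetLogEvent {
  type: number;      // numeric id, resolved through constants.logEventTypes
  params?: unknown;  // event-specific details, if any
}

interface NetLogFile {
  constants: { logEventTypes: { [name: string]: number } };
  events: NetLogEvent[];
}

interface NamedEvent {
  typeName: string;
  event: NetLogEvent;
}

// Hypothetical helper: keep only events whose type name contains the
// substring suggested in the thread ("TRUST_TOKEN_OPERATION").
function findTrustTokenEvents(log: NetLogFile): NamedEvent[] {
  // Build an id -> name lookup restricted to the matching event types.
  const wanted: { [id: number]: string } = {};
  for (const name of Object.keys(log.constants.logEventTypes)) {
    if (name.indexOf("TRUST_TOKEN_OPERATION") !== -1) {
      wanted[log.constants.logEventTypes[name]] = name;
    }
  }
  const matches: NamedEvent[] = [];
  for (const event of log.events) {
    const typeName = wanted[event.type];
    if (typeName !== undefined) {
      matches.push({ typeName, event });
    }
  }
  return matches;
}
```

After parsing an exported log with `JSON.parse`, the `params` of each returned entry are where any additional failure detail would appear, if the log recorded it.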

What PST issuer registration did you register as?

abhisagw commented 2 months ago
  1. Can we then attribute all such OperationErrors to server errors, other than in the key mismatch/parsing case? I was able to reproduce this error in a scenario where the issuance call exceeded the 2-issuer-per-TLD limit.
  2. Can there be cases where the key commitment is never propagated to certain clients? Are there any success/failure metrics tracked for key commitment propagation to client browsers?
  3. Thanks, this is helpful. Is there a way to capture this information on the client side (through JavaScript or any other automated means)?

We registered through this issue.

dvorak42 commented 2 months ago
  1. Ah yeah, forgot we also apply the "redemption" rate limit for the issuer issuance limit. @aykutbulut do you think it would be reasonable to move hitting the issuer limit to a different error code? It looks like a kResourceExhausted error exists but gets mapped to the default rather than a different text string.

  2. There can be edge cases where they don't get propagated: non-Chrome Chromium versions, enterprise environments that block or limit the component fetch, and out-of-date Chromes with the potential 4 hour delay (though if a computer is on standby or asleep for longer, you can see even more out-of-date Chromes). We don't have metrics for individual components, but the overall metric across all component downloads is that roughly 93% of clients have up-to-date versions.

  3. As the exact failure conditions can be sensitive (revealing the specific issuers that are in the 2 issuer limit, timing, and differentiating between running out of tokens vs hitting the limits in other ways), we don't have a way to get it from the client remotely via JS. If you have a local client, it might be possible for us to float more information in the DevTools Network tab, and then you could make an extension to grab data from that locally.

abhisagw commented 2 months ago
  1. Thanks, a separate error code for this scenario, and potentially for others that are also mapped to the default, would help us attribute errors. If you go ahead with this, could you let us know when the change will be available? Also, we are seeing this error (OperationError: Error executing Trust Tokens operation) for a few requests where we are sure it's not caused by the TLD issuance limit or by server errors. Since key commitment issues do not result in this error, can you help us identify what other scenarios could cause it?
  2. Is it possible to get a breakdown for just Chrome (excluding Chromium and other similar browsers)?
  3. It would be helpful to have a way to get descriptive error messages, since it's not possible to manually debug or re-create a plethora of client scenarios. Floating more information through DevTools would definitely help, but only if we can re-create client scenarios locally, and informative error strings would help with that.
abhisagw commented 2 months ago
  1. Regarding the DOMException: Error executing Trust Tokens operation, I also see this error message when the issuance limit of 500 is breached. Based on this and the information in previous responses, it seems this generic error message is mapped to multiple failure scenarios. This will hamper debugging, since the exception carries no other information to narrow down the reason for the issuance failure. We would request that these failures be mapped so that the thrown exception carries more information about the failure.
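Until more specific errors are available, client-side logs can at least separate the two message strings quoted in this thread. The sketch below is a minimal illustration under stated assumptions: the bucket names and `classifyPstError` are hypothetical, matching is done on substrings of the messages quoted above, and the exact wording may differ between Chrome versions.

```typescript
// Hypothetical bucket names; only the two message strings come from this thread.
type PstFailureBucket =
  | "missing-key-commitments"
  | "generic-operation-error"
  | "other";

const MISSING_KEYS_TEXT = "No keys currently available for PST issuer";
const GENERIC_TEXT = "Error executing Trust Tokens operation";

// Classify a caught value (an exception object or a logged error string)
// by the message text it carries.
function classifyPstError(err: unknown): PstFailureBucket {
  const candidate = (err ?? {}) as { message?: unknown };
  const message =
    typeof candidate.message === "string" ? candidate.message : String(err);
  if (message.indexOf(MISSING_KEYS_TEXT) !== -1) {
    return "missing-key-commitments";
  }
  if (message.indexOf(GENERIC_TEXT) !== -1) {
    // Per the thread, this bucket still mixes unparseable server responses
    // and rate/issuer limits; it cannot be split further from JavaScript.
    return "generic-operation-error";
  }
  return "other";
}
```

Calling this inside the `catch` of the issuance request and logging the bucket alongside the raw string gives a coarse breakdown, though the generic bucket remains ambiguous for the reasons discussed above.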
dvorak42 commented 2 months ago
  1. Yeah, sorry, to clarify: the comment in my previous response meant that I think all the rate limits return that error code. I think for some of them we can move it to a bespoke error code, though any limits/errors aggregated across multiple issuers may need to keep the generic code to avoid it being a side channel.

  2. I believe the 93% number is for Chrome (since those are the browsers we have metrics for).

abhisagw commented 1 month ago
  1. Thanks, do we need to raise a request/issue to move to different error codes/description?
  2. So can we safely assume that 7% of Chrome browsers (which have support for PST) won't have the correct key commitment downloaded?
dvorak42 commented 1 month ago
  1. @aykutbulut Can we make a Chromium bug to track this?
  2. That's a worst-case upper bound for having the latest component. If the key commitments have been around for more than a day, clients whose component is a few hours or days out of date will generally still have the correct key commitments downloaded. I'd guess that at 24 hours this drops to 1% or even less, but we only have point-in-time metrics.
aykutbulut commented 1 month ago

Created https://issues.chromium.org/issues/339207243 to track this.

dvorak42 commented 1 month ago

For 1, we've added a few more error messages to the console based on whether the error comes from running into rate limits versus bad server responses. It should be rolling out to Canary today and Dev over the next week.

abhisagw commented 2 weeks ago

@dvorak42 what is the status of this change in the Chrome release cycle? Is there a way we can track it on our end?

dvorak42 commented 1 week ago

It should be in Chrome Canary/Dev, with it rolling out as part of Chrome 127.