firebase / firebase-admin-go

Firebase Admin Go SDK
Apache License 2.0

No way to respect RetryAfter for retryable errors #603

Open froodian opened 7 months ago

froodian commented 7 months ago

Describe the problem

Some messaging errors are retryable, for instance errors for which messaging.IsQuotaExceeded(err) returns true. The FCM API asks clients to honor the Retry-After response header when retrying these. The Firebase SDK reads the Retry-After header and honors it when retrying internally, but it does not expose the value to callers, so callers cannot honor it in their own retries.

lahirumaramba commented 6 months ago

Hi @froodian, thanks for filing this feature request. We can't promise a timeline on this, but we will use this issue to track any progress.

Are you looking for a feature to disable auto retry in the SDK, or would you prefer to access the error responses with Retry-After headers while keeping auto retry enabled?

froodian commented 6 months ago

Hi @lahirumaramba - thanks for the response.

Regarding the SDK's internal auto retry, we don't mind that behavior existing, but to keep requests within our existing timeout expectations, we've actually forked the repo and lowered MaxDelay and ExpBackoffFactor on our fork. Ideally those values would be configurable, or auto retry could be disabled, so that we don't have to maintain that fork.

Regardless of the outcome there, we would want the Retry-After value exposed, so that we can retry any requests that make it through that auto retry at an appropriate time. Since the initial filing of this issue, we've discovered this comment from a previous issue requesting this, and it appears we may be able to use the firebase.google.com/go/v4/errorutils package to accomplish that with today's code. We'll attempt that approach imminently.

muthu3107 commented 3 weeks ago

@lahirumaramba

We've set a 5-second timeout both at the context and HTTP request levels. However, we're observing that requests are still taking longer than 5 seconds, with seemingly random delays. We suspect the server is setting Retry-After headers due to potential spiking issues.

We're already implementing jitter and random backoff within our system to handle retries. It would be extremely helpful if we had an option to disable the Retry-After behavior altogether. This way, if a request times out after the configured 5 seconds, we can handle the failure gracefully with our own retry mechanism.

Please let us know if there's a way to achieve this or if there are alternative solutions to address the timeout issue.