amplitude / Amplitude-Node

Server-side Node.js SDK for Amplitude
MIT License
67 stars 20 forks source link

feat(node): Add baseRetryTimeout option #48

Closed jooohhn closed 3 years ago

jooohhn commented 4 years ago

Summary

Checklist

kelvin-lu commented 4 years ago

How do you feel about an option that gives more control over the behavior but might be harder to understand? e.g.

timeoutConfig = [100, 100, 200, 200, 800] to wait 100 ms after the first and second errors, 200 after the third and fourth, etc.

the motivation is cuz as the # of retries increase in our current logic there's an exponential delay between retries and this can cause the process to hang if it gets to the end of the while loop

jooohhn commented 4 years ago

We can do that, the current default options is 10 retries.

Thoughts on removing DEFAULT_OPTIONS.maxRetries and DEFAULT_OPTIONS.baseRetryTimeout for DEFAULT_OPTIONS.timeoutConfig = [100, 100, 200, 200, 400, 400, 800, 800, 1000, 1000]?

kelvin-lu commented 4 years ago

I think so! I was thinking about if we want to decouple the max # of retries and the timeout timings, or if they should be one and the same.

one path we could take is that the max retries dictates how many times to try and the timeout config dictates how long to wait after each failed try (with some default if we reach the end of the array). Which would be easier to reason about for users but would be more to configure.

another option is to have only one option timeoutConfig where we retry until we reach the end of the array. Which would be more compact but harder to explain. Thoughts?

jooohhn commented 4 years ago

one path we could take is that the max retries dictates how many times to try and the timeout config dictates how long to wait after each failed try

Does this involve maxRetries: number and timeoutConfig: number? How/would users still granulate changes in wait times after multiple failures?

another option is to have only one option timeoutConfig where we retry until we reach the end of the array. Which would be more compact but harder to explain. Thoughts?

I think one option timeoutConfig: number[] may be be harder to understand, but we could explain it in the typescript docs. Something like "Determines the wait times for each subsequent failure and how many times event sending will retry"

kelvin-lu commented 4 years ago

Does this involve maxRetries: number and timeoutConfig: number? How/would users still granulate changes in wait times after multiple failures?

I think in both cases timeoutConfig exists as an Array<number> but in the first case the # of retries is tied to maxRetries and not the length of the array.

jooohhn commented 4 years ago

I think in both cases timeoutConfig exists as an Array but in the first case the # of retries is tied to maxRetries and not the length of the array.

In this case, how would we handle timeoutConfig.length and maxRetries not being the same number?

kelvin-lu commented 4 years ago

the case we'd have to worry about is if maxRetries > timeoutConfig.length - we could default to our own value or the last element in the timeoutConfig.

I think both approaches have their own confusions - if we only have a timeoutConfig, I would expect to do timeoutConfig.length + 1 tries (1 after the last wait) - then does passing in [] mean no retries or retry once? Does this mean we should also timeoutConfig to be set to null to explicitly stop retries?

If we have both maxRetries and timeoutConfig, it's easier to configure but also means knowing two options and their interactions.

jooohhn commented 4 years ago

All good things to consider!

I would expect to do timeoutConfig.length + 1 tries (1 after the last wait)

Could you explain on how retries === timeoutConfig.length + 1, I'm confused 😅. My current understanding in this case would be that the # retries is directly equal to the array length

E.g. If timeoutConfig === [], then 0 retries If timeoutConfig === [100], then 1 retry If timeoutConfig === [100,200] then 2 retries.

kelvin-lu commented 4 years ago

reiterating from a DM: The current method that we use "retries" is like so: try 1 -> wait time 1 -> try 2 -> wait time 2 -> ... -> wait time n -> try n + 1

which would be n+1 tries. Ideally we should rework it to be like:

wait time 1 -> try -> ... -> wait time n -> try n

Especially since the first "try" otherwise happens right after the first failed send. Thanks again @jooohhn for challenging an assumption I made 🤦