PlagueHO / CosmosDB

PowerShell Module for working with Azure Cosmos DB databases, collections, documents, attachments, offers, users, permissions, triggers, stored procedures and user defined functions.
http://dscottraynsford.com
MIT License
152 stars 46 forks source link

Enable support for retrycount and retryinterval #442

Open Szeraax opened 2 years ago

Szeraax commented 2 years ago

Instead of having to watch for transient errors, it would be nice to tell this cmdlet to retry a few times like what irm and iwr do.

PlagueHO commented 2 years ago

Good idea @Szeraax. This could be handled in a similar way way to how 429's are handled: https://github.com/PlagueHO/CosmosDB#how-to-handle-exceeding-provisioned-throughput

Are you talking about specific error types (e.g. timeouts)?

Szeraax commented 2 years ago

PowerShell used error codes 400-599 as their criteria when they put retry logic in Invoke-WebRequest and irm.

It looks like your implementation for 429 retries will retry indefinitely. That would not be desirable for general errors that could be bad server status, bad content, bad route, etc.

My suggestion is doing the same thing as Invoke-WebRequest where it will retry IF AND ONLY IF the -MaximumRetries param is -gt 1. And set a sane default value for param -RetryIntervalSec like iwr does.

The only thing that I'm not really sure on is how you can set these params generally for the module instead of having to add them as params to ever cmdlet. Maybe that's just the best way to go? Does this module store any singleton config that you could set a module wide retry count and interval on?

One other thought: In the future, will this module only target pwsh 7+? Or will it always be ~3-5 as well? When this module requires pwsh 7+, you can just pass through those params straight to iwr. Would you want to tag an iwr for pwsh >= 6 and one for <6?

If I get some feedback on what stuff you are most interested in, I'd be happy to start the PR and give it a first pass.

PlagueHO commented 2 years ago

The New-CosmosDbBackoffPolicy won't retry indefinitely if you specify a -MaxRetries (although I should have named it -MayRetry - which at some point I'll alias). It does also implement support for exponential retry policy and deals with the x-ms-retry-after-ms response. But agree that the 429 specific implementation might not be suitable here.

As for backwards compatibility, I haven't put much thought into dropping support for 5.1. But my thinking is that it wouldn't be too difficult to simply call out to a proxy Invoke-WebRequestWithRetry when executed on 5.1 and just passes the parameters through in 6 and above. I don't bother supporting below 5.1 though (although I'd expect it works on 5.0 and possibly 4.0).

But I don't see too much of an issue implementing this manually when on 5.1 and just using passthrough for 6 and above. It will be a non-trivial amount of work adding the parameters to each cmdlet that calls Invoke-CosmosDb... and ensuring test coverage for each - but that's just time and doesn't require a lot of thought.

The work would be adding the parameters to each function, adding the docs to the PlatyPs files and then adding unit tests for each function (to verify the parameters are being passed through).

But it could probably just be done for the *-CosmosDbDocument* functions first and added to other functions later - as I'd expect these are the most important ones.

Would love a PR - as I'm super snowed under at the moment.

PlagueHO commented 2 years ago

One thing I'd also add is that the 429 back off policy method would still need to work as is, because of the need to deal with the x-ms-retry-after-ms - which IWR won't do. So, if IWR just treats the 429 as an "just retry" situation then the IWR parameters should be ignored (or an Invalid Parameter exception thrown) - because we wouldn't want the default retry process for IWR to kick in. Does that make sense?

Szeraax commented 2 years ago

Yes, definitely best to go most specific to least specific error in the catch block.