Azure / azure-powershell

Microsoft Azure PowerShell

Customize HTTP timeout / retries #13868

Open fawohlsc opened 3 years ago

fawohlsc commented 3 years ago

Description of the new feature

It would be good if users could customize both the HTTP timeout and the number of retries. Both seem to be fixed at the moment.

Proposed implementation details (optional)

It would be good if the implementation were aligned with the guidance in "Inside the Azure SDK Architecture", for instance by exposing the RetryOptions of the underlying .NET SDK.

dingmeng-xue commented 3 years ago

@fawohlsc, could you explain further which scenarios require users to set the timeout or retries? Our feeling is that this is too technical for end users, and asking customers to tune these numbers is not a good approach.

fawohlsc commented 3 years ago

@dingmeng-xue, first of all, I appreciate your prompt response.

The use case is to reduce transient errors in scheduled and long-running scripts, such as Pester tests (e.g., fawohlsc/azure-policy-testing). My Pester tests run for 2-3 hours, and I discovered transient errors caused by requests exceeding the 100-second HTTP timeout. These transient errors could easily be eliminated by changing the HTTP timeout. For now, I am forced to retry the failing Pester tests.

Also, for consistency, it makes sense to let end users change the default timeout. For instance, the Azure .NET SDK allows users to change the HTTP timeout, whereas Azure PowerShell currently does not. I understand your concern that this might be too technical, but keeping your reasonable default of 100 seconds while allowing users to change it when needed should address that concern.

dingmeng-xue commented 3 years ago

We will consider this together with requirement #13520. Both affect the HTTP pipeline configuration.

fawohlsc commented 3 years ago

Thank you @dingmeng-xue!

jonarmstrong commented 3 years ago

Having this feature would be very helpful for me as well. I'm currently running into an issue where the New-AzResourceGroupDeployment command fails for one of my deployments. The template is quite extensive, and the validation step takes ~2 minutes to run, which exceeds the connection timeout, so the resource group deployment is never submitted. I've verified in the Azure Activity Log that the deployment is received and validates successfully. In my local console, I get the following chain of exceptions:

11:26:38 AM - Error: Code=; Message=The request was canceled due to the configured
                       HttpClient.Timeout of 100 seconds elapsing.
                       , 11:26:38 AM - Error: Code=; Message=A task was canceled.
                       , 11:26:38 AM - Error: Code=; Message=A task was canceled.
SenthuranSivananthan commented 2 years ago

It would be helpful to add retry policies, such as exponential backoff and linear retries, to handle transient failures. At the moment, any timeout is an outright failure, even when ARM continues to deploy the resources in the background.
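To make the difference between exponential and linear retry policies concrete, here is a minimal sketch in Python. It is not Azure PowerShell code and not an existing Az API: `TransientError`, `exponential_delays`, `linear_delays`, and `call_with_retries` are hypothetical names chosen for illustration. A real policy would also have to decide which failures count as transient, for example the `HttpClient.Timeout` cancellations quoted earlier in this thread.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP timeout."""


def exponential_delays(base: float, factor: float, max_delay: float, retries: int) -> List[float]:
    # Delay before retry n is base * factor**n, capped at max_delay.
    return [min(base * factor ** n, max_delay) for n in range(retries)]


def linear_delays(step: float, retries: int) -> List[float]:
    # Delay before retry n grows by a fixed step: step, 2*step, 3*step, ...
    return [step * (n + 1) for n in range(retries)]


def call_with_retries(
    operation: Callable[[], T],
    delays: List[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying after each delay when it raises TransientError.

    Non-transient errors propagate immediately. After the delays are used up,
    one final attempt is made and any error it raises reaches the caller.
    """
    for delay in delays:
        try:
            return operation()
        except TransientError:
            sleep(delay)  # back off before the next attempt
    return operation()  # final attempt: errors are no longer swallowed
```

For example, `exponential_delays(1.0, 2.0, 10.0, 5)` yields `[1.0, 2.0, 4.0, 8.0, 10.0]`, while `linear_delays(2.0, 3)` yields `[2.0, 4.0, 6.0]`. One design caveat: blind retries are only safe for idempotent calls. Resubmitting a deployment that ARM is still processing in the background may not be, so a real policy would likely need to check the operation's state before retrying.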

alexangas commented 1 year ago

I would like this as well, please. I just hit a timeout in Set-AzSqlElasticPool when setting a maintenance window. The Azure portal states that this can take several minutes, so a timeout of 100 seconds is not going to be enough. I'd like to be able to configure it.

jollylollylion commented 1 year ago

I'm also having this issue with get-azsqldatabase, get-azsqlserver, and other Azure cmdlets, including new-azsqldatabasecopy, used in our DevOps pipeline. What is worse, it's an intermittent issue that causes builds to fail randomly. Quite frustrating.

[error]The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.

paulanguiano commented 1 year ago

We have the same problem with intermittent failures in automation. It's not practical to add manual retry loops around every single call, but without better timeout control, any kind of infrastructure maintenance written in PowerShell is unreliable. This really seems like an oversight. We're close to writing our own bindings for PowerShell.

plk commented 1 year ago

Same issue: not being able to configure timeouts like this makes PowerShell automation basically useless (i.e., unreliable) in any enterprise setting.

dgiesselbach commented 1 year ago

Need this too! Some Az modules need more than 100 seconds, for example for mass deployments. Have parameters for this already been built? @dingmeng-xue

Alex-wdy commented 1 year ago

We recently re-evaluated this requirement and will continue to update this issue.

cblomart commented 9 months ago

I have a long-running script that scans an Azure environment with many resources, and it sometimes hits transient timeouts, occasionally followed by SSL connection errors.

Configurable retries and timeouts would be really welcome, so that such long-running scripts can run to completion without errors.