Open TimLovellSmith opened 5 years ago
Could be same underlying issue as https://github.com/Azure/azure-libraries-for-net/issues/106 but with a different repro?
We're doing list all in resource group, on one resource group. I can imagine how that might get rate limited or just stressful and eventually turn into 503s... (though I am not really sure that that is what happened)
Anyone find any solution for it, I am also facing the same issues.
@JSGund can you please provide more details on the version/the method you are using? Thanks.
@TimLovellSmith are you still facing issue related 'TaskCanceledException'?
Yes, sort of. We have a ridiculous-looking workaround. We catch the exception - and then we check if we really tried to cancel the token that we passed to the API, before allowing the exception to be treated as a true cancellation. Otherwise we assume the request just timed out.
@TimLovellSmith could you provide more info about your app?
Based on more experience - it seems like its something that happens pretty easily any time we do enough (e.g. hundreds) of operations in parallel. And over time, it seemed like it happens no matter what HTTP operation we were doing or on what resource type, we have seen just about every possible combination.
We're now on a mix of versions of fluent SDKs, I haven't really been keeping track of whether it is still happening for all of them.
No, I don't think we bind to dynamic IPs...?
<package id="Microsoft.Azure.Management.Compute.Fluent" version="1.13.0" targetFramework="net462" />
<package id="Microsoft.Azure.Management.Dns.Fluent" version="1.15.1" targetFramework="net462" />
<package id="Microsoft.Azure.Management.Graph.RBAC.Fluent" version="1.17.0" targetFramework="net462" />
<package id="Microsoft.Azure.Management.KeyVault.Fluent" version="1.10.0" targetFramework="net462" />
<package id="Microsoft.Azure.Management.Msi.Fluent" version="1.17.0" targetFramework="net462" />
<package id="Microsoft.Azure.Management.Network.Fluent" version="1.17.0" targetFramework="net462" />
<package id="Microsoft.Azure.Management.ResourceManager.Fluent" version="1.17.0" targetFramework="net462" />
<package id="Microsoft.Azure.Management.Storage" version="7.1.0-preview" targetFramework="net452" />
<package id="Microsoft.Azure.Management.Storage.Fluent" version="1.17.0" targetFramework="net462" />
<package id="Microsoft.Rest.ClientRuntime" version="2.3.19" targetFramework="net462" />
<package id="Microsoft.Rest.ClientRuntime.Azure" version="3.3.16" targetFramework="net462" />
<package id="Microsoft.Rest.ClientRuntime.Azure.Authentication" version="2.4.0" targetFramework="net462" />
@TimLovellSmith In your case, it's more like timeout issue, please refer this link for a ongoing discussion about TaskCanceledException if it helps
@TimLovellSmith As this issue persists on your side, could you please provide more detailed logs so we could know what happened inside during the operations running. And I noticed your app still runs on old SDK versions. Have you got chance to try upgrade with latest Fluent SDK to see whether it persists? Thanks.
I think you are right that this is caused by the behavior of HttpClient. I would suggest you do one of two things:
"Only the requesting object can issue the cancellation request, and each listener is responsible for noticing the request and responding to it in an appropriate and timely manner."
"Requesting is distinct from listening. An object that invokes a cancelable operation can control when (if ever) cancellation is requested."
=> (bottom line, if I didn't cancel the task, it shouldn't be canceled!)
(Community, if you agree TimeoutException is better please upvote issue, if you think TaskCanceledExceptions are great and should be expected all the time even when you didn't request them, please downvote...)
I agree with @TimLovellSmith.
I need to distinguis if call was Cancelled or TimeOut because:
I don't want to check on thousand places if cancellation token was cancelled or not. Why we have TimeoutException then?
When analysing logs, I am not too much interested in TaskCancelledExceptions, because in web appliction TaskCancelledExceptions are quite common as the WebRequest can be can cancelled by hitting F5 (refresh) or Esc in browser. On the other way I am quite interested in TimeOutException, because it means that the system does not work properly.
TimeOutExceptions are already used in another parts of .NET Framework, so it is wiselly to keep consistency.
If a library use dependecy with wrong design, it is not good to leak this wrong design out of the API but wrap it or find a workarround.
@TimLovellSmith @qub1n Here are two options for you to set in your code and test if you still face the same issue.
var azure = Microsoft.Azure.Management.Fluent.Azure.Configure()
.WithLogLevel(HttpLoggingDelegatingHandler.Level.BodyAndHeaders)
.WithRetryPolicy(new RetryPolicy(new HttpStatusCodeErrorDetectionStrategy(), 3))
.Authenticate(azureCredentials)
.WithDefaultSubscription();
Please check details here for using RetryPolicy
var rm = azure.ManagementClients
.Where(c => c is IResourceManagementClient)
.DefaultIfEmpty(null)
.FirstOrDefault() as IResourceManagementClient;
(rm as IAzureClient).HttpClient.Timeout = TimeSpan.FromMinutes(5);
Please feel free to let me know if these help.
In version 1.15.1
Oh the dreaded TaskCanceledException! What I have seen so many times now is that I am trying to run a bunch of functional tests in parallel. They all try to use the SDK in parallel, and then they all blow up like this:
Message: System.Threading.Tasks.TaskCanceledException : A task was canceled.
Result StackTrace:d11.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Management.Redis.Fluent.RedisOperationsExtensions.d 6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiterd__4.MoveNext() in C:\repo\cp_psprod\test\FunctionalTests\ProvisioningTests\Tests\TestBase.cs:line 40
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at ProvisioningTests.Tests.ScaleUpDownTests.d__1.MoveNext() in C:\repo\cp_psprod\test\FunctionalTests\ProvisioningTests\Tests\ScaleUpDownTests.cs:line 21
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.Azure.Management.Redis.Fluent.RedisOperations.
1.GetResult() at ProvisioningTests.IRedisCachesExtensions.<GetAsync>d__0.MoveNext() in C:\repo\cp_psprod\test\FunctionalTests\ProvisioningTests\IRedisCachesExtensions.cs:line 17 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter
1.GetResult() at ProvisioningTests.Tests.TestBase.