Azure / azure-libraries-for-net

Azure libraries for .Net
MIT License
380 stars 192 forks source link

'TaskCanceledException' shouldn't be thrown unless caller canceled the task! #532

Open TimLovellSmith opened 5 years ago

TimLovellSmith commented 5 years ago

In version 1.15.1

Oh the dreaded TaskCanceledException! What I have seen so many times now is that I am trying to run a bunch of functional tests in parallel. They all try to use the SDK in parallel, and then they all blow up like this:

var caches = await manager.Redis.ListByResourceGroupAsync(resourceGroup.Name); return caches.Where(c => c.ProvisioningState.Equals("succeeded", StringComparison.OrdinalIgnoreCase) && options.Matches(c, defaultRegion));

Message: System.Threading.Tasks.TaskCanceledException : A task was canceled.

Result StackTrace:
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.Azure.Management.Redis.Fluent.RedisOperations.d11.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.Azure.Management.Redis.Fluent.RedisOperationsExtensions.d6.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter1.GetResult() at ProvisioningTests.IRedisCachesExtensions.<GetAsync>d__0.MoveNext() in C:\repo\cp_psprod\test\FunctionalTests\ProvisioningTests\IRedisCachesExtensions.cs:line 17 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter1.GetResult() at ProvisioningTests.Tests.TestBase.d__4.MoveNext() in C:\repo\cp_psprod\test\FunctionalTests\ProvisioningTests\Tests\TestBase.cs:line 40 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult() at ProvisioningTests.Tests.ScaleUpDownTests.d__1.MoveNext() in C:\repo\cp_psprod\test\FunctionalTests\ProvisioningTests\Tests\ScaleUpDownTests.cs:line 21

TimLovellSmith commented 5 years ago

Could be same underlying issue as https://github.com/Azure/azure-libraries-for-net/issues/106 but with a different repro?

We're doing list all in resource group, on one resource group. I can imagine how that might get rate limited or just stressful and eventually turn into 503s... (though I am not really sure that that is what happened)

JSGund commented 5 years ago

Anyone find any solution for it, I am also facing the same issues.

xseeseesee commented 4 years ago

@JSGund can you please provide more details on the version/the method you are using? Thanks.

xseeseesee commented 4 years ago

@TimLovellSmith are you still facing issue related 'TaskCanceledException'?

TimLovellSmith commented 4 years ago

Yes, sort of. We have a ridiculous-looking workaround. We catch the exception - and then we check if we really tried to cancel the token that we passed to the API, before allowing the exception to be treated as a true cancellation. Otherwise we assume the request just timed out.

xseeseesee commented 4 years ago

@TimLovellSmith could you provide more info about your app?

  1. What environment your app is running on? Any components in your app bind to dynamic IPs?
  2. How often does the TaskCancelException come out? And what version of SDK are you using now?
  3. Does it only appear when you call list all operation?
TimLovellSmith commented 4 years ago

Based on more experience - it seems like its something that happens pretty easily any time we do enough (e.g. hundreds) of operations in parallel. And over time, it seemed like it happens no matter what HTTP operation we were doing or on what resource type, we have seen just about every possible combination.

We're now on a mix of versions of fluent SDKs, I haven't really been keeping track of whether it is still happening for all of them.

No, I don't think we bind to dynamic IPs...?

  <package id="Microsoft.Azure.Management.Compute.Fluent" version="1.13.0" targetFramework="net462" />
  <package id="Microsoft.Azure.Management.Dns.Fluent" version="1.15.1" targetFramework="net462" />
  <package id="Microsoft.Azure.Management.Graph.RBAC.Fluent" version="1.17.0" targetFramework="net462" />
  <package id="Microsoft.Azure.Management.KeyVault.Fluent" version="1.10.0" targetFramework="net462" />
  <package id="Microsoft.Azure.Management.Msi.Fluent" version="1.17.0" targetFramework="net462" />
  <package id="Microsoft.Azure.Management.Network.Fluent" version="1.17.0" targetFramework="net462" />
  <package id="Microsoft.Azure.Management.ResourceManager.Fluent" version="1.17.0" targetFramework="net462" />
  <package id="Microsoft.Azure.Management.Storage" version="7.1.0-preview" targetFramework="net452" />
  <package id="Microsoft.Azure.Management.Storage.Fluent" version="1.17.0" targetFramework="net462" />
  <package id="Microsoft.Rest.ClientRuntime" version="2.3.19" targetFramework="net462" />
  <package id="Microsoft.Rest.ClientRuntime.Azure" version="3.3.16" targetFramework="net462" />
  <package id="Microsoft.Rest.ClientRuntime.Azure.Authentication" version="2.4.0" targetFramework="net462" />
xseeseesee commented 4 years ago

@TimLovellSmith In your case, it's more like timeout issue, please refer this link for a ongoing discussion about TaskCanceledException if it helps

xseeseesee commented 4 years ago

@TimLovellSmith As this issue persists on your side, could you please provide more detailed logs so we could know what happened inside during the operations running. And I noticed your app still runs on old SDK versions. Have you got chance to try upgrade with latest Fluent SDK to see whether it persists? Thanks.

TimLovellSmith commented 4 years ago

I think you are right that this is caused by the behavior of HttpClient. I would suggest you do one of two things:

  1. Close as won't fix, please get them to fix HttpClient instead, or:
  2. Bake the workaround of checking for actual token cancellation into Fluent SDK, so that people can get a more natural experience like TimeoutException from Fluent SDK, and not have to face this confusing behavior of HttpClient. Despite this not always matching the behavior of HttpClient, at least it would allow you to design your own API's behavior nicely, and follow the spirit of guidelines here

"Only the requesting object can issue the cancellation request, and each listener is responsible for noticing the request and responding to it in an appropriate and timely manner."

"Requesting is distinct from listening. An object that invokes a cancelable operation can control when (if ever) cancellation is requested."

https://docs.microsoft.com/en-us/dotnet/standard/threading/cancellation-in-managed-threads?redirectedfrom=MSDN

=> (bottom line, if I didn't cancel the task, it shouldn't be canceled!)

(Community, if you agree TimeoutException is better please upvote issue, if you think TaskCanceledExceptions are great and should be expected all the time even when you didn't request them, please downvote...)

qub1n commented 4 years ago

I agree with @TimLovellSmith.

I need to distinguis if call was Cancelled or TimeOut because:

  1. I write the app logs messages precisely. It means exactly specified what happened (action was cancelled or action was timeout)
  2. If user cancel the action in UI, I don't want to show messageBox that it was cancelled. On the other way if action executed by user and timeouted, I need to tell the user that the action Timeouted.

I don't want to check on thousand places if cancellation token was cancelled or not. Why we have TimeoutException then?

When analysing logs, I am not too much interested in TaskCancelledExceptions, because in web appliction TaskCancelledExceptions are quite common as the WebRequest can be can cancelled by hitting F5 (refresh) or Esc in browser. On the other way I am quite interested in TimeOutException, because it means that the system does not work properly.

TimeOutExceptions are already used in another parts of .NET Framework, so it is wiselly to keep consistency.

If a library use dependecy with wrong design, it is not good to leak this wrong design out of the API but wrap it or find a workarround.

xseeseesee commented 4 years ago

@TimLovellSmith @qub1n Here are two options for you to set in your code and test if you still face the same issue.