fsprojects / pulsar-client-dotnet

Apache Pulsar native client for .NET (C#/F#/VB)
MIT License
301 stars 47 forks source link

A task was canceled - Reconnect error at ExchangeClientCredentials #231

Closed danny-krueger closed 1 year ago

danny-krueger commented 1 year ago

Hello all,

unfortunately, from time to time we receive the following error

The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing., A task was canceled., A task was canceled.

at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at <StartupCode$Pulsar-Client>.$TokenClient.ExchangeClientCredentials@47.MoveNext()
   at <StartupCode$Pulsar-Client>.$AuthenticationOauth2.newToken@100.MoveNext()
   at Pulsar.Client.Auth.OAuth2.AuthenticationOauth2.GetAuthData()
   at <StartupCode$Pulsar-Client>.$ConnectionPool.clo@136-31.MoveNext()
   at <StartupCode$Pulsar-Client>.$ConnectionPool.clo@130-30.MoveNext()
   at <StartupCode$Pulsar-Client>.$BinaryLookupService.FindBroker@61.MoveNext()
   at <StartupCode$Pulsar-Client>.$ConnectionHandler.-ctor@48-37.MoveNext(),
   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)

"exceptions": [
      {
        "Depth": 0,
        "ClassName": "System.Threading.Tasks.TaskCanceledException",
        "Message": "The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.",
        "Source": "System.Net.Http",
        "StackTraceString": "   at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)\n   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)\n   at <StartupCode$Pulsar-Client>.$TokenClient.ExchangeClientCredentials@47.MoveNext()\n   at <StartupCode$Pulsar-Client>.$AuthenticationOauth2.newToken@100.MoveNext()\n   at Pulsar.Client.Auth.OAuth2.AuthenticationOauth2.GetAuthData()\n   at <StartupCode$Pulsar-Client>.$ConnectionPool.clo@136-31.MoveNext()\n   at <StartupCode$Pulsar-Client>.$ConnectionPool.clo@130-30.MoveNext()\n   at <StartupCode$Pulsar-Client>.$BinaryLookupService.FindBroker@61.MoveNext()\n   at <StartupCode$Pulsar-Client>.$ConnectionHandler.-ctor@48-37.MoveNext()",
        "RemoteStackTraceString": null,
        "RemoteStackIndex": 0,
        "HResult": -2146233029,
        "HelpURL": null
      },
      {
        "Depth": 1,
        "ClassName": "System.TimeoutException",
        "Message": "A task was canceled.",
        "Source": null,
        "StackTraceString": null,
        "RemoteStackTraceString": null,
        "RemoteStackIndex": 0,
        "HResult": -2146233083,
        "HelpURL": null
      },
      {
        "Depth": 2,
        "ClassName": "System.Threading.Tasks.TaskCanceledException",
        "Message": "A task was canceled.",
        "Source": "System.Private.CoreLib",
        "StackTraceString": "   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)\n   at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)\n   at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\n   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)",
        "RemoteStackTraceString": null,
        "RemoteStackIndex": 0,
        "HResult": -2146233029,
        "HelpURL": null
      }
    ]

Unfortunately, we are unable to reproduce the error consistently. It occurs intermittently, typically when there is a network problem or the authentication server is unavailable. Most clients (both consumers and producers) are able to reconnect automatically, as expected. However, some clients remain stuck in the error message mentioned above, resulting in an infinite loop. In such cases, it becomes difficult to identify that the consumer, for example, is no longer able to establish a connection. Moreover, the error only generates a warning. It occurs every minute because there is a one-minute delay before each retry attempt. Unfortunately, authentication always fails during these attempts.

Screenshot 2023-09-19 at 14 47 12

Regarding the issue, we suspect that either the HttpClient is being disposed or the task is being completed, causing the call to the OAuth server to be canceled directly. However, our knowledge of F# is limited, so we would appreciate your assistance in identifying and resolving the problem. You can find the relevant code snippet at this location: GitHub link.

Currently, the only workaround is to restart the entire Kubernetes pod, after which everything returns to normal.

We will continue our efforts to reproduce and resolve the issue. Once we have more information, we will provide updates. If you require any further assistance, please let us know.

Lanayx commented 1 year ago

@danny-krueger Hi, thank you for reporting, could you please share the .NET version you use

Lanayx commented 1 year ago

Anyway, I've commited the change (using IHttpClientFactory), if you could verify this somehow before I published a Nuget package, this will be helpful

danny-krueger commented 1 year ago

Hi @Lanayx, thank you for the prompt response. We frequently use .NET 6.0, but occasionally we also use .NET 7.0.

We will try it out, but we need some time to test it.

danny-krueger commented 1 year ago

We have now intensively tested the new variant with IHttpClientFactory. After our testing last weekend, everything looks very good. Can you please create a release from this and provide the new version?

Lanayx commented 1 year ago

Ready https://www.nuget.org/packages/Pulsar.Client/2.13.0

danny-krueger commented 1 year ago

Thank you. I am closing the issue as we believe it has been resolved. If it turns out that it is not fixed, we will reopen the ticket.