dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

Experiencing intermittent "The server returned an invalid or unrecognized response" #27953

Closed edavedian closed 4 years ago

edavedian commented 5 years ago

We are using ASP.NET Core 2.1 deployed on Windows Azure App Service which is intermittently experiencing The server returned an invalid or unrecognized response when using GET and POST. This issue had been happening for a while but since it wasn't reported by any of consuming clients we recently noticed it when browsing the logs. This application communicates with several downstream services and this seems to occur randomly with all of them. The downstream services use Azure App Gateway, Apache HTTPD 2.4.6 and AWS ALB as front end servers. The issue occurs with both RESTful calls and SOAP based calls using WCF. I have sought help from Microsoft support and it is very difficult to get any kind of System.Net based trace in this environment. I also tried the Crash Diagnoser extension and set a first time exception trigger for either HttpRequestException or WinHttpException but somehow the crash dump wasn't generated. It is also very difficult to repro the issue and I have tried everything to repro it but have not been successful.

Here are the stacktraces:

System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.IO.IOException: The server returned an invalid or unrecognized response.
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Aggregator.Service.Services.HTTP.CustomLoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\HTTP\CustomLoggingHttpMessageHandler.cs:line 66
   at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Polly.Timeout.TimeoutEngine.ImplementationAsync[TResult](Func`3 action, Context context, Func`2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func`4 onTimeoutAsync, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
   at Polly.Timeout.TimeoutEngine.ImplementationAsync[TResult](Func`3 action, Context context, Func`2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func`4 onTimeoutAsync, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
   at Microsoft.Extensions.Http.PolicyHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Aggregator.Service.Services.Tokens.TokenService.SendRequestAsync(String url, String clientId, String clientSecret, Dictionary`2 formData, CustomClaims customClaims, String requestId) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\Tokens\TokenService.cs:line 275
   at Aggregator.Service.Services.Tokens.NslTokenService.CreateTokenAsync(GrantTypeEnum grantTypeEnum, String token, String scopes, Boolean validateLocally) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\Tokens\NslTokenService.cs:line 119
   at Aggregator.Service.Services.Tokens.LlTokenService.CreateTokenAsync(String accessToken) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\Tokens\LlTokenService.cs:line 103
   at Aggregator.Service.Services.LifeLock.LlService.GetLlAccessTokenAsync() in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\LifeLock\LlService.cs:line 157
System.ServiceModel.CommunicationException: An error occurred while sending the request. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.IO.IOException: The server returned an invalid or unrecognized response.
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.AuthenticationHelper.SendWithNtAuthAsync(HttpRequestMessage request, Uri authUri, ICredentials credentials, Boolean isProxyAuth, HttpConnection connection, HttpConnectionPool connectionPool, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at System.ServiceModel.Channels.HttpChannelFactory`1.HttpClientRequestChannel.HttpClientChannelAsyncRequest.SendRequestAsync(Message message, TimeoutHelper timeoutHelper)
   --- End of inner exception stack trace ---
   at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult result)
   at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
   at System.ServiceModel.Channels.ServiceChannelProxy.TaskCreator.<>c__DisplayClass1_0.<CreateGenericTask>b__0(IAsyncResult asyncResult)
--- End of stack trace from previous location where exception was thrown ---
   at Aggregator.Service.Common.Helpers.MessageHelper.SendRequestAsync[T,T2](ILogger logger, Int32 attempts, Int32 maxRetries, Int32 retrySleepTime, Func`2 method, T param, String caller, Int32 line) in C:\jenkins\workspace\NIS\Builds

Just to see if there is a difference, I set DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER to false to use the WinHttpHandler instead of SocketsHttpHandler and the issue still occurred but with a slightly different stacktrace:

System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.Http.WinHttpException: Error 12152 calling WINHTTP_CALLBACK_STATUS_REQUEST_ERROR, 'The server returned an invalid or unrecognized response'.
   at System.Threading.Tasks.RendezvousAwaitable`1.GetResult()
   at System.Net.Http.WinHttpHandler.StartRequest(WinHttpRequestState state)
   --- End of inner exception stack trace ---
   at Aggregator.Service.Services.HTTP.CustomLoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\HTTP\CustomLoggingHttpMessageHandler.cs:line 66
   at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Polly.Timeout.TimeoutEngine.ImplementationAsync[TResult](Func`3 action, Context context, Func`2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func`4 onTimeoutAsync, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
   at Polly.Timeout.TimeoutEngine.ImplementationAsync[TResult](Func`3 action, Context context, Func`2 timeoutProvider, TimeoutStrategy timeoutStrategy, Func`4 onTimeoutAsync, CancellationToken cancellationToken, Boolean continueOnCapturedContext)
   at Microsoft.Extensions.Http.PolicyHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Aggregator.Service.Services.Tokens.TokenService.SendRequestAsync(String url, String clientId, String clientSecret, Dictionary`2 formData, CustomClaims customClaims, String requestId) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\Tokens\TokenService.cs:line 275
   at Aggregator.Service.Services.Tokens.NslTokenService.CreateTokenAsync(GrantTypeEnum grantTypeEnum, String token, String scopes, Boolean validateLocally) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\Tokens\NslTokenService.cs:line 119
   at Aggregator.Service.Services.Tokens.LlTokenService.CreateTokenAsync(String accessToken) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\Tokens\LlTokenService.cs:line 103
   at Aggregator.Service.Services.LifeLock.LlService.GetLlAccessTokenAsync() in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Services\LifeLock\LlService.cs:line 157
System.ServiceModel.CommunicationException: An error occurred while sending the request. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.Http.WinHttpException: Error 12152 calling WINHTTP_CALLBACK_STATUS_REQUEST_ERROR, 'The server returned an invalid or unrecognized response'.
   at System.Threading.Tasks.RendezvousAwaitable`1.GetResult()
   at System.Net.Http.WinHttpHandler.StartRequest(WinHttpRequestState state)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at System.ServiceModel.Channels.HttpChannelFactory`1.HttpClientRequestChannel.HttpClientChannelAsyncRequest.SendRequestAsync(Message message, TimeoutHelper timeoutHelper)
   --- End of inner exception stack trace ---
   at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult result)
   at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
   at System.ServiceModel.Channels.ServiceChannelProxy.TaskCreator.<>c__DisplayClass1_0.<CreateGenericTask>b__0(IAsyncResult asyncResult)
--- End of stack trace from previous location where exception was thrown ---
   at Aggregator.Service.Common.Helpers.MessageHelper.SendRequestAsync[T,T2](ILogger logger, Int32 attempts, Int32 maxRetries, Int32 retrySleepTime, Func`2 method, T param, String caller, Int32 line) in C:\jenkins\workspace\NIS\Builds\DSP\NAG\NAG.2018.4.1\Src\Aggregator.Service.Common\Helpers\MessageHelper.cs:line 0

Your help is greatly appreciated as this is currently a production issue.

davidsh commented 5 years ago

@edavedian Thank you for the bug report.

This is very interesting because you are getting the same error "The server returned an invalid or unrecognized response" with two different HTTP stack implementations. Unfortunately, that error message is not descriptive enough to determine the problem.

In order to diagnose this further, we will need additional logs. Can you get a Wireshark trace and/or System.Net logs (.NET Core logs using PerfView)?

For System.Net .NET Core tracing, see this:https://github.com/dotnet/corefx/blob/master/Documentation/debugging/windows-instructions.md#using-perfview

I have sought help from Microsoft support and it is very difficult to get any kind of System.Net based trace in this environment.

Do you currently have an active support case with Microsoft Product Support? If so, you should continue to use that mechanism as well. Please have the support engineer reach out to our team for further coordination.

cc: @karelz

edavedian commented 5 years ago

@davidsh Thank you very much. I will ask the Microsoft support engineer to reach out to your team. Since all the communication is over SSL, would a wireshark trace help? I originally thought that this might be the symptom of a random TCP reset but I was able to generate socket resets using tcpkill and the exception was completely different. I will ask the support engineer to see if they can get System.NET tracing using the page you provided. Any idea why I can't capture a crash dump using procdump with arguments -e 1 -f WinHttpException or -e 1 -f HttpRequestException and will that be any helpful if I am able to capture a crash dump?

edavedian commented 5 years ago

@davidsh I have to correct myself as this is occurring against 2 types of servers in INT environment and only 1 type of server in PROD environment which leads be to believe that this is a server related issue. We will just build a retry logic around this particular exception.

JuntaoZhu commented 5 years ago

I've written a post, regarding the troubleshooting I have done with Edwin, at Case study - Intermittent “The server returned an invalid or unrecognized response.” error

mcasperson commented 5 years ago

@JuntaoZhu We are experiencing a similar issue, and the link to the blog post returns a 404. Is that blog post available elsewhere?

JuntaoZhu commented 5 years ago

Sorry for the inconvenience. I've fixed it. Please try again. It should be working now. Thanks for reporting it to me.

mcasperson commented 5 years ago

Thankyou for that blog post @JuntaoZhu, it was helpful.

nhenneaux commented 5 years ago

We are facing exactly the same error in production environment every 10 minutes. There is a schedule job doing several HTTP requests every 15 seconds. Only the first one fails. When retried, it works on the first retry and all the subsequent HTTP are succeeding. The same service is targeted using a Java Jetty HTTP client without any issue. The targeted service is running with latest Jetty HTTP server on Linux REHL 7.

I'm not sure it's helping you but I do think there is something strange in the underlying HTTP (or TCP) client.

rui-ktei commented 5 years ago

Hi guys, just want to mention that, this is currently affecting our production. @davidsh is there going to be a fix in versions 2.x? Thank you very much.

davidsh commented 5 years ago

@rui-ktei

Hi guys, just want to mention that, this is currently affecting our production. @davidsh is there going to be a fix in versions 2.x? Thank you very much.

Is there a specific PR that was already fixed in master that you would like to be ported to release/2.1 servicing?

It's not clear your problem is related to this issue. In fact, this issue itself was closed because the original reporter of the issue determined it was a server issue and not a bug in CoreFx.

If you have a repro for your error, please open a new issue with enough information so that it can be diagnosed to determine a possible root cause (perhaps already fixed in .NET Core 3.0 or perhaps something unique).