dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.04k stars 4.68k forks source link

"Handle is already used by another Socket." causing 502 errors (NET Core 2.2., NET 5.0, Kestrel, CentOS, Nginx) #56750

Closed afigelj closed 2 years ago

afigelj commented 3 years ago

We encountered random 502 errors with Net Core 2.2. web application running on CentOS behind Nginx. Thinking the issue is related to version 2.2., we upgraded the application to NET 5.0. The errors still happen random, just in more irregular intervals.

Suggested solutions on the web for 502 errors also suggested, the error could be related to longer loading times of data from database. We optimized all queries in the API using the analysis of execution plans. The errors keep appearing. While examining the Access and Kestrel logs, we found one irregularity. Below are the Kestrel logs of two identical web page requests less than 2 minutes apart. First request returned status 200, second 502. The difference is after the line xxx.DfaMatcher[1005]. The 1st log shows the line Endpoint … is valid for the request path, but the second log is missing this entry.

As the second log shows, there were 14 seconds where Kestrel didn’t process anything (despite the user did reload the web page 4 times). The first entry is Starting HttpMessageHandler cleanup cycle. After that, the 4 web page requests produced 502, which Kestrel shows as: Connection id "xxx" communication error.

We were also surprised to see the API log, where the 4 subsequent requests from the web page (producing 502) were processed by the API with status 200. As a side note: we never encountered any similar issue on the development computer (Win 10 LTSB, Visual Studio 2019 Pro, using IIS) for the past two years.

Access.log: Response 200 213.157.249.198 - - [16/Jun/2021:08:17:18 +0200] "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1" 200 8025 "https://my-domain.com/Admin/Measurement" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"

Response 502 213.157.249.198 - - [16/Jun/2021:08:18:55 +0200] "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1" 502 552 "https://my-domain.com/Admin/Measurement" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"

Kestrel log: Response 200 Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK15" accepted. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK15" started. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection "0HM9GAMVBTK15" established using the following protocol: Tls12 Jun 16 08:17:18 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - - Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4] Jun 16 08:17:18 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001] Jun 16 08:17:18 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62' Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005] Jun 16 08:17:18 .NET Core (dotnet)[551]: Endpoint 'Page: /Measurement/Detail' with route pattern 'Admin/Measurement/Detail/{id:Guid?}/{handler?}' is valid for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62'

Response 502

Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" accepted.
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" started.
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection "0HM9GAMVBTK1R" established using the following protocol: Tls12
Jun 16 08:18:55 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - -
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4]
Jun 16 08:18:55 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
Jun 16 08:18:55 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62'
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]

- Missing line: Endpoint 'xxx' is valid for the request path

Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[100]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Starting HttpMessageHandler cleanup cycle with 7 items
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[101]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Ending HttpMessageHandler cleanup cycle after 0.0043ms - processed: 0 items - remaining: 7 items
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[14]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" communication error.
Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket.
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive()
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" accepted.
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" started.
Jun 16 08:19:09 .NET Core (dotnet)[551]: fail: Microsoft.AspNetCore.Server.Kestrel[0]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Unhandled exception while processing 0HM9GAMVBTK1T.
Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket.
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.ReadAsync(CancellationToken token)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.DefaultPipeReader.ReadAsync(CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.DuplexPipeStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|182_0[TIOAdapter](TIOAdapter adap, ValueTask`1 task, Int32 minSize)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware.OnConnectionAsync(ConnectionContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1.ExecuteAsync()
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" stopped.
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" sending FIN because: "The Socket transport's send loop completed gracefully."

Nginx error log: 2021/06/16 08:18:55 [error] 1883#0: *3243 upstream prematurely closed connection while reading response header from upstream, client: 213.157.249.198, server: my-domain.com, request: "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1", upstream: "https://[::1]:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62", host: "my-domain.com", referrer: "https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62"

2021/06/16 08:18:55 [error] 1883#0: *3243 SSL_do_handshake() failed (SSL: error:140940F5:SSL routines:ssl3_read_bytes:unexpected record) while SSL handshaking to upstream, client: 213.157.249.198, server: my-domain.com, request: "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1", upstream: "https://127.0.0.1:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62", host: "my-domain.com", referrer: "https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62"

Any ideas where to look for the cause of the issue would be appreciated.

adityamandaleeka commented 3 years ago

@dotnet/http Could this be a bug in Sockets?

karelz commented 3 years ago

Exception comes from here: https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs#L116 or the 5.0 version: https://github.com/dotnet/runtime/blob/release/5.0/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs#L115

                // Using public SafeSocketHandle(IntPtr) a user can add the same handle
                // from a different Socket instance.

Is it something ASP.NET might use? @antonfirsov @geoffkizer any further thoughts?

stephentoub commented 3 years ago

Is it something ASP.NET might use?

https://github.com/dotnet/aspnetcore/blob/18a926850f7374248e687ee64390e7f10514403f/src/Servers/Kestrel/Transport.Sockets/src/SocketTransportOptions.cs#L110

stephentoub commented 3 years ago

As an aside, I don't understand the explanation here: https://github.com/dotnet/aspnetcore/blob/18a926850f7374248e687ee64390e7f10514403f/src/Servers/Kestrel/Transport.Sockets/src/SocketTransportOptions.cs#L100-L108 How is this not setting up a potential double-free handle/fd problem?

halter73 commented 3 years ago

Is it something ASP.NET might use?

We use it only for the listen socket to support systemd socket activation. @afigelj Are you calling .UseSystemd() or .ListenHandle() anywhere? I'm guessing not. In any case, that should only affect the listen socket, right? This is an accepted socket.

How is this not setting up a potential double-free handle/fd problem?

This does seem like a problem. Even for the systemd socket activation, the manpage specifically says

It should not invoke shutdown(2) on sockets it got with Accept=no,

https://www.freedesktop.org/software/systemd/man/systemd.socket.html#Accept=

Accept=no is the only scenario we support 😢. In context, that might only apply to AF_UNIX sockets (Unix domain sockets) which aren't super common even in systemd socket activation scenarios. So maybe that's why we haven't had problems reported related to this. There's also the possibility of people calling KestrelServerOptions.ListenHandle() directly that don't expect us to close the listen handle on shutdown.

I didn't like this the last time it was touched either, but we'd need to rework our accept loop logic or transports themselves to use CancellationTokens to stop the loop rather than relying on closing the listen socket to do so. This is probably something we should do anyway, but I'm not aware of anyone filing issues related to this, so for now it's mostly a code correctness thing.

afigelj commented 3 years ago

We use it only for the listen socket to support systemd socket activation. @afigelj Are you calling .UseSystemd() or .ListenHandle() anywhere? I'm guessing not. In any case, that should only affect the listen socket, right? This is an accepted socket.

No, we're not using any of this methods. Simply service/API/repository calls from the web page.

antonfirsov commented 3 years ago

I'm not super-familiar with memory models, but shouldn't be the read of _asyncEngine volatile here to ensure it's always happening after the lock is taken? https://github.com/dotnet/runtime/blob/a036594cb3766f70db9f5c6e7681454e6894da50/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs#L1235-L1239

stephentoub commented 3 years ago

shouldn't be the read of _asyncEngine volatile here to ensure it's always happening after the lock is taken?

Entering the lock has read-acquire semantics. No reads from inside the lock can move to be before the lock.

antonfirsov commented 3 years ago

Other (weird...) cases where people seem to hit this:

https://stackoverflow.com/questions/67912596/warn-microsoft-aspnetcore-server-kestrel0-connection-processing-ended-abnorma https://zblesk.net/blog/aspnetcore-identity-litedb-breaks-on-ubuntu/

halter73 commented 3 years ago

@antonfirsov @stephentoub Are you both okay with transferring this to the runtime repo since there doesn't appear to be any usage of SafeSocketHandle(IntPtr) in the app?

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
We encountered random 502 errors with Net Core 2.2. web application running on CentOS behind Nginx. Thinking the issue is related to version 2.2., we upgraded the application to NET 5.0. The errors still happen random, just in more irregular intervals. Suggested solutions on the web for 502 errors also suggested, the error could be related to longer loading times of data from database. We optimized all queries in the API using the analysis of execution plans. The errors keep appearing. While examining the Access and Kestrel logs, we found one irregularity. Below are the Kestrel logs of two identical web page requests less than 2 minutes apart. First request returned status 200, second 502. The difference is after the line **xxx.DfaMatcher[1005]**. The 1st log shows the line **Endpoint … is valid for the request path**, but the second log is missing this entry. As the second log shows, there were 14 seconds where Kestrel didn’t process anything (despite the user did reload the web page 4 times). The first entry is **Starting HttpMessageHandler cleanup cycle**. After that, the 4 web page requests produced 502, which Kestrel shows as: **Connection id "xxx" communication error**. We were also surprised to see the API log, where the 4 subsequent requests from the web page (producing 502) were processed by the API with status 200. As a side note: we never encountered any similar issue on the development computer (Win 10 LTSB, Visual Studio 2019 Pro, using IIS) for the past two years. **Access.log:** **Response 200** 213.157.249.198 - - [16/Jun/2021:08:17:18 +0200] "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1" 200 8025 "https://my-domain.com/Admin/Measurement" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36" **Response 502** 213.157.249.198 - - [16/Jun/2021:08:18:55 +0200] "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1" 502 552 "https://my-domain.com/Admin/Measurement" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36" **Kestrel log: Response 200** Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK15" accepted. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK15" started. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection "0HM9GAMVBTK15" established using the following protocol: Tls12 Jun 16 08:17:18 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - - Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4] Jun 16 08:17:18 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001] Jun 16 08:17:18 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62' Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005] **Jun 16 08:17:18 .NET Core (dotnet)[551]: Endpoint 'Page: /Measurement/Detail' with route pattern 'Admin/Measurement/Detail/{id:Guid?}/{handler?}' is valid for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62'** **Response 502** ``` Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" accepted. Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" started. Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3] Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection "0HM9GAMVBTK1R" established using the following protocol: Tls12 Jun 16 08:18:55 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1] Jun 16 08:18:55 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - - Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4] Jun 16 08:18:55 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001] Jun 16 08:18:55 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62' Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005] ``` **- Missing line: Endpoint 'xxx' is valid for the request path** ``` Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[100] Jun 16 08:19:09 .NET Core (dotnet)[551]: Starting HttpMessageHandler cleanup cycle with 7 items Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[101] Jun 16 08:19:09 .NET Core (dotnet)[551]: Ending HttpMessageHandler cleanup cycle after 0.0043ms - processed: 0 items - remaining: 7 items Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[14] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" communication error. Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket. Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive() Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" accepted. Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" started. Jun 16 08:19:09 .NET Core (dotnet)[551]: fail: Microsoft.AspNetCore.Server.Kestrel[0] Jun 16 08:19:09 .NET Core (dotnet)[551]: Unhandled exception while processing 0HM9GAMVBTK1T. Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket. Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.PipeCompletion.ThrowLatchedException() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.ReadAsync(CancellationToken token) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.DefaultPipeReader.ReadAsync(CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.DuplexPipeStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.g__InternalFillHandshakeBufferAsync|182_0[TIOAdapter](TIOAdapter adap, ValueTask`1 task, Int32 minSize) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware.OnConnectionAsync(ConnectionContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1.ExecuteAsync() Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[2] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" stopped. Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" sending FIN because: "The Socket transport's send loop completed gracefully." ``` **Nginx error log:** 2021/06/16 08:18:55 [error] 1883#0: *3243 upstream prematurely closed connection while reading response header from upstream, client: 213.157.249.198, server: my-domain.com, request: "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1", upstream: "https://[::1]:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62", host: "my-domain.com", referrer: "https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62" 2021/06/16 08:18:55 [error] 1883#0: *3243 SSL_do_handshake() failed (SSL: error:140940F5:SSL routines:ssl3_read_bytes:unexpected record) while SSL handshaking to upstream, client: 213.157.249.198, server: my-domain.com, request: "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1", upstream: "https://127.0.0.1:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62", host: "my-domain.com", referrer: "https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62" Any ideas where to look for the cause of the issue would be appreciated.
Author: afigelj
Assignees: halter73
Labels: `area-System.Net`, `untriaged`
Milestone: -
ManickaP commented 3 years ago

Triage: since this has been since 2.2, it's not a regression. We should investigate whether it's ours issue or not for 7.0.

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
We encountered random 502 errors with Net Core 2.2. web application running on CentOS behind Nginx. Thinking the issue is related to version 2.2., we upgraded the application to NET 5.0. The errors still happen random, just in more irregular intervals. Suggested solutions on the web for 502 errors also suggested, the error could be related to longer loading times of data from database. We optimized all queries in the API using the analysis of execution plans. The errors keep appearing. While examining the Access and Kestrel logs, we found one irregularity. Below are the Kestrel logs of two identical web page requests less than 2 minutes apart. First request returned status 200, second 502. The difference is after the line **xxx.DfaMatcher[1005]**. The 1st log shows the line **Endpoint … is valid for the request path**, but the second log is missing this entry. As the second log shows, there were 14 seconds where Kestrel didn’t process anything (despite the user did reload the web page 4 times). The first entry is **Starting HttpMessageHandler cleanup cycle**. After that, the 4 web page requests produced 502, which Kestrel shows as: **Connection id "xxx" communication error**. We were also surprised to see the API log, where the 4 subsequent requests from the web page (producing 502) were processed by the API with status 200. As a side note: we never encountered any similar issue on the development computer (Win 10 LTSB, Visual Studio 2019 Pro, using IIS) for the past two years. **Access.log:** **Response 200** 213.157.249.198 - - [16/Jun/2021:08:17:18 +0200] "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1" 200 8025 "https://my-domain.com/Admin/Measurement" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36" **Response 502** 213.157.249.198 - - [16/Jun/2021:08:18:55 +0200] "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1" 502 552 "https://my-domain.com/Admin/Measurement" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36" **Kestrel log: Response 200** Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK15" accepted. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK15" started. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection "0HM9GAMVBTK15" established using the following protocol: Tls12 Jun 16 08:17:18 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - - Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4] Jun 16 08:17:18 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001] Jun 16 08:17:18 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62' Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005] **Jun 16 08:17:18 .NET Core (dotnet)[551]: Endpoint 'Page: /Measurement/Detail' with route pattern 'Admin/Measurement/Detail/{id:Guid?}/{handler?}' is valid for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62'** **Response 502** ``` Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" accepted. Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" started. Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3] Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection "0HM9GAMVBTK1R" established using the following protocol: Tls12 Jun 16 08:18:55 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1] Jun 16 08:18:55 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - - Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4] Jun 16 08:18:55 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001] Jun 16 08:18:55 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62' Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005] ``` **- Missing line: Endpoint 'xxx' is valid for the request path** ``` Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[100] Jun 16 08:19:09 .NET Core (dotnet)[551]: Starting HttpMessageHandler cleanup cycle with 7 items Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[101] Jun 16 08:19:09 .NET Core (dotnet)[551]: Ending HttpMessageHandler cleanup cycle after 0.0043ms - processed: 0 items - remaining: 7 items Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[14] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" communication error. Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket. Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive() Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" accepted. Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" started. Jun 16 08:19:09 .NET Core (dotnet)[551]: fail: Microsoft.AspNetCore.Server.Kestrel[0] Jun 16 08:19:09 .NET Core (dotnet)[551]: Unhandled exception while processing 0HM9GAMVBTK1T. Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket. Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives() Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.PipeCompletion.ThrowLatchedException() Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.ReadAsync(CancellationToken token) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.DefaultPipeReader.ReadAsync(CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.DuplexPipeStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.g__InternalFillHandshakeBufferAsync|182_0[TIOAdapter](TIOAdapter adap, ValueTask`1 task, Int32 minSize) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter) Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware.OnConnectionAsync(ConnectionContext context) Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1.ExecuteAsync() Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[2] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" stopped. Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7] Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" sending FIN because: "The Socket transport's send loop completed gracefully." ``` **Nginx error log:** 2021/06/16 08:18:55 [error] 1883#0: *3243 upstream prematurely closed connection while reading response header from upstream, client: 213.157.249.198, server: my-domain.com, request: "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1", upstream: "https://[::1]:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62", host: "my-domain.com", referrer: "https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62" 2021/06/16 08:18:55 [error] 1883#0: *3243 SSL_do_handshake() failed (SSL: error:140940F5:SSL routines:ssl3_read_bytes:unexpected record) while SSL handshaking to upstream, client: 213.157.249.198, server: my-domain.com, request: "GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1", upstream: "https://127.0.0.1:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62", host: "my-domain.com", referrer: "https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62" Any ideas where to look for the cause of the issue would be appreciated.
Author: afigelj
Assignees: halter73
Labels: `area-System.Net.Sockets`
Milestone: 7.0.0
server-si commented 3 years ago

This issue is happening here, fresh logs and further information at the end.

This caught me by a surprise, especially seeing that a fix is not planned for years (??).

Is there a workaround? Perhaps a patch that we could apply and keep until a proper fix is mainlined? I wouldn't have a problem doing that, we have a build system in place for other things.

Or should we drop using linux servers for .NET web applications altogether? I'm really at a loss here. Help!

BTW, found another user, @zblesk, having the same issue on Ubuntu, but he didn't follow up: https://github.com/dotnet/aspnetcore/issues/30787

I'm wondering if this issue is not far more common then it appears to be.

For example, our customer's initial error report pointed at Nginx, as the 502 error was shown there. Users might be looking in the wrong direction or they might simply give up.

Anyway, kestrel error log attached, additional information:

[user@dotnet ~]# dotnet --info
.NET SDK (reflecting any global.json):
 Version:   5.0.401
 Commit:    4bef5f3dbf

Runtime Environment:
 OS Name:     centos
 OS Version:  7
 OS Platform: Linux
 RID:         centos.7-x64
 Base Path:   /usr/share/dotnet/sdk/5.0.401/

Host (useful for support):
  Version: 5.0.10
  Commit:  e1825b4928

kestrel_error_log.txt

karelz commented 3 years ago

@server-si it is not clear if it is .NET problem, or something else. We do not understand the root cause. I believe that we will need a repro, or someone willing to investigate with a repro at hand. Are you in position to reproduce the problem in your environment in some regular manner?

zblesk commented 3 years ago

@server-si eventually I found out it was caused by an external library. These people were investigating something similar and also mentioned this issue because they found the related blog post, linked above. I don't know why the library was breaking the pipeline, and only on linux; I won't be investigating it, either. Just stopped using it.

karelz commented 3 years ago

Thanks @zblesk for update -- just FYI, the Twitter thread you linked is @antonfirsov above from our (Networking) team. Also, summarizing your story here: You removed library AspNetCore.Identity.LiteDB and it didn't happen since then - you describe it in your blog post. Note: It does not mean necessarily that the library is to be blamed, it may have stumbled on a bug in .NET -- something we still would need to chase down and root cause.

antonfirsov commented 3 years ago

To reason about the root cause, we need to see the call stack and the program's state around the first handle registration that doesn't throw.

Ideally, we should be able to get this info by asking users to enabling tracing, but unfortunately we are not logging the handle value anywhere. We need to provide private builds with extended logging, or exchange debug info with customers in other ways. @server-si if you can reproduce this consistently, any chance you are willing to work with us privately?

server-si commented 2 years ago

Thanks for responding, everybody.

I'm afraid we still can't consistently reproduce the issue. We've been stressing the test server, mainly using Selenium IDE, but have so far been unable to trigger the error. The errors on the production server are seemingly random.

Since last week, the production server is down to a single error occurrence daily, if at all. The frequency of errors was significantly higher not so long ago.That's a welcome improvement but we don't fully understand what caused it nor can we be sure it is permanent.

Besides the usual moving parts such as the user interaction and the OS and the .NET updates, the application code also got a potentially significant change. I'm mainly dealing with infrastructure so I'll invite the application author to provide further details about that.

I would gladly run custom .NET builds and privately provide any other debugging information. The problem I see right now is that we can't willingly trigger the error, despite actively trying.

afigelj commented 2 years ago

I just want to inform everyone, that we're extensively testing the application for the past few days on the test server. We found some clues but they are not conclusive enough for now. I'll get back when we have some more info.

tmds commented 2 years ago

You removed library AspNetCore.Identity.LiteDB and it didn't happen since then

Based on what we found in https://github.com/dotnet/runtime/issues/64305, https://github.com/fabian-blum/AspNetCore.Identity.LiteDB/pull/14 will fix the issue in AspNetCore.Identity.LiteDB.

wfurt commented 2 years ago

It seems like this was external issue. We can re-open if there is evidence otherwise.