dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

SIGSEGV in msquic on Linux ARM32 #103404

Open MichalStrehovsky opened 2 weeks ago

MichalStrehovsky commented 2 weeks ago

We're often seeing SIGSEGV in System.Net.Http.Functional.Tests on Linux ARM32 in native AOT testing. I couldn't find any Linux ARM32 runs on top of CoreCLR, so I don't know whether we run these tests there.

Most recently in https://dev.azure.com/dnceng-public/public/_build/results?buildId=706302&view=logs&jobId=a8f24b3c-c71a-5a83-5031-ad8ed12efa6f.

I pulled down the core file and managed to find the msquic transport package to get symbols. The crash is in msquic:

(lldb) bt
* thread #1, name = 'System.Net.Http', stop reason = signal SIGSEGV
  * frame #0: 0xef4f799c libmsquic.so.2`QuicSendCanSendStreamNow(Stream=<unavailable>) at send.c:956:1
    frame #1: 0xef4c7204 libmsquic.so.2`QuicConnProcessPeerTransportParameters at connection.c.clog.h.lttng.h:1253:1
    frame #2: 0xef4c71a8 libmsquic.so.2`QuicConnProcessPeerTransportParameters(Connection=0x00000000, FromResumptionTicket='\x80') at connection.c:2976:13

Grab the dump and test symbols with runfo get-helix-payload -j fa19deed-d149-4234-8c44-9e123af06c24 -w System.Net.Http.Functional.Tests -o c:\myhell. Grab the transport package from https://dnceng.visualstudio.com/public/_artifacts/feed/dotnet9-transport/NuGet/runtime.linux-arm.runtime.native.System.Net.MsQuic.Transport/overview/9.0.0-alpha.1.24167.3

I don't know whether this belongs in the native AOT or the networking area path. I also don't know whether we do any regular testing on Linux-arm32 with CoreCLR (not musl-arm32, just arm32).

filipnavara commented 2 weeks ago

I remember checking this one in the past and it was not NativeAOT specific.

dotnet-policy-service[bot] commented 2 weeks ago

Tagging subscribers to this area: @dotnet/ncl. See info in area-owners.md if you want to be subscribed.

ManickaP commented 2 weeks ago

There's an ARM32 issue in MsQuic (https://github.com/microsoft/msquic/issues/3958) that is fixed but not released yet. The callstack is different, though. @nibanks this looks like it might be a problem in MsQuic.

janvorli commented 2 weeks ago

Looking at the call stack above, I can see that frame 2 has Connection=0x00000000. Maybe that's the source of the problem?
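The check described above (scanning the frame arguments for a null pointer) can be sketched as a small script. This is only an illustration, assuming the backtrace is available as plain text; `find_null_args` and both regexes are hypothetical helpers, not part of lldb or MsQuic. Debuggers can also misreport argument values in optimized builds (frame #0 already shows `Stream=<unavailable>`), so a hit is a lead to verify rather than proof.

```python
import re

# Backtrace text copied from the report above (raw string keeps '\x80' literal).
BACKTRACE = r"""
* thread #1, name = 'System.Net.Http', stop reason = signal SIGSEGV
  * frame #0: 0xef4f799c libmsquic.so.2`QuicSendCanSendStreamNow(Stream=<unavailable>) at send.c:956:1
    frame #1: 0xef4c7204 libmsquic.so.2`QuicConnProcessPeerTransportParameters at connection.c.clog.h.lttng.h:1253:1
    frame #2: 0xef4c71a8 libmsquic.so.2`QuicConnProcessPeerTransportParameters(Connection=0x00000000, FromResumptionTicket='\x80') at connection.c:2976:13
"""

# Hypothetical pattern: frame number, address, module`function, optional argument list.
FRAME_RE = re.compile(r"frame #(\d+): \S+ \S+`(\w+)(?:\(([^)]*)\))?")
# A value made only of zeros after the 0x prefix is treated as a null pointer.
NULL_RE = re.compile(r"^0x0+$")


def find_null_args(backtrace):
    """Return (frame number, function, argument name) for every all-zero hex argument."""
    hits = []
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue  # not a frame line (e.g. the thread header)
        frame, func, args = int(match.group(1)), match.group(2), match.group(3)
        if not args:
            continue  # frame printed without an argument list
        for arg in args.split(", "):
            name, _, value = arg.partition("=")
            if NULL_RE.match(value):
                hits.append((frame, func, name))
    return hits


print(find_null_args(BACKTRACE))
# -> [(2, 'QuicConnProcessPeerTransportParameters', 'Connection')]
```

For this backtrace the only flagged argument is `Connection` in frame #2, which matches the observation above.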

liveans commented 1 week ago

@ManickaP Do you think it's worth closing this as a duplicate of #103703?

ManickaP commented 1 week ago

Those are different callstacks, aren't they? We could probably merge them into one issue and copy the details from here over there.

janvorli commented 1 week ago

The other issue looks quite different. I don't think it would make sense to merge them.