dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Assertion failure 0 <= fd && fd < sysconf(_SC_OPEN_MAX) in System.Net.Mail.Functional.Tests #72830

Closed noahfalk closed 2 years ago

noahfalk commented 2 years ago

Description

System.Net.Mail.Functional.Tests are failing with this assert in CI:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-72664-merge-3f079befb6de4fac81/System.Net.Mail.Functional.Tests/1/console.08ced1c9.log?helixlogtype=result

----- start Fri 22 Jul 2022 12:11:04 PM UTC =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Mail.Functional.Tests (found 154 of 156 test cases)
  Starting:    System.Net.Mail.Functional.Tests (parallel test collections = on, max threads = 2)
    System.Net.Mail.Tests.SmtpClientTest.TestGssapiAuthentication [SKIP]
      Condition(s) not met: "IsNtlmInstalled"
dotnet: /__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.

Reproduction Steps

Example CI build: https://dev.azure.com/dnceng/public/_build/results?buildId=1897299&view=ms.vss-test-web.build-test-results-tab

Expected behavior

Test doesn't fail in CI

Actual behavior

The test fails in CI; see the description.

Regression?

Unknown

Known Workarounds

Unknown

Configuration

Linux Debug x64 Mono Interpreter

Other information

No response

{ "ErrorMessage":"0 <= fd && fd < sysconf(_SC_OPEN_MAX)" } 

Report

Build   Definition      Test                                                Pull Request
47765   dotnet/runtime  System.Net.Mail.Functional.Tests.WorkItemExecution  dotnet/runtime#76871
37711   dotnet/runtime  System.Net.Mail.Functional.Tests.WorkItemExecution
36085   dotnet/runtime  System.Net.Mail.Functional.Tests.WorkItemExecution
33387   dotnet/runtime  System.Net.Mail.Functional.Tests.WorkItemExecution

Summary

24-Hour Hit Count   7-Day Hit Count   1-Month Count
0                   0                 4

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
Author: noahfalk
Assignees: -
Labels: `area-CodeGen-coreclr`
Milestone: -
EgorBo commented 2 years ago

Judging by the stack trace and the job itself, it's mono-interp (the Mono interpreter).

ghost commented 2 years ago

Tagging subscribers to this area: @brzvlad See info in area-owners.md if you want to be subscribed.

Issue Details
Author: noahfalk
Assignees: -
Labels: `untriaged`, `area-Codegen-Interpreter-mono`
Milestone: -
danmoseley commented 2 years ago

I pasted stacks here. It appears that the mail code, or underlying networking code, is attempting to use a file descriptor of -1, which I assume is invalid. https://github.com/dotnet/runtime/issues/72818#issuecomment-1194893236
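For reference, the failing check is the one printed in the assertion message, `0 <= fd && fd < sysconf(_SC_OPEN_MAX)`, so a descriptor of -1 fails it on the very first comparison. A minimal C sketch of such a range check follows. It assumes that `ToFileDescriptor` asserts this range and then narrows `intptr_t` to `int`; the names `fd_in_range` and `to_file_descriptor` are hypothetical and not taken from pal_utilities.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: the same condition the assertion message prints.
   Like that condition, it treats every fd as out of range if sysconf()
   cannot determine a limit and returns -1. */
bool fd_in_range(intptr_t fd)
{
    return 0 <= fd && fd < sysconf(_SC_OPEN_MAX);
}

/* Hypothetical stand-in for ToFileDescriptor: a debug build aborts on an
   out-of-range value; with NDEBUG defined the assert disappears and the
   value is simply narrowed to int. */
int to_file_descriptor(intptr_t fd)
{
    assert(fd_in_range(fd));
    return (int)fd;
}
```

With `fd == -1`, `to_file_descriptor` aborts in a debug build, which is consistent with the SIGABRT reported in the CI logs.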

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
Author: noahfalk
Assignees: -
Labels: `area-System.Net`, `blocking-clean-ci`, `untriaged`
Milestone: -
wfurt commented 2 years ago

cc: @tmds

Seems like we do the closing magic on an invalid socket... (the stack below is copied from @danmoseley's post)

The top of the stack looks like:

#10 0x00007f4c7b4ab5cd in ToFileDescriptor (fd=-1) at /__w/1/s/src/native/libs/Common/pal_utilities.h:86
#11 0x00007f4c7b4abf95 in SystemNative_FcntlGetFD (fd=-1) at /__w/1/s/src/native/libs/System.Native/pal_io.c:611
...
      at <unknown> <0xffffffff>
      at Fcntl:<GetFD>g____PInvoke|5_0 <0x00020>
      at Fcntl:GetFD <0x00020>
      at System.Net.Sockets.SafeSocketHandle:TryUnblockSocket <0x0003a>
      at System.Net.Sockets.SafeSocketHandle:CloseAsIs <0x000f4>
      at System.Net.Sockets.Socket:Dispose <0x00426>
      at System.Net.Sockets.Socket:Dispose <0x000a4>
      at System.Net.Sockets.Socket:Close <0x00098>
      at System.Net.Sockets.TcpClient:Dispose <0x0011c>
      at System.Net.Sockets.TcpClient:Dispose <0x0001a>
      at System.Net.Mail.SmtpConnection:ShutdownConnection <0x00184>
      at System.Net.Mail.SmtpConnection:Abort <0x00012>
      at System.Net.Mail.SmtpTransport:Abort <0x00078>
      at System.Net.Mail.SmtpClient:Abort <0x0001c>
      at System.Net.Mail.SmtpClient:SendAsyncCancel <0x00088>
      at <>c:<SendMailAsync>b__84_1 <0x0001c>
      at System.Threading.CancellationTokenSource:Invoke <0x00042>
or

#9  0x00007fbc510f8102 in __GI___assert_fail (assertion=0x7fbc4dcdc8a6 "0 <= fd && fd < sysconf(_SC_OPEN_MAX)", file=0x7fbc4dcddbb1 "/__w/1/s/src/native/libs/Common/pal_utilities.h", line=86, function=0x7fbc4dcdd527 "int ToFileDescriptor(intptr_t)") at assert.c:101
#10 0x00007fbc4dceb7cd in ToFileDescriptor (fd=-1) at /__w/1/s/src/native/libs/Common/pal_utilities.h:86
#11 0x00007fbc4dcebd8e in SystemNative_SetLingerOption (socket=-1, option=0x7fbc45d84d18) at /__w/1/s/src/native/libs/System.Native/pal_networking.c:1278
...
      at <unknown> <0xffffffff>
      at Sys:<SetLingerOption>g____PInvoke|34_0 <0x00024>
      at Sys:SetLingerOption <0x00068>
      at System.Net.Sockets.SocketPal:SetLingerOption <0x00092>
      at System.Net.Sockets.Socket:SetLingerOption <0x00022>
      at System.Net.Sockets.Socket:SetSocketOption <0x00248>
      at System.Net.Sockets.Socket:set_LingerState <0x00024>
      at System.Net.Sockets.TcpClient:set_LingerState <0x00022>
      at System.Net.Mail.SmtpConnection:ShutdownConnection <0x000f0>
      at System.Net.Mail.SmtpConnection:Abort <0x00012>
      at System.Net.Mail.SmtpTransport:Abort <0x00078>
      at System.Net.Mail.SmtpClient:Abort <0x0001c>
      at System.Net.Mail.SmtpClient:SendAsyncCancel <0x00088>
      at <>c:<SendMailAsync>b__84_1 <0x0001c>
      at System.Threading.CancellationTokenSource:Invoke <0x00042>
ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
Author: noahfalk
Assignees: -
Labels: `area-System.Net.Sockets`, `blocking-clean-ci`, `untriaged`
Milestone: -
karelz commented 2 years ago

@rzikm can you please check how often it happens? Thanks!

rzikm commented 2 years ago

Very often: 98 hits in the last 14 days. Curiously, none of these are on main.

wfurt commented 2 years ago

Aside from some authentication changes, #70046 would be the biggest suspect. It may not be the root cause, as the assert is in Sockets. I tried to reproduce it (main on Linux) but did not get a hit. We can probably look at some of the Linux/Windows core files to see which particular tests are running.

karelz commented 2 years ago

If it is happening that often perhaps we should disable the test for now to avoid noise in CI ... @rzikm thoughts?

rzikm commented 2 years ago

I think it was the same test we disabled in #73452 (SendMailAsync_CanBeCanceled_CancellationToken)

karelz commented 2 years ago

Update: We are re-enabling the test in main via PR #74545 with changes that will hopefully give us more info on the root cause.

wfurt commented 2 years ago

BadExit may be misleading in queries... This is not even a PR, but main:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-ececf5ef6b4c4ff2a6/System.Net.Mail.Functional.Tests/1/console.8759b633.log?helixlogtype=result

----- start Thu Aug 25 12:40:46 UTC 2022 =============== To repro directly: =====================================================
pushd .
/root/helix/work/correlation/dotnet exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing 
popd
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
./RunTests.sh: line 168: /root/helix/work/correlation/dotnet: No such file or directory
/root/helix/work/workitem/e
rzikm commented 2 years ago

@wfurt I found out that you need to specifically filter for exit codes 134 and 139 to get true process crashes.
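The two exit codes follow the usual shell convention for a process killed by a signal, 128 plus the signal number: on Linux, 134 is SIGABRT (6), which is what a failed `assert` raises through `abort()`, and 139 is SIGSEGV (11). A small C sketch of that decoding; the helper name is hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical helper: decode a shell-style exit status. Values above 128
   are read as "terminated by signal (status - 128)"; 0 means a normal exit.
   This is a heuristic: a process may also call exit(134) on purpose. */
int signal_from_exit_status(int status)
{
    assert(status >= 0 && status <= 255); /* exit statuses are 8-bit */
    return status > 128 ? status - 128 : 0;
}
```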

wfurt commented 2 years ago

OK. I updated my query. My original one was based on the runfo issues. So far so good.

karelz commented 2 years ago

Not blocking CI as the test was disabled in #73452

wfurt commented 2 years ago

I don't understand the comment, @karelz. The suspected test was enabled again in https://github.com/dotnet/runtime/pull/74545, and I have not seen occurrences since (they may still come).

karelz commented 2 years ago

Sure, but right now it is not blocking CI

karelz commented 2 years ago

Status: After re-enabling the test, we got some hits on PRs. @rzikm has an actionable dump link.

wfurt commented 2 years ago

This is from https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-74639-merge-d6f8b011644f4edc84/System.Net.Mail.Functional.Tests/1/console.694b6d7a.log?helixlogtype=result

(lldb) clrstack -a
OS Thread Id: 0xfe8 (1)
        Child SP               IP Call Site
00007F1D4AAD3008 00007f5e86cf9387 [InlinedCallFrame: 00007f1d4aad3008] Interop+Sys.<SetLingerOption>g____PInvoke|34_0(IntPtr, LingerOption*)
00007F1D4AAD3008 00007f5e08a331ed [InlinedCallFrame: 00007f1d4aad3008] Interop+Sys.<SetLingerOption>g____PInvoke|34_0(IntPtr, LingerOption*)
00007F1D4AAD3000 00007F5E08A331ED ILStubClass.IL_STUB_PInvoke(IntPtr, LingerOption*)
    PARAMETERS:
        <no data>
        <no data>

00007F1D4AAD30D0 00007F5E08D61E09 Interop+Sys.SetLingerOption(System.Runtime.InteropServices.SafeHandle, LingerOption*) [/_/src/libraries/System.Net.Sockets/src/Microsoft.Interop.LibraryImportGenerator/Microsoft.Interop.LibraryImportGenerator/LibraryImports.g.cs @ 772]
    PARAMETERS:
        socket (0x00007F1D4AAD3118) = 0x00007f1df583ad88
        option (0x00007F1D4AAD3110) = 0x00007f1d4aad3188
    LOCALS:
        0x00007F1D4AAD3108 = 0xffffffffffffffff
        0x00007F1D4AAD3104 = 0xffffffff00007f5e
        0x00007F1D4AAD30F8 = 0x0000000000000001
        0x00007F1D4AAD30F4 = 0x0000000000000000
        0x00007F1D4AAD30F0 = 0x0000000000000000

00007F1D4AAD3130 00007F5E08D61CF0 System.Net.Sockets.SocketPal.SetLingerOption(System.Net.Sockets.SafeSocketHandle, System.Net.Sockets.LingerOption) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs @ 1534]
    PARAMETERS:
        handle (0x00007F1D4AAD3198) = 0x00007f1df583ad88
        optionValue (0x00007F1D4AAD3190) = 0x00007f1df584a790
    LOCALS:
        0x00007F1D4AAD3188 = 0x0000000000000001
        0x00007F1D4AAD3184 = 0x0000000100000000
        0x00007F1D4AAD3178 = 0x0000000000000001
        0x00007F1D4AAD3174 = 0x0000000100007f1d

When I dump the SafeHandle, it has a reasonable value...

=============================================================================
(lldb) dumpobj 0x7f1df583ad88
Name:        System.Net.Sockets.SafeSocketHandle
MethodTable: 00007f5e08a797f8
EEClass:     00007f5e08a82d00
Size:        80(0x50) bytes
File:        /mnt/work/B1FC09CD/p/shared/Microsoft.NETCore.App/8.0.0/System.Net.Sockets.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007f5e07bc9a80  4001127        8        System.IntPtr  1 instance 00000000000000A3 handle
00007f5e07b3f0f0  4001128       10         System.Int32  1 instance                8 _state
00007f5e07b3bbf0  4001129       14       System.Boolean  1 instance                1 _ownsHandle
00007f5e07b3bbf0  400112a       15       System.Boolean  1 instance                1 _fullyInitialized
00007f5e08a78c20  4000106       20         System.Int32  1 instance       -559038737 _closeSocketResult
00007f5e08a78c20  4000107       24         System.Int32  1 instance       -559038737 _closeSocketLinger
00007f5e07b3f0f0  4000108       28         System.Int32  1 instance                0 _closeSocketThread
00007f5e07b3f0f0  4000109       2c         System.Int32  1 instance                0 _closeSocketTick
00007f5e07b3f0f0  400010a       30         System.Int32  1 instance                0 _ownClose
00007f5e07b3bbf0  400010b       3c       System.Boolean  1 instance                1 <OwnsHandle>k__BackingField
00007f5e07b3bbf0  400010c       3d       System.Boolean  1 instance                0 _released
00007f5e07b3bbf0  400010d       3e       System.Boolean  1 instance                0 _hasShutdownSend
00007f5e07b3f0f0  400010e       34         System.Int32  1 instance               -1 _receiveTimeout
00007f5e07b3f0f0  400010f       38         System.Int32  1 instance               -1 _sendTimeout
00007f5e07b3bbf0  4000110       3f       System.Boolean  1 instance                0 _nonBlocking
00007f5e08aae4b0  4000111       18 ...ocketAsyncContext  0 instance 00007f1df583aef8 _asyncContext
00007f5e08a796a0  4000112       16         System.Int16  1 instance                2 _trackedOptions
00007f5e07b3bbf0  4000113       40       System.Boolean  1 instance                0 <LastConnectFailed>k__BackingField
00007f5e07b3bbf0  4000114       41       System.Boolean  1 instance                1 <DualMode>k__BackingField
00007f5e07b3bbf0  4000115       42       System.Boolean  1 instance                0 <ExposedHandleOrUntrackedConfiguration>k__BackingField
00007f5e07b3bbf0  4000116       43       System.Boolean  1 instance                0 <PreferInlineCompletions>k__BackingField
00007f5e07b3bbf0  4000117       44       System.Boolean  1 instance                1 <IsSocket>k__BackingField
00007f5e07b3bbf0  4000118       45       System.Boolean  1 instance                0 <IsDisconnected>k__BackingField

Any idea how that becomes invalid, @jkotas? I could not confirm the -1 above, but the first thread looks suspicious:

(lldb) thread select 1
* thread #1, name = 'dotnet', stop reason = signal SIGABRT
    frame #0: 0x00007f5e86cf9387
->  0x7f5e86cf9387: cmpq   $-0x1000, %rax            ; imm = 0xF000
    0x7f5e86cf938d: ja     0x7f5e86cf93ad
    0x7f5e86cf938f: rep    retq
    0x7f5e86cf9391: nopl   (%rax)
(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGABRT
  * frame #0: 0x00007f5e86cf9387

Since this was last reported on macOS, it seems unlikely to be related to #73972.

jkotas commented 2 years ago

@AaronRobinsonMSFT Could you please take a look?

The SafeHandle.handle is a good value (00000000000000A3), but it somehow turned into -1 by the time it got into the P/Invoke.
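The raw values in the dump above decode as follows: the `handle` field `0xA3` is descriptor 163; the stub local printed as `0xffffffffffffffff` is -1 when read as a signed 64-bit `IntPtr` (presumably the marshalled handle value, although the dump does not label it); and `-559038737` in `_closeSocketResult`/`_closeSocketLinger` is the bit pattern `0xDEADBEEF`, which looks like a debug placeholder that was never overwritten rather than a real `SocketError`. A small C sketch that checks this arithmetic; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret raw bits as a signed value. Exact-width integer types are
   two's complement with no padding, so copying the bytes is well defined. */
int64_t bits_as_int64(uint64_t bits)
{
    int64_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

int32_t bits_as_int32(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* The three values from the dump discussed above. */
void check_dump_values(void)
{
    assert(0xA3 == 163);                                /* SafeSocketHandle.handle */
    assert(bits_as_int64(0xffffffffffffffffULL) == -1); /* stub local in the P/Invoke frame */
    assert(bits_as_int32(0xDEADBEEFu) == -559038737);   /* _closeSocketResult / _closeSocketLinger */
}
```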

AaronRobinsonMSFT commented 2 years ago

since this was last reported on macOS

I can't seem to reproduce this on an M1. I will try Linux-x64 next.

wfurt commented 2 years ago

@rzikm and I spent days trying to reproduce it without any luck, @AaronRobinsonMSFT. We really only have some dumps from CI.

wfurt commented 2 years ago

Got another repro on RedHat.7 https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-75287-merge-d8d21444e6d84e278f/System.Net.Mail.Functional.Tests/1/console.30b0113a.log?helixlogtype=result

dump is here.

liveans commented 2 years ago

https://github.com/dotnet/runtime/blob/960e4d723c27a5407dc691d06e546bc455a9c4a5/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.Unix.cs#L139-L140

I think this is the root cause of this issue, because it's the only place outside the Socket constructor where the handle can temporarily be set to -1 (via CreateSocket). In this case, we're trying to close the socket (or set the linger option) while the handle is being replaced.

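SafeSocketHandle oldHandle = _handle;
// Create the replacement handle in a local first, so other threads never see a half-initialized handle...
SafeSocketHandle newHandle;
SocketError errorCode = SocketPal.CreateSocket(_addressFamily, _socketType, _protocolType, out newHandle);
// ...and only publish it to _handle once CreateSocket has returned.
_handle = newHandle;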

Something like this should fix it.

danmoseley commented 2 years ago

I can't quite see how that fixes it. It's an out parameter, so how is what you wrote different to the existing code?

liveans commented 2 years ago

The problem is actually a race condition. At the beginning of the CreateSocket function we have something like this:

https://github.com/dotnet/runtime/blob/960e4d723c27a5407dc691d06e546bc455a9c4a5/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs#L59

This means we replace the current Socket instance's handle with a default-constructed SafeSocketHandle, so at this point another thread can observe -1 as the handle's file descriptor if the code is running in a multithreaded/async environment.

Later in the same function, another line updates the file descriptor:

https://github.com/dotnet/runtime/blob/960e4d723c27a5407dc691d06e546bc455a9c4a5/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs#L91

Edit: I deleted the wrong information
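The interleaving described above can be modeled deterministically in C. In this sketch (all names hypothetical), `create_socket` stands in for `SocketPal.CreateSocket`: it stores a handle whose descriptor is still -1 into the out parameter (line 59 in the linked source) and fills in the real descriptor afterwards (line 91), and a callback runs in between to play the role of a concurrent `Abort`. Writing through the out parameter straight into the shared field lets that callback see -1; creating into a local and publishing afterwards, as proposed above, does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SafeSocketHandle: only the wrapped descriptor. */
typedef struct { int fd; } Handle;

static Handle *socket_handle;   /* stands in for the Socket._handle field */
static int observed_fd;         /* what the simulated Abort saw */

/* Simulated Abort running on another thread: reads the current handle's fd. */
static void simulated_abort(void)
{
    observed_fd = socket_handle->fd;
}

/* Stand-in for SocketPal.CreateSocket(..., out SafeSocketHandle socket):
   the out parameter receives a handle whose fd is still -1, and the real
   fd is stored later. `window` runs in between, like a thread switch. */
static void create_socket(Handle **out, Handle *storage, int new_fd,
                          void (*window)(void))
{
    storage->fd = -1;       /* freshly constructed, not yet valid */
    *out = storage;         /* out parameter assigned early */
    if (window != NULL)
        window();
    storage->fd = new_fd;   /* descriptor filled in afterwards */
}

/* Current shape: CreateSocket(..., out _handle) writes into the field. */
static void replace_handle_direct(Handle *storage, int new_fd)
{
    create_socket(&socket_handle, storage, new_fd, simulated_abort);
}

/* Proposed shape: create into a local, then publish the finished handle. */
static void replace_handle_via_local(Handle *storage, int new_fd)
{
    Handle *new_handle;
    create_socket(&new_handle, storage, new_fd, simulated_abort);
    socket_handle = new_handle;
}

/* Runs both shapes once and checks what the simulated Abort observed. */
void check_interleavings(void)
{
    Handle old_handle = { 0xA3 }, first = { 0 }, second = { 0 };

    socket_handle = &old_handle;
    replace_handle_direct(&first, 0xA4);
    assert(observed_fd == -1);          /* half-initialized handle was visible */
    assert(socket_handle->fd == 0xA4);

    socket_handle = &old_handle;
    replace_handle_via_local(&second, 0xA5);
    assert(observed_fd == 0xA3);        /* only the old, valid handle was visible */
    assert(socket_handle->fd == 0xA5);

    socket_handle = NULL;               /* do not keep pointers to stack storage */
}
```

The model only captures when the half-initialized handle becomes visible; it says nothing about real thread scheduling or the rest of the Socket lifecycle.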

danmoseley commented 2 years ago

Ah, I didn't realize it's multithreaded.

wfurt commented 2 years ago

Where is the code in SendAsync, @liveans? I only saw a handle replacement during connect. I think this can still be a problem if cancellation and cleanup kick in on the thread pool. It would probably be worth trying the proposed change.

liveans commented 2 years ago

Where is the code in SendAsync, @liveans? I only saw a handle replacement during connect. I think this can still be a problem if cancellation and cleanup kick in on the thread pool. It would probably be worth trying the proposed change.

Yesterday evening I discussed it with @antonfirsov as well, and we noticed that I had traced the wrong path: ReplaceHandle is not used on the SendAsync path (my bad); it's used on the SendPacketsAsync path. I still think the proposed change can fix this issue, though, because this is the only place in the whole codebase where we change the handle without going through the constructor.

tmds commented 2 years ago

I had traced the wrong path: ReplaceHandle is not used on the SendAsync path

I haven't looked at the code, but I think there probably is such a path. SmtpClient.SendAsync establishes the connection, and on Unix, connect calls ReplaceHandle to try multiple IP addresses.

The fix in https://github.com/dotnet/runtime/pull/70046 (previously mentioned by @wfurt) was about making SendAsync keep using an open connection; see https://github.com/dotnet/runtime/issues/49340#issuecomment-1141867717. So that change may have triggered the issue.

It's definitely possible that we're seeing a race between connect replacing the handle and SmtpClient.Abort observing the half-initialized handle.

liveans commented 2 years ago

I had traced the wrong path: ReplaceHandle is not used on the SendAsync path

I haven't looked at the code, but I think there probably is such a path.

We should double-check it, then; thanks for correcting me!

It's definitely possible that we're seeing a race between connect replacing the handle and SmtpClient.Abort observing the half-initialized handle.

The proposed fix is worth trying, then.
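For context on why connect replaces the handle at all: POSIX leaves the state of a socket unspecified after `connect()` fails and advises closing it and creating a new socket before trying again, so trying several resolved addresses means one fresh descriptor per attempt. A minimal POSIX C sketch of that loop (`connect_any` is a hypothetical helper, not the runtime's code):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve host/port and try each address in order.
   A socket whose connect() failed is closed rather than reused, and the
   next attempt gets a brand-new descriptor. Returns the connected fd or -1. */
int connect_any(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    assert(host != NULL && port != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                   /* success: keep this descriptor */
        close(fd);                   /* state unspecified after failure */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

In the managed code, the analogous step is the Socket swapping its SafeSocketHandle between attempts, which is the window in which, per tmds's comment, Abort may observe a half-initialized handle.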

karelz commented 1 year ago

So far it has not been reported by external customers; the reports came only from our CI. It is a race condition in which we close a socket in parallel with its creation (while it is being created in a batch of socket handles via the array overload); it is a rare stress scenario with a very small time window for it to happen. Impact on customers (on release builds, without asserts): a leak of one handle.

Not worth servicing in 7.0.x until we get reports from customers.
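Why release builds do not crash: without the debug assert, the -1 presumably reaches the system call unchanged (an assumption about the release-build code path), and the kernel rejects it with `EBADF` instead of aborting the process. A C sketch with hypothetical helper names, performing the same operations as the two stacks above (`SystemNative_FcntlGetFD` and `SystemNative_SetLingerOption`), assuming those map to `fcntl(F_GETFD)` and `setsockopt(SO_LINGER)`:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helpers: return 0 on success, otherwise the errno value. */
int fcntl_getfd_errno(int fd)
{
    return fcntl(fd, F_GETFD) == -1 ? errno : 0;
}

int set_linger_errno(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1 ? errno : 0;
}

/* With fd == -1 both calls fail cleanly instead of aborting the process. */
void check_invalid_fd(void)
{
    assert(fcntl_getfd_errno(-1) == EBADF);
    assert(set_linger_errno(-1) == EBADF);
}
```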

carlossanlop commented 1 year ago

@karelz @wfurt FYI this failure happened again today in 7.0. Based on the last comment, I won't reopen the issue, but I am pasting all the information here so this gets linked with the affected PR, and to preserve history.

```
===========================================================================================================
/root/helix/work/workitem/e /root/helix/work/workitem/e
  Discovering: System.Net.Mail.Functional.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Net.Mail.Functional.Tests (found 153 of 156 test cases)
  Starting:    System.Net.Mail.Functional.Tests (parallel test collections = on, max threads = 2)
    System.Net.Mail.Tests.SmtpClientTest.TestGssapiAuthentication [SKIP]
      Condition(s) not met: "IsNtlmInstalled"
=================================================================
Native Crash Reporting
=================================================================
Got a SIGABRT while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
=================================================================
dotnet: /__w/1/s/src/native/libs/Common/pal_utilities.h:86: int ToFileDescriptor(intptr_t): Assertion `0 <= fd && fd < sysconf(_SC_OPEN_MAX)' failed.
=================================================================
Native stacktrace:
=================================================================
0x7f68d31e0e92 - Unknown
0x7f68d318759e - Unknown
0x7f68d31e0768 - Unknown
0x7f68d40a6630 - Unknown
0x7f68d32db387 - Unknown
0x7f68d32dca78 - Unknown
0x7f68d32d41a6 - Unknown
0x7f68d32d4252 - Unknown
0x7f68d02805ed - Unknown
0x7f68d0280fb5 - Unknown
0x411053fb - Unknown
=================================================================
External Debugger Dump:
=================================================================
Missing separate debuginfo for /root/helix/work/correlation/dotnet
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/22/ad60fa877b47f26b08c103e8e928402687212b.debug
[New LWP 62]
[New LWP 58]
[New LWP 53]
[New LWP 51]
[New LWP 42]
[New LWP 41]
[New LWP 40]
[New LWP 39]
[New LWP 38]
[New LWP 37]
[New LWP 32]
[New LWP 31]
[New LWP 28]
[New LWP 27]
[New LWP 26]
[New LWP 25]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /root/helix/work/correlation/host/fxr/7.0.11/libhostfxr.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/19/1d9de57b7b94b2c45d49368284fd6d1b814753.debug
Missing separate debuginfo for /root/helix/work/correlation/shared/Microsoft.NETCore.App/7.0.11/libhostpolicy.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/0e/81d56a9d12fffce10fbfa3704f140d23309670.debug
0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  Id   Target Id   Frame
  17   Thread 0x7f68d23ff700 (LWP 25) "SGen worker" 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16   Thread 0x7f68d0837700 (LWP 26) "dotnet" 0x00007f68d3398ddd in poll () from /lib64/libc.so.6
  15   Thread 0x7f68d0636700 (LWP 27) "Finalizer" 0x00007f68d40a4b3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  14   Thread 0x7f68c98f6700 (LWP 28) "dotnet" 0x00007f68d40a575d in read () from /lib64/libpthread.so.0
  13   Thread 0x7f68bfdfc700 (LWP 31) ".NET Long Runni" 0x00007f68d339de29 in syscall () from /lib64/libc.so.6
  12   Thread 0x7f68bf5fb700 (LWP 32) ".NET Long Runni" 0x00007f68d339de29 in syscall () from /lib64/libc.so.6
  11   Thread 0x7f68c985c700 (LWP 37) ".NET Long Runni" 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7f68c965b700 (LWP 38) ".NET ThreadPool" 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7f68c8073700 (LWP 39) ".NET ThreadPool" 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7f68be5f9700 (LWP 40) ".NET ThreadPool" 0x00007f68d40a61d9 in waitpid () from /lib64/libpthread.so.0
  7    Thread 0x7f68bedfa700 (LWP 41) ".NET Long Runni" 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7f68bc84b700 (LWP 42) ".NET Long Runni" 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7f68bc3ff700 (LWP 51) ".NET Sockets" 0x00007f68d33a40e3 in epoll_wait () from /lib64/libc.so.6
  4    Thread 0x7f689bfff700 (LWP 53) ".NET Timer" 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7f689bbeb700 (LWP 58) ".NET ThreadPool" 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7f689b9ea700 (LWP 62) ".NET ThreadPool" 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7f68d44c7780 (LWP 24) "dotnet" 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Thread 17 (Thread 0x7f68d23ff700 (LWP 25)):
#0  0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f68d30ce493 in mono_os_cond_wait (cond=, mutex=) at /__w/1/s/src/mono/mono/mini/../../mono/utils/mono-os-mutex.h:219
#2  get_work (worker_index=, work_context=, do_idle=, job=) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:167
#3  thread_func (data=0x0) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:198
#4  0x00007f68d409eea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f68d33a3b0d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f68d0837700 (LWP 26)):
#0  0x00007f68d3398ddd in poll () from /lib64/libc.so.6
#1  0x00007f68d327a41a in ipc_poll_fds (fds=, nfds=1, timeout=4294967295) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:470
#2  ds_ipc_poll (poll_handles_data=0x7f68cc002250, poll_handles_data_len=1, timeout_ms=4294967295, callback=0x7f68d3279850 ) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:1097
#3  0x00007f68d3277925 in ds_ipc_stream_factory_get_next_available_stream (callback=0x7f68d3279850 ) at /__w/1/s/src/native/eventpipe/ds-ipc.c:395
#4  0x00007f68d3276029 in server_thread (data=) at /__w/1/s/src/native/eventpipe/ds-server.c:127
#5  0x00007f68d3279831 in ep_rt_thread_mono_start_func (data=0x555fcfe68e30) at /__w/1/s/src/mono/mono/mini/../eventpipe/ep-rt-mono.h:1356
#6  0x00007f68d409eea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f68d33a3b0d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f68d0636700 (LWP 27)):
#0  0x00007f68d40a4b3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1  0x00007f68d40a4bcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007f68d40a4c6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00007f68d304f366 in mono_os_sem_wait (sem=, flags=MONO_SEM_FLAGS_ALERTABLE) at /__w/1/s/src/mono/mono/mini/../utils/mono-os-semaphore.h:204
#4  mono_coop_sem_wait (sem=, flags=MONO_SEM_FLAGS_ALERTABLE) at /__w/1/s/src/mono/mono/mini/../../mono/utils/mono-coop-semaphore.h:41
#5  finalizer_thread (unused=) at /__w/1/s/src/mono/mono/metadata/gc.c:891
#6  0x00007f68d30282da in start_wrapper_internal (start_info=0x0, stack_ptr=) at /__w/1/s/src/mono/mono/metadata/threads.c:1202
#7  0x00007f68d3028169 in start_wrapper (data=0x555fcfe7a7f0) at /__w/1/s/src/mono/mono/metadata/threads.c:1264
#8  0x00007f68d409eea5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f68d33a3b0d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f68c98f6700 (LWP 28)):
#0  0x00007f68d40a575d in read () from /lib64/libpthread.so.0
#1  0x00007f68d028e70e in SignalHandlerLoop (arg=0x555fd06b13b0) at /__w/1/s/src/native/libs/System.Native/pal_signal.c:323
#2  0x00007f68d409eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f68d33a3b0d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f68bfdfc700 (LWP 31)):
#0  0x00007f68d339de29 in syscall () from /lib64/libc.so.6
#1  0x00007f68c87289ce in ust_listener_thread () from /lib64/liblttng-ust.so.0
#2  0x00007f68d409eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f68d33a3b0d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f68bf5fb700 (LWP 32)):
#0  0x00007f68d339de29 in syscall () from /lib64/libc.so.6
#1  0x00007f68c87289ce in ust_listener_thread ()
from /lib64/liblttng-ust.so.0 #2 0x00007f68d409eea5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f68d33a3b0d in clone () from /lib64/libc.so.6 Thread 11 (Thread 0x7f68c985c700 (LWP 37)): #0 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d028f2fc in SystemNative_LowLevelMonitor_Wait (monitor=0x7f68c01b2780) at /__w/1/s/src/native/libs/System.Native/pal_threading.c:155 #2 0x0000000040f3791c in ?? () #3 0x00007f68d25c1f60 in ?? () #4 0xffffffffffffffff in ?? () #5 0x00007f68bc990e30 in ?? () #6 0x0000000000000001 in ?? () #7 0x00007f68d25c1f90 in ?? () #8 0xffffffffffffffff in ?? () #9 0x00007f68bc990e30 in ?? () #10 0x00007f68c0169fa0 in ?? () #11 0x00007f68c985b8b0 in ?? () #12 0x00007f68c985b7c0 in ?? () #13 0x00007f68c00008c0 in ?? () #14 0x0000000040f3785c in ?? () #15 0x00007f68c985b930 in ?? () #16 0x0000000040f3781c in ?? () #17 0x00007f68d25c1f90 in ?? () #18 0x0000000040f36ea8 in ?? () #19 0x00007f68c985b930 in ?? () #20 0x0000000000000000 in ?? () Thread 10 (Thread 0x7f68c965b700 (LWP 38)): #0 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d028f440 in SystemNative_LowLevelMonitor_TimedWait (monitor=0x7f68b8001310, timeoutMilliseconds=12000) at /__w/1/s/src/native/libs/System.Native/pal_threading.c:195 #2 0x0000000040f90e87 in ?? () #3 0x0000000000000000 in ?? () Thread 9 (Thread 0x7f68c8073700 (LWP 39)): #0 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d028f2fc in SystemNative_LowLevelMonitor_Wait (monitor=0x7f68b00052f0) at /__w/1/s/src/native/libs/System.Native/pal_threading.c:155 #2 0x0000000040f3791c in ?? () #3 0x00007f68d25c8590 in ?? () #4 0xffffffffffffffff in ?? () #5 0x0000000000000000 in ?? 
() Thread 8 (Thread 0x7f68be5f9700 (LWP 40)): #0 0x00007f68d40a61d9 in waitpid () from /lib64/libpthread.so.0 #1 0x00007f68d31e0fd7 in dump_native_stacktrace (signal=, mctx=) at /__w/1/s/src/mono/mono/mini/mini-posix.c:843 #2 mono_dump_native_crash_info (signal=, mctx=0x7f68be5f7898, info=) at /__w/1/s/src/mono/mono/mini/mini-posix.c:870 #3 0x00007f68d318759e in mono_handle_native_crash (signal=0x7f68d2ef9c0c "SIGABRT", mctx=0x7f68be5f7898, info=0x7f68be5f7b70) at /__w/1/s/src/mono/mono/mini/mini-exceptions.c:3005 #4 0x00007f68d31e0768 in sigabrt_signal_handler (_dummy=, _info=0x7f68be5f7b70, context=0x7f68be5f7a40) at /__w/1/s/src/mono/mono/mini/mini-posix.c:225 #5 #6 0x00007f68d32db387 in raise () from /lib64/libc.so.6 #7 0x00007f68d32dca78 in abort () from /lib64/libc.so.6 #8 0x00007f68d32d41a6 in __assert_fail_base () from /lib64/libc.so.6 #9 0x00007f68d32d4252 in __assert_fail () from /lib64/libc.so.6 #10 0x00007f68d02805ed in ToFileDescriptor (fd=-1) at /__w/1/s/src/native/libs/Common/pal_utilities.h:86 #11 0x00007f68d0280fb5 in SystemNative_FcntlGetFD (fd=-1) at /__w/1/s/src/native/libs/System.Native/pal_io.c:611 #12 0x00000000411053fb in ?? () #13 0x0000000000000000 in ?? 
() Thread 7 (Thread 0x7f68bedfa700 (LWP 41)): #0 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d3079f34 in mono_os_cond_timedwait (cond=0x555fcfdaadc0, mutex=0x555fcfdaad98, timeout_ms=300000) at /__w/1/s/src/mono/mono/utils/mono-os-mutex.c:75 #2 0x00007f68d302d52a in mono_coop_cond_timedwait (cond=, mutex=, timeout_ms=300000) at /__w/1/s/src/mono/mono/mini/../../mono/utils/mono-coop-mutex.h:103 #3 mono_w32handle_timedwait_signal_naked (cond=, mutex=, timeout=300000, poll=0, alerted=) at /__w/1/s/src/mono/mono/metadata/w32handle.c:514 #4 mono_w32handle_timedwait_signal_handle (handle_data=, timeout=300000, poll=0, alerted=0x7f68bedf64d4) at /__w/1/s/src/mono/mono/metadata/w32handle.c:629 #5 0x00007f68d302d29a in mono_w32handle_wait_one (handle=, timeout=300000, alertable=) at /__w/1/s/src/mono/mono/metadata/w32handle.c:738 #6 0x00007f68d3050d54 in mono_monitor_wait (obj_handle=..., ms=, allow_interruption=1 '\001', error=) at /__w/1/s/src/mono/mono/metadata/monitor.c:1364 #7 ves_icall_System_Threading_Monitor_Monitor_wait (obj_handle=..., ms=, allow_interruption=1 '\001', error=) at /__w/1/s/src/mono/mono/metadata/monitor.c:1443 #8 0x00007f68d2fe00f3 in ves_icall_System_Threading_Monitor_Monitor_wait_raw (a0=0x7f68bedf6690, a1=300000, a2=1 '\001') at /__w/1/s/src/mono/mono/mini/../metadata/icall-def.h:570 #9 0x0000000041105bdc in ?? () #10 0x00007f68d25ab748 in ?? () #11 0x00000000000493e0 in ?? () #12 0x0000000000000001 in ?? () #13 0x00000000000493e0 in ?? () #14 0x00007f68d25ab748 in ?? () #15 0x00007f68bedf6cf8 in ?? () #16 0x00007f68bedf67a0 in ?? () #17 0x00007f68bedf6650 in ?? () #18 0x00007f68d25ab748 in ?? () #19 0x00000000411059d0 in ?? () #20 0x00007f68d25ab748 in ?? () #21 0x00000000000493e0 in ?? () #22 0x00007f68bedf67a0 in ?? () #23 0x0000000041105954 in ?? () #24 0x0000000000185b8b in ?? () #25 0x00007f68d29581b0 in ?? () #26 0x00000000000493e0 in ?? () #27 0x0000000041104c24 in ?? 
() #28 0x0000000000000000 in ?? () Thread 6 (Thread 0x7f68bc84b700 (LWP 42)): #0 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d028f440 in SystemNative_LowLevelMonitor_TimedWait (monitor=0x7f689c002420, timeoutMilliseconds=30000) at /__w/1/s/src/native/libs/System.Native/pal_threading.c:195 #2 0x0000000040f90e87 in ?? () #3 0x0000000000000000 in ?? () Thread 5 (Thread 0x7f68bc3ff700 (LWP 51)): #0 0x00007f68d33a40e3 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f68d028a00e in WaitForSocketEventsInner (port=9, buffer=0x7f68a8276d10, count=0x7f68bc3febf8) at /__w/1/s/src/native/libs/System.Native/pal_networking.c:2723 #2 0x00007f68d0289f2f in SystemNative_WaitForSocketEvents (port=9, buffer=0x7f68a8276d10, count=0x7f68bc3febf8) at /__w/1/s/src/native/libs/System.Native/pal_networking.c:3023 #3 0x00000000410b5a79 in ?? () #4 0x0000000000000001 in ?? () #5 0x0000000000000001 in ?? () #6 0x0000000000000000 in ?? () Thread 4 (Thread 0x7f689bfff700 (LWP 53)): #0 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d028f2fc in SystemNative_LowLevelMonitor_Wait (monitor=0x7f6894004fd0) at /__w/1/s/src/native/libs/System.Native/pal_threading.c:155 #2 0x0000000040f3791c in ?? () #3 0x00007f68d24f4da0 in ?? () #4 0xffffffffffffffff in ?? () #5 0x0000000000185b87 in ?? () #6 0x0000000000000001 in ?? () #7 0x00007f68d24f4dd0 in ?? () #8 0xffffffffffffffff in ?? () #9 0x0000000000185b87 in ?? () #10 0x00007f6894000fd0 in ?? () #11 0x00007f689bffe9f0 in ?? () #12 0x00007f689bffe900 in ?? () #13 0x00007f68940008c0 in ?? () #14 0x0000000040f3785c in ?? () #15 0x00007f689bffea70 in ?? () #16 0x0000000040f3781c in ?? () #17 0x00007f68d24f4dd0 in ?? () #18 0x0000000040f36ea8 in ?? () #19 0x00007f689bffea70 in ?? () #20 0x0000000000000000 in ?? 
() Thread 3 (Thread 0x7f689bbeb700 (LWP 58)): #0 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d3051721 in mono_os_cond_wait (cond=0x7f688c003970, mutex=0x7f688c010e20) at /__w/1/s/src/mono/mono/mini/../../mono/utils/mono-os-mutex.h:219 #2 mono_coop_cond_wait (cond=0x7f688c003970, mutex=0x7f688c010e20) at /__w/1/s/src/mono/mono/mini/../../mono/utils/mono-coop-mutex.h:91 #3 mono_monitor_try_enter_inflated (obj=0x7f68d25a8ea0, ms=4294967295, allow_interruption=0, id=76) at /__w/1/s/src/mono/mono/metadata/monitor.c:875 #4 0x00007f68d3050235 in mono_monitor_try_enter_loop_if_interrupted (obj=0x7f68d25a8ea0, ms=4294967295, allow_interruption=, lockTaken=0x7f689bbea5d0 "", error=0x7f688c010e00) at /__w/1/s/src/mono/mono/metadata/monitor.c:1136 #5 0x0000000040fc8e48 in ?? () #6 0x0000000000000000 in ?? () Thread 2 (Thread 0x7f689b9ea700 (LWP 62)): #0 0x00007f68d40a2de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d3079f34 in mono_os_cond_timedwait (cond=0x7f689b9e9980, mutex=0x7f68c017f7b0, timeout_ms=20000) at /__w/1/s/src/mono/mono/utils/mono-os-mutex.c:75 #2 0x00007f68d307f6e4 in mono_coop_cond_timedwait (cond=0x7f689b9e9980, mutex=, timeout_ms=20000) at /__w/1/s/src/mono/mono/mini/../../mono/utils/mono-coop-mutex.h:103 #3 mono_lifo_semaphore_timed_wait (semaphore=0x7f68c017f7b0, timeout_ms=20000) at /__w/1/s/src/mono/mono/utils/lifo-semaphore.c:48 #4 0x0000000040f9c837 in ?? () #5 0x0000000000000002 in ?? () #6 0x0000000000000046 in ?? () #7 0x00007f68d25c4810 in ?? () #8 0x00007f68d25c4810 in ?? () #9 0x0000000000004e20 in ?? () #10 0x00007f6890000fd0 in ?? () #11 0x0000000000000046 in ?? () #12 0x00007f689b9e99f0 in ?? () #13 0x0000000000000000 in ?? 
() Thread 1 (Thread 0x7f68d44c7780 (LWP 24)): #0 0x00007f68d40a2a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f68d028f2fc in SystemNative_LowLevelMonitor_Wait (monitor=0x555fd08efe80) at /__w/1/s/src/native/libs/System.Native/pal_threading.c:155 #2 0x0000000040f3791c in ?? () #3 0x00007f68bc9c3f10 in ?? () #4 0xffffffffffffffff in ?? () #5 0x00007f68d25b7530 in ?? () #6 0x0000000000000001 in ?? () #7 0x00007f68bc9c3f40 in ?? () #8 0xffffffffffffffff in ?? () #9 0x00007f68d25b7530 in ?? () #10 0x0000555fcfe60430 in ?? () #11 0x00007ffca22e9b60 in ?? () #12 0x00007ffca22e9a70 in ?? () #13 0x0000555fcfdeeb30 in ?? () #14 0x0000000040f3785c in ?? () #15 0x00007ffca22e9be0 in ?? () #16 0x0000000040f3781c in ?? () #17 0x00007f68bc9c3f40 in ?? () #18 0x0000000040f36ea8 in ?? () #19 0x00007ffca22e9be0 in ?? () #20 0x0000000000000000 in ?? () [Inferior 1 (process 24) detached] ================================================================= Basic Fault Address Reporting ================================================================= Memory around native instruction pointer (0x7f68d32db387):0x7f68d32db377 48 63 d7 48 63 f6 48 63 f9 b8 ea 00 00 00 0f 05 Hc.Hc.Hc........ 0x7f68d32db387 48 3d 00 f0 ff ff 77 1e f3 c3 0f 1f 80 00 00 00 H=....w......... 0x7f68d32db397 00 85 c9 7f db 89 c8 f7 d8 81 e1 ff ff ff 7f 0f ................ 
0x7f68d32db3a7 44 c6 89 c1 eb ca 48 8b 15 9c 0a 39 00 f7 d8 64 D.....H....9...d

=================================================================
Managed Stacktrace:
=================================================================
  at <0xffffffff>
  at Fcntl:g____PInvoke|5_0 <0x000aa>
  at Fcntl:GetFD <0x00033>
  at System.Net.Sockets.SafeSocketHandle:TryUnblockSocket <0x00093>
  at System.Net.Sockets.SafeSocketHandle:CloseAsIs <0x00203>
  at System.Net.Sockets.Socket:Dispose <0x0070b>
  at System.Net.Sockets.Socket:Dispose <0x000e1>
  at System.Net.Sockets.Socket:Close <0x000db>
  at System.Net.Sockets.TcpClient:Dispose <0x0014b>
  at System.Net.Sockets.TcpClient:Dispose <0x00031>
  at System.Net.Mail.SmtpConnection:ShutdownConnection <0x0023b>
  at System.Net.Mail.SmtpConnection:Abort <0x0002f>
  at System.Net.Mail.SmtpTransport:Abort <0x000ab>
  at System.Net.Mail.SmtpClient:Abort <0x00033>
  at System.Net.Mail.SmtpClient:TimeOutCallback <0x0004f>
  at <>c:<.cctor>b__27_0 <0x0005e>
  at System.Threading.ExecutionContext:RunInternal <0x000e2>
  at System.Threading.TimerQueueTimer:CallCallback <0x000fb>
  at System.Threading.TimerQueueTimer:Fire <0x0012f>
  at System.Threading.TimerQueue:FireNextTimers <0x00367>
  at System.Threading.TimerQueue:System.Threading.IThreadPoolWorkItem.Execute <0x0002b>
  at System.Threading.ThreadPoolWorkQueue:Dispatch <0x0043b>
  at WorkerThread:WorkerThreadStart <0x001cb>
  at System.Threading.Thread:StartCallback <0x000f0>
  at System.Object:runtime_invoke_void__this__ <0x00091>
=================================================================
./RunTests.sh: line 168: 24 Aborted (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Net.Mail.Functional.Tests.runtimeconfig.json --depsfile System.Net.Mail.Functional.Tests.deps.json xunit.console.dll System.Net.Mail.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Tue Aug 8 18:42:29 UTC 2023 ----- exit code 134 ----------------------------------------------------------
exit code 134 means SIGABRT Abort. Managed or native assert, or runtime check such as heap corruption, caused call to abort().
Core dumped.
ulimit -c value: unlimited
[ 1452.802029] docker0: port 1(veth319b311) entered blocking state
[ 1452.802030] docker0: port 1(veth319b311) entered forwarding state
[ 1549.372157] docker0: port 1(veth319b311) entered disabled state
[ 1549.372185] veth3587465: renamed from eth0
[ 1549.490395] docker0: port 1(veth319b311) entered disabled state
[ 1549.491393] device veth319b311 left promiscuous mode
[ 1549.491407] docker0: port 1(veth319b311) entered disabled state
[ 1565.561385] docker0: port 1(vetha7a7ea0) entered blocking state
[ 1565.561387] docker0: port 1(vetha7a7ea0) entered disabled state
[ 1565.561498] device vetha7a7ea0 entered promiscuous mode
[ 1565.561595] IPv6: ADDRCONF(NETDEV_UP): vetha7a7ea0: link is not ready
[ 1565.872486] eth0: renamed from veth9a26e10
[ 1565.919146] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1565.921079] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1565.921090] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1565.921108] IPv6: ADDRCONF(NETDEV_CHANGE): vetha7a7ea0: link becomes ready
[ 1565.921124] docker0: port 1(vetha7a7ea0) entered blocking state
[ 1565.921125] docker0: port 1(vetha7a7ea0) entered forwarding state
[ 1570.069243] docker0: port 1(vetha7a7ea0) entered disabled state
[ 1570.069297] veth9a26e10: renamed from eth0
[ 1570.160406] docker0: port 1(vetha7a7ea0) entered disabled state
[ 1570.162232] device vetha7a7ea0 left promiscuous mode
[ 1570.162243] docker0: port 1(vetha7a7ea0) entered disabled state
[ 1575.982520] docker0: port 1(veth5305d79) entered blocking state
[ 1575.982523] docker0: port 1(veth5305d79) entered disabled state
[ 1575.982569] device veth5305d79 entered promiscuous mode
[ 1575.986841] IPv6: ADDRCONF(NETDEV_UP): veth5305d79: link is not ready
[ 1576.280354] eth0: renamed from veth0d8b7bd
[ 1576.328095] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1576.329880] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1576.329891] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1576.329907] IPv6: ADDRCONF(NETDEV_CHANGE): veth5305d79: link becomes ready
[ 1576.329922] docker0: port 1(veth5305d79) entered blocking state
[ 1576.329923] docker0: port 1(veth5305d79) entered forwarding state
[ 1585.750024] docker0: port 1(veth5305d79) entered disabled state
[ 1585.750049] veth0d8b7bd: renamed from eth0
[ 1585.835519] docker0: port 1(veth5305d79) entered disabled state
[ 1585.837601] device veth5305d79 left promiscuous mode
[ 1585.837608] docker0: port 1(veth5305d79) entered disabled state
[ 1593.863487] docker0: port 1(veth62bb91d) entered blocking state
[ 1593.863490] docker0: port 1(veth62bb91d) entered disabled state
[ 1593.863529] device veth62bb91d entered promiscuous mode
[ 1593.863679] IPv6: ADDRCONF(NETDEV_UP): veth62bb91d: link is not ready
[ 1594.140330] eth0: renamed from veth83150e2
[ 1594.187973] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1594.189905] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 1594.189915] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1594.189933] IPv6: ADDRCONF(NETDEV_CHANGE): veth62bb91d: link becomes ready
[ 1594.189949] docker0: port 1(veth62bb91d) entered blocking state
[ 1594.189950] docker0: port 1(veth62bb91d) entered forwarding state
Waiting a few seconds for any dump to be written..
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dump..
cat: /proc/sys/kernel/coredump_filter: No such file or directory
... found no dump in /root/helix/work/workitem/e
+ export _commandExitCode=134
+ _commandExitCode=134
```
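In thread 8, frames #10 and #11 show `SystemNative_FcntlGetFD` being called with `fd=-1`, which reaches `ToFileDescriptor` and fails the lower bound of `0 <= fd && fd < sysconf(_SC_OPEN_MAX)`, so the assert calls `abort()` (hence exit code 134). The managed stack shows the call came from `SafeSocketHandle.TryUnblockSocket` while `SmtpClient.TimeOutCallback` was disposing the `TcpClient`. The following is a minimal sketch of the same range check written as a non-asserting helper, only to illustrate why `-1` trips it. `fd_in_open_max_range` and `checked_fcntl_getfd` are hypothetical names; this is not the runtime's code and not necessarily how the issue was fixed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: the same condition the assertion at
 * pal_utilities.h:86 enforces, returned as a bool instead of aborting.
 * If sysconf reports the limit as indeterminate (-1), every fd is rejected. */
static bool fd_in_open_max_range(intptr_t fd)
{
    long open_max = sysconf(_SC_OPEN_MAX); /* per-process descriptor limit */
    return 0 <= fd && fd < open_max;
}

/* Hypothetical non-asserting F_GETFD wrapper: an out-of-range value such as
 * the -1 seen in the backtrace yields -1 with errno = EBADF instead of SIGABRT. */
static int checked_fcntl_getfd(intptr_t fd)
{
    if (!fd_in_open_max_range(fd)) {
        errno = EBADF;
        return -1;
    }
    return fcntl((int)fd, F_GETFD);
}
```

Whether the right fix is to tolerate an already-invalidated handle in `TryUnblockSocket` or to avoid reaching `FcntlGetFD` with `-1` at all is a design question for the socket disposal path; the sketch only shows which input the check rejects.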