Samsung / netcoredbg

NetCoreDbg is a managed code debugger with MI interface for CoreCLR.
MIT License

Debugger stopping on exceptions #89

Closed: Martin-Molinero closed this issue 2 years ago

Martin-Molinero commented 2 years ago

Using the latest release, 2.0.0-895 (linux amd64, net5), with --interpreter=vscode --engineLogging --server=5678

Randomly, but quite often, the debugger stops on handled exceptions. These exceptions shouldn't be thrown in the first place 🤔, and they aren't when the program runs without the debugger. This reminds me of https://github.com/Samsung/netcoredbg/issues/72.

Example 1: (screenshot)

The logs don't show anything significant: (screenshot)

Example 2, see https://github.com/QuantConnect/Lean/blob/master/Engine/DataFeeds/WorkScheduling/WeightedWorkQueue.cs#L98: (screenshot)

viewizard commented 2 years ago

Hmm... in the second example, it looks like the exception is thrown directly from https://github.com/QuantConnect/Lean/blob/e0d29e1da75f1fa74110d8fa4dca62a849914387/Engine/DataFeeds/Enumerators/EnqueueableEnumerator.cs#L89-L97, but I don't see anything suspicious there.

Could you please also share the VSCode debugger log? https://github.com/OmniSharp/omnisharp-vscode/wiki/Enabling-C%23-debugger-logging

One more point: is it possible to add PDBs for QuantConnect? On the one hand, we could get more info; on the other, this could confirm that the issue is not related to "not user code" (judging by the stack traces you provided, does this happen in "not user code" threads only?).

Martin-Molinero commented 2 years ago

Hey! Thanks for the quick response, I'll get the VSCode-side logs 👍 Just a gut feeling, but I don't think the location of the exception matters; here is another example I just caught: (screenshot)

Martin-Molinero commented 2 years ago

Okay, here are some logs: logs.omnisharp.debugging.txt. The case: (screenshot)

viewizard commented 2 years ago

Damn, VSCode didn't call the exceptionInfo command; it looks like VSCode does this for user code only. Could you please evaluate the $exception variable (you can add it to the WATCH window in VSCode)? You can then inspect the exception object returned by the $exception evaluation, and we will also see all of this info in the debugger log.

I also found this in the log: "You are debugging a Release build of protobuf-net.Core.dll. Using Just My Code with Release builds using compiler optimizations results in a degraded debugging experience (e.g. breakpoints will not be hit)."

You might get a better debugging experience with "Just My Code" disabled: https://github.com/OmniSharp/omnisharp-vscode/blob/master/debugger-launchjson.md#just-my-code. Note that when JMC is disabled, the debugger turns off JIT optimization at load time for DLLs that have PDBs: https://github.com/Samsung/netcoredbg/blob/a8bd3b95328f19dfe5519973b8176f40d3b4f509/src/metadata/modules.cpp#L769-L770

DavidThielen commented 2 years ago

I'm the user who initially reported this. It occurred consistently for me, with the inner exception in both the LEAN code and the Newtonsoft code, and for an algo that runs fine when not debugging. I think @Martin-Molinero fully listed what I hit, but if there's anything I can do to help, please let me know.

Martin-Molinero commented 2 years ago

Hey @viewizard! I did some more testing; even with Just My Code disabled I can't get more details about what's going on here, and the debugger logs don't show anything either. Any ideas on how to continue debugging this issue? (screenshot)

viewizard commented 2 years ago

Hmm... error CS1056: Unexpected character '$'... this looks like an issue on our side: we have ReplaceInternalNames(), which must take care of internal variables, but it looks like it was not called. It seems $exception evaluation was broken during the evaluation code refactoring. I just noticed that $exception evaluation is not covered by tests. I will check this at work (4 May).

Anyway, we need more info about this exception, and evaluating $exception directly is the only way I know to get it.

viewizard commented 2 years ago

I just noticed that error CS1056 was related to evaluating $e, but I will check $exception evaluation anyway.
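
viewizard mentions that ReplaceInternalNames() is supposed to rewrite internal variables such as $exception before an expression is evaluated, and that CS1056 ("Unexpected character '$'") actually came from evaluating $e. As a rough illustration of that idea only, here is a minimal C++ sketch. ReplaceInternalNamesSketch, kInternalNames, IsIdentChar and the replacement identifier __debugger_exception are all hypothetical; this is not netcoredbg's implementation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mapping from debugger pseudo-variables to identifiers that a C#
// parser accepts. The replacement name is made up for this sketch.
const std::vector<std::pair<std::string, std::string>> kInternalNames = {
    {"$exception", "__debugger_exception"},
};

// Returns true if c can continue a C# identifier (ASCII-only simplification).
bool IsIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Rewrites known pseudo-variables before the expression reaches the compiler.
// Unknown names such as "$e" are left untouched, so the compiler would still
// report CS1056 for them.
std::string ReplaceInternalNamesSketch(std::string expr) {
    for (const auto& entry : kInternalNames) {
        const std::string& from = entry.first;
        const std::string& to = entry.second;
        std::size_t pos = 0;
        while ((pos = expr.find(from, pos)) != std::string::npos) {
            const std::size_t end = pos + from.size();
            // Skip partial matches such as "$exceptionCount".
            if (end < expr.size() && IsIdentChar(expr[end])) {
                pos = end;
                continue;
            }
            expr.replace(pos, from.size(), to);
            pos += to.size();
        }
    }
    return expr;
}
```

With this sketch, "$exception.Message" becomes "__debugger_exception.Message", while "$e.Message" passes through unchanged and would still be rejected with CS1056, which matches the observation that the error came from $e rather than from $exception.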

Martin-Molinero commented 2 years ago

Update: we've tested going back as far as 1.2.0-825 and the issue still reproduces, although it seems to happen less often and the debugger is also slower. Maybe a race condition? We've also seen occasional segmentation faults.

viewizard commented 2 years ago

I found the source of the issue (at least, I believe I did). I was not able to reproduce it, but in the logs.omnisharp.debugging.txt that @Martin-Molinero provided I found this line:

<- (E) {"body":{"allThreadsStopped":true,"reason":"exception","threadId":71},"event":"stopped","seq":"713","type":"event"}

This line has no text after "reason"... so I analyzed the code and found only one case in which the debugger could not provide text for an exception stop event.

BTW, I don't know why some CoreCLR debug API calls fail for you (I see 0x8013130a - CORDBG_E_FUNCTION_NOT_IL and 0x80131302 - CORDBG_E_PROCESS_NOT_SYNCHRONIZED in the log), but there is nothing we can do about that: when a CoreCLR debug API call fails, we can only ignore this exception and continue debuggee process execution.

@Martin-Molinero @DavidThielen Thanks a lot!

Here is a possible patch: 0001-Fix-callbacks-return-code-check.txt. It will be in upstream at the next public repo sync.
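
The patch contents are not reproduced in this thread. As a hedged illustration of the behavior viewizard describes (if a CoreCLR debug API call fails while an exception callback is being handled, ignore the exception and let the debuggee continue, rather than reporting a stopped event with no text), here is a minimal sketch. HandleExceptionCallback, Action, ExceptionDecision and the HResult/Failed stand-ins are hypothetical; only the CORDBG_E_PROCESS_NOT_SYNCHRONIZED value comes from the log quoted above.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Stand-ins for the CoreCLR HRESULT type and the FAILED() check; real code
// would use the definitions from the CoreCLR headers.
using HResult = std::int32_t;
constexpr HResult kOk = 0;
// CORDBG_E_PROCESS_NOT_SYNCHRONIZED, one of the codes seen in the log.
constexpr HResult kProcessNotSynchronized = static_cast<HResult>(0x80131302u);

constexpr bool Failed(HResult hr) { return hr < 0; }

enum class Action { StopWithException, ContinueExecution };

struct ExceptionDecision {
    Action action;
    std::string text;  // would become the "text" of the stopped event
};

// Hypothetical handler: getExceptionText wraps whatever debug API calls collect
// the exception description. If they fail, no stopped event with an empty text
// is produced; the exception is ignored and the debuggee keeps running.
ExceptionDecision HandleExceptionCallback(
        const std::function<HResult(std::string&)>& getExceptionText) {
    std::string text;
    const HResult hr = getExceptionText(text);
    if (Failed(hr))
        return {Action::ContinueExecution, {}};
    return {Action::StopWithException, text};
}
```

The point of the sketch is the ordering: the return code is checked before any stop is reported, so a failed query can never surface as a "stopped" event whose reason is "exception" but whose text is missing.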

Martin-Molinero commented 2 years ago

Hey @viewizard! Thanks for the update, I've built and tested the proposed changes ~but the issue persists, examples:~

stopped, reason: exception received, name: , exception: , stage: , category: , thread id: 6463, stopped-threads: all, frame={QuantConnect.Util.ObjectActivator.GetActivator() at /LeanCloud/CI.Builder/bin/Debug/src/QuantConnect/Lean/Common/Util/ObjectActivator.cs:56
stopped, reason: exception received, name: , exception: , stage: , category: , thread id: 6556, stopped-threads: all, frame={QuantConnect.Util.LeanData.GenerateZipFileName() at /LeanCloud/CI.Builder/bin/Debug/src/QuantConnect/Lean/Common/Util/LeanData.cs:723

The issue doesn't happen on every run, but quite often, which suggests a race condition? I was able to capture a dump and get the stack trace out; I got the same error a few times:

Reading symbols from /QuantConnect/netcoredbg...
(No debugging symbols found in /QuantConnect/netcoredbg)
[New LWP 5954]
[New LWP 5955]
[New LWP 5946]
[New LWP 5963]
[New LWP 5959]
[New LWP 5961]
[New LWP 5962]
[New LWP 5958]
[New LWP 5956]
[New LWP 5947]
[New LWP 5957]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../netcoredbg --interpreter=cli -- dotnet QuantConnect.Lean.Launcher.dll'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004af719 in netcoredbg::GetExceptionModuleName(ICorDebugFrame*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()

If it would help, I can run a test version with a bunch of extra logging if you can provide the patch/diff, @viewizard; that might help pin down the underlying issue further.

Martin-Molinero commented 2 years ago

Whoops! Sorry, I hadn't applied the patch yet 😅. After doing so:

viewizard commented 2 years ago

> Whoops! Sorry, I hadn't applied the patch yet 😅. After doing so:

Ahh... I had just checked everything one more time and had almost finished writing to you "please check that the patch is applied". :-D

> I did see the debugging session hang 1/20

This could be in netcoredbg or in the CoreCLR part (our debugger uses a managed part too). Unfortunately, the only way to analyze a hang is to build a debug version of the debugger, attach gdb to it while it hangs, and print the backtraces of all threads...

> Seeing the same seg fault as posted above, plus this one sometimes

SIGSEGV inside libcoreclr.so, interesting. Could you please build a debug version of netcoredbg and share the bt? That way we will probably see the netcoredbg-related part.

Martin-Molinero commented 2 years ago

> Could you please build a debug version of netcoredbg and share the bt? That way we will probably see the netcoredbg-related part.

Sure! Mind pointing out how to build it in debug mode?

viewizard commented 2 years ago

mkdir ./build
cd ./build
CC=clang CXX=clang++ cmake .. -DCMAKE_INSTALL_PREFIX=$PWD/../bin -DCORECLR_DIR=/coreclr -DDOTNET_DIR=/dotnet-sdk -DCMAKE_BUILD_TYPE=Debug
cmake --build . --target install -- -j10

Martin-Molinero commented 2 years ago

Sweet, here they are:

Mid debugging

Reading symbols from /QuantConnect/netcoredbg...
[New LWP 9449]
[New LWP 9450]
[New LWP 9447]
[New LWP 9434]
[New LWP 9442]
[New LWP 9446]
[New LWP 9451]
[New LWP 9443]
[New LWP 9444]
[New LWP 9435]
[New LWP 9445]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../netcoredbg --interpreter=cli -- dotnet QuantConnect.Lean.Launcher.dll'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f431a8014dc in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
[Current thread is 1 (Thread 0x7f4273ffd700 (LWP 9449))]
(gdb) bt
#0  0x00007f431a8014dc in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#1  0x00007f431a81852d in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#2  <signal handler called>
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4  0x00007f431c7db859 in __GI_abort () at abort.c:79
#5  0x00007f431c7db729 in __assert_fail_base (fmt=0x7f431c971588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0xac9a93 "m_threadsExceptionStatus.find(tid) == m_threadsExceptionStatus.end()",
    file=0xac9862 "/home/netcoredbg/src/debugger/breakpoints_exception.cpp", line=392, function=<optimized out>) at assert.c:92
#6  0x00007f431c7ed006 in __GI___assert_fail (assertion=0xac9a93 "m_threadsExceptionStatus.find(tid) == m_threadsExceptionStatus.end()", file=0xac9862 "/home/netcoredbg/src/debugger/breakpoints_exception.cpp", line=392,
    function=0xac9ad8 "HRESULT netcoredbg::ExceptionBreakpoints::ManagedCallbackException(ICorDebugThread *, netcoredbg::ExceptionCallbackType, std::string, netcoredbg::StoppedEvent &)") at assert.c:101
#7  0x00000000007cf9c7 in netcoredbg::ExceptionBreakpoints::ManagedCallbackException (this=0x29b8290, pThread=0x7f4268047d78, eventType=netcoredbg::ExceptionCallbackType::FIRST_CHANCE, excModule="System.Private.CoreLib.dll", event=...)
    at /home/netcoredbg/src/debugger/breakpoints_exception.cpp:392
#8  0x0000000000820b09 in netcoredbg::Breakpoints::ManagedCallbackException (this=0x29b8190, pThread=0x7f4268047d78, eventType=netcoredbg::ExceptionCallbackType::FIRST_CHANCE, excModule="System.Private.CoreLib.dll", event=...)
    at /home/netcoredbg/src/debugger/breakpoints.cpp:163
#9  0x00000000008d12e6 in netcoredbg::ManagedCallback::CallbacksWorkerException (this=0x7f431406efa0, pAppDomain=0x7f430c000d88, pThread=0x7f4268047d78, eventType=netcoredbg::ExceptionCallbackType::FIRST_CHANCE,
    excModule="System.Private.CoreLib.dll") at /home/netcoredbg/src/debugger/managedcallback.cpp:112
#10 0x00000000008d1815 in netcoredbg::ManagedCallback::CallbacksWorker (this=0x7f431406efa0) at /home/netcoredbg/src/debugger/managedcallback.cpp:156
#11 0x00000000008f7f57 in std::__invoke_impl<void, void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> (
    __f=@0x7f431400bd50: (void (netcoredbg::ManagedCallback::*)(netcoredbg::ManagedCallback * const)) 0x8d1570 <netcoredbg::ManagedCallback::CallbacksWorker()>, __t=@0x7f431400bd48: 0x7f431406efa0)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:73
#12 0x00000000008f7e62 in std::__invoke<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> (
    __fn=@0x7f431400bd50: (void (netcoredbg::ManagedCallback::*)(netcoredbg::ManagedCallback * const)) 0x8d1570 <netcoredbg::ManagedCallback::CallbacksWorker()>, __args=@0x7f431400bd48: 0x7f431406efa0)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95
#13 0x00000000008f7e25 in std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> >::_M_invoke<0ul, 1ul> (this=0x7f431400bd48)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244
#14 0x00000000008f7dd5 in std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> >::operator() (this=0x7f431400bd48) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251
#15 0x00000000008f7b9e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> > >::_M_run (this=0x7f431400bd40)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195
#16 0x00007f431cbebde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007f431ccff609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#18 0x00007f431c8d8163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Shutting down

Reading symbols from /QuantConnect/netcoredbg...
[New LWP 9514]
[New LWP 9521]
[New LWP 9513]
[New LWP 9526]
[New LWP 9522]
[New LWP 9528]
[New LWP 9529]
[New LWP 9524]
[New LWP 9530]
[New LWP 9523]
[New LWP 9525]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../netcoredbg --interpreter=cli -- dotnet QuantConnect.Lean.Launcher.dll'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1d5dc314dc in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
[Current thread is 1 (Thread 0x7f1d5fafc700 (LWP 9514))]
(gdb) bt
#0  0x00007f1d5dc314dc in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#1  0x00007f1d5dc4852d in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#2  <signal handler called>
#3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#4  0x00007f1d5fc0b859 in __GI_abort () at abort.c:79
#5  0x00007f1d5ffe34b9 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f1d5ffef3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f1d5ffef6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x0000000000ab51fb in (anonymous namespace)::AsyncWrite::operator() (this=0x7f1d5fafbd20) at /home/netcoredbg/src/utils/iosystem_unix.cpp:103
#9  0x0000000000ab4fb9 in netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::AsyncHandle::TraitsImpl<(anonymous namespace)::AsyncWrite>::traits::{lambda(void*)#1}::operator()(netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::AsyncHandle::TraitsImpl<(anonymous namespace)::AsyncWrite>::traits) const (this=0x7f1d5fafbd20, thiz=0x7f1d5fafbd20) at /home/netcoredbg/src/utils/iosystem_unix.cpp:122
#10 0x0000000000ab4f52 in netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::AsyncHandle::TraitsImpl<(anonymous namespace)::AsyncWrite>::traits::{lambda(void*)#1}::__invoke(netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::AsyncHandle::TraitsImpl<(anonymous namespace)::AsyncWrite>::traits) (thiz=0x7f1d5fafbd20) at /home/netcoredbg/src/utils/iosystem_unix.cpp:121
#11 0x0000000000ab56d9 in netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::AsyncHandle::operator() (this=0x7f1d5fafbd10) at /home/netcoredbg/src/utils/iosystem_unix.h:51
#12 0x0000000000ab430a in netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::async_result (handle=...) at /home/netcoredbg/src/utils/iosystem_unix.cpp:330
#13 0x0000000000ab1785 in netcoredbg::IOSystemImpl<netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag> >::async_result (h=...) at /home/netcoredbg/src/utils/iosystem.h:170
#14 0x0000000000aae108 in netcoredbg::IORedirectHelper::ProcessFinishedWriteRequests (this=0x2558400, read_lock=..., out_stream=0x2558758, out_handle=...) at /home/netcoredbg/src/utils/ioredirect.cpp:271
#15 0x0000000000aad59f in netcoredbg::IORedirectHelper::worker (this=0x2558400) at /home/netcoredbg/src/utils/ioredirect.cpp:225
#16 0x0000000000ab33e7 in std::__invoke_impl<void, void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> (
    __f=@0x2553370: (void (netcoredbg::IORedirectHelper::*)(netcoredbg::IORedirectHelper * const)) 0xaac520 <netcoredbg::IORedirectHelper::worker()>, __t=@0x2553368: 0x2558400)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:73
#17 0x0000000000ab32f2 in std::__invoke<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> (
    __fn=@0x2553370: (void (netcoredbg::IORedirectHelper::*)(netcoredbg::IORedirectHelper * const)) 0xaac520 <netcoredbg::IORedirectHelper::worker()>, __args=@0x2553368: 0x2558400)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95
#18 0x0000000000ab32b5 in std::thread::_Invoker<std::tuple<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> >::_M_invoke<0ul, 1ul> (this=0x2553368)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244
#19 0x0000000000ab3265 in std::thread::_Invoker<std::tuple<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> >::operator() (this=0x2553368) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251
#20 0x0000000000ab302e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> > >::_M_run (this=0x2553360)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195
#21 0x00007f1d6001bde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#22 0x00007f1d6012f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#23 0x00007f1d5fd08163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Martin-Molinero commented 2 years ago

Seems like a different call stack mid debugging:

Reading symbols from /QuantConnect/netcoredbg...
[New LWP 10246]
[New LWP 10253]
[New LWP 10251]
[New LWP 10238]
[New LWP 10255]
[New LWP 10248]
[New LWP 10254]
[New LWP 10247]
[New LWP 10252]
[New LWP 10250]
[New LWP 10239]
[New LWP 10249]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../netcoredbg --interpreter=cli -- dotnet QuantConnect.Lean.Launcher.dll'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000008d6a3f in netcoredbg::GetExceptionModuleName (pFrame=0x0, excModule=...) at /home/netcoredbg/src/debugger/managedcallback.cpp:727
727     /home/netcoredbg/src/debugger/managedcallback.cpp: No such file or directory.
[Current thread is 1 (Thread 0x7f7ceb237700 (LWP 10246))]
(gdb) bt
#0  0x00000000008d6a3f in netcoredbg::GetExceptionModuleName (pFrame=0x0, excModule="") at /home/netcoredbg/src/debugger/managedcallback.cpp:727
#1  0x00000000008d685e in netcoredbg::ManagedCallback::Exception(ICorDebugAppDomain*, ICorDebugThread*, ICorDebugFrame*, unsigned int, CorDebugExceptionCallbackType, unsigned int)::$_3::operator()() const (this=0x7f7cdd158720)
    at /home/netcoredbg/src/debugger/managedcallback.cpp:778
#2  0x00000000008d667d in std::_Function_handler<void (), netcoredbg::ManagedCallback::Exception(ICorDebugAppDomain*, ICorDebugThread*, ICorDebugFrame*, unsigned int, CorDebugExceptionCallbackType, unsigned int)::$_3>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:300
#3  0x00000000008d7f2e in std::function<void ()>::operator()() const (this=0x7f7ceb2367f0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:688
#4  0x00000000008d1a5f in netcoredbg::ManagedCallback::AddCallbackToQueue(ICorDebugAppDomain*, std::function<void ()>) (this=0x7f7ce406efd0, pAppDomain=0x7f7cdc000d88, callback=...) at /home/netcoredbg/src/debugger/managedcallback.cpp:192
#5  0x00000000008d541a in netcoredbg::ManagedCallback::Exception (this=0x7f7ce406efd0, pAppDomain=0x7f7cdc000d88, pThread=0x7f7c38047a68, pFrame=0x0, nOffset=12, dwEventType=DEBUG_EXCEPTION_FIRST_CHANCE, dwFlags=1)
    at /home/netcoredbg/src/debugger/managedcallback.cpp:769
#6  0x00000000008d55af in non-virtual thunk to netcoredbg::ManagedCallback::Exception(ICorDebugAppDomain*, ICorDebugThread*, ICorDebugFrame*, unsigned int, CorDebugExceptionCallbackType, unsigned int) ()
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:1310
#7  0x00007f7ceb544872 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#8  0x00007f7ceb584eb5 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#9  0x00007f7ceb58e938 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#10 0x00007f7ceb58f23c in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#11 0x00007f7ceb58f319 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#12 0x00007f7ceb3fe28e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#13 0x00007f7ceccc3609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f7cec89c163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Martin-Molinero commented 2 years ago

Slightly different call stack, mid debugging

Reading symbols from /QuantConnect/netcoredbg...
[New LWP 10756]
[New LWP 10749]
[New LWP 10757]
[New LWP 10741]
[New LWP 10758]
[New LWP 10753]
[New LWP 10754]
[New LWP 10751]
[New LWP 10742]
[New LWP 10750]
[New LWP 10752]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `../netcoredbg --interpreter=cli -- dotnet QuantConnect.Lean.Launcher.dll'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f53472e74dc in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
[Current thread is 1 (Thread 0x7f52cd231700 (LWP 10756))]
(gdb) bt
#0  0x00007f53472e74dc in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#1  0x00007f53472fe928 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#2  <signal handler called>
#3  0x0000000000990c2e in netcoredbg::TypePrinter::GetTypeOfValue (pType=0x0, elementType="", arrayType="") at /home/netcoredbg/src/metadata/typeprinter.cpp:402
#4  0x0000000000990313 in netcoredbg::TypePrinter::GetTypeOfValue (pType=0x0, output="") at /home/netcoredbg/src/metadata/typeprinter.cpp:791
#5  0x0000000000990b1e in netcoredbg::TypePrinter::GetTypeOfValue (pValue=0x7f53480fa7d8, output="") at /home/netcoredbg/src/metadata/typeprinter.cpp:388
#6  0x00000000007cf8ae in netcoredbg::ExceptionBreakpoints::ManagedCallbackException (this=0x228e290, pThread=0x7f5288048938, eventType=netcoredbg::ExceptionCallbackType::CATCH_HANDLER_FOUND, excModule="", event=...)
    at /home/netcoredbg/src/debugger/breakpoints_exception.cpp:381
#7  0x0000000000820b09 in netcoredbg::Breakpoints::ManagedCallbackException (this=0x228e190, pThread=0x7f5288048938, eventType=netcoredbg::ExceptionCallbackType::CATCH_HANDLER_FOUND, excModule="", event=...)
    at /home/netcoredbg/src/debugger/breakpoints.cpp:163
#8  0x00000000008d12e6 in netcoredbg::ManagedCallback::CallbacksWorkerException (this=0x7f534806efd0, pAppDomain=0x7f5340000d88, pThread=0x7f5288048938, eventType=netcoredbg::ExceptionCallbackType::CATCH_HANDLER_FOUND,
    excModule="") at /home/netcoredbg/src/debugger/managedcallback.cpp:112
#9  0x00000000008d1815 in netcoredbg::ManagedCallback::CallbacksWorker (this=0x7f534806efd0) at /home/netcoredbg/src/debugger/managedcallback.cpp:156
#10 0x00000000008f7f57 in std::__invoke_impl<void, void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> (
    __f=@0x7f534800bd50: (void (netcoredbg::ManagedCallback::*)(netcoredbg::ManagedCallback * const)) 0x8d1570 <netcoredbg::ManagedCallback::CallbacksWorker()>, __t=@0x7f534800bd48: 0x7f534806efd0)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:73
#11 0x00000000008f7e62 in std::__invoke<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> (
    __fn=@0x7f534800bd50: (void (netcoredbg::ManagedCallback::*)(netcoredbg::ManagedCallback * const)) 0x8d1570 <netcoredbg::ManagedCallback::CallbacksWorker()>, __args=@0x7f534800bd48: 0x7f534806efd0)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95
#12 0x00000000008f7e25 in std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> >::_M_invoke<0ul, 1ul> (this=0x7f534800bd48)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244
#13 0x00000000008f7dd5 in std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> >::operator() (this=0x7f534800bd48)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251
#14 0x00000000008f7b9e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> > >::_M_run (this=0x7f534800bd40)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195
#15 0x00007f534d7a4de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#16 0x00007f534d8b8609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007f534d491163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

viewizard commented 2 years ago

Thanks a lot!

I already see some things that we didn't take into account, for example:

#5  0x00000000008d541a in netcoredbg::ManagedCallback::Exception (this=0x7f7ce406efd0, pAppDomain=0x7f7cdc000d88, pThread=0x7f7c38047a68, pFrame=0x0, nOffset=12, dwEventType=DEBUG_EXCEPTION_FIRST_CHANCE, dwFlags=1)
    at /home/netcoredbg/src/debugger/managedcallback.cpp:769

pFrame=0x0: we never expected a null frame in this callback from the CLR... the MS docs say nothing about this.
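
Because the exception callback can apparently arrive with pFrame=0x0, any helper that derives the module name from the frame has to check the pointer first; the crash at managedcallback.cpp:727 is GetExceptionModuleName being called with pFrame=0x0. A minimal defensive sketch follows. Frame and GetExceptionModuleNameSketch are hypothetical stand-ins (the real function takes an ICorDebugFrame*), and returning "no module" is an assumed policy, not netcoredbg's actual fix.

```cpp
#include <string>

// Hypothetical stand-in for ICorDebugFrame, reduced to what the sketch needs.
struct Frame {
    std::string moduleName;
};

// Defensive variant of a GetExceptionModuleName-style helper: the CLR may invoke
// the exception callback with a null frame, so the pointer is checked before it
// is dereferenced. Returns false (and an empty name) when there is no frame.
bool GetExceptionModuleNameSketch(const Frame* pFrame, std::string& excModule) {
    excModule.clear();
    if (pFrame == nullptr)
        return false;  // no frame: report "no module" instead of crashing
    excModule = pFrame->moduleName;
    return true;
}
```

The same kind of check would apply to the TypePrinter::GetTypeOfValue frames in the last backtrace, which were entered with pType=0x0.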

I will analyze these backtraces; extremely interesting.
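
Among these backtraces, the first mid-debugging one aborts on the assertion m_threadsExceptionStatus.find(tid) == m_threadsExceptionStatus.end() in breakpoints_exception.cpp:392, that is, a FIRST_CHANCE callback arrived for a thread that still had an exception status recorded. One plausible reading (not confirmed in this thread) is that a new first-chance exception can legitimately arrive before the previous one on that thread was cleared. The sketch below shows per-thread bookkeeping that tolerates this by overwriting the stale entry. ExceptionStatusTable, ExcStage and ThreadExceptionStatus are hypothetical names, and the table assumes it is only used from the single callback worker thread.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

using ThreadId = std::uint32_t;

// Hypothetical stages, loosely mirroring the callback types seen in the backtraces.
enum class ExcStage { FirstChance, UserFirstChance, CatchHandlerFound, Unhandled };

struct ThreadExceptionStatus {
    ExcStage stage;
    std::string typeName;
};

// Per-thread exception bookkeeping. Assumes it is only touched from one callback
// worker thread, so it has no locking of its own.
class ExceptionStatusTable {
public:
    // A new first-chance exception may arrive on a thread whose previous exception
    // is still recorded, so the stale entry is overwritten instead of asserting
    // that none exists.
    void OnFirstChance(ThreadId tid, std::string typeName) {
        m_status[tid] = ThreadExceptionStatus{ExcStage::FirstChance, std::move(typeName)};
    }

    // Moves the tracked exception to a later stage; false if nothing is tracked.
    bool Advance(ThreadId tid, ExcStage stage) {
        auto it = m_status.find(tid);
        if (it == m_status.end())
            return false;
        it->second.stage = stage;
        return true;
    }

    void Clear(ThreadId tid) { m_status.erase(tid); }

    const ThreadExceptionStatus* Find(ThreadId tid) const {
        auto it = m_status.find(tid);
        return it == m_status.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<ThreadId, ThreadExceptionStatus> m_status;
};
```

Whether overwriting is the right policy, or whether the earlier entry should have been cleared at an earlier stage, depends on how the real debugger pairs these callbacks; the sketch only shows that the assertion encodes an assumption the CLR does not seem to guarantee.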

Martin-Molinero commented 2 years ago

Glad to help, and thank you! 🙌 I was able to capture the case where it hangs:

Just a note, in case it helps: no breakpoint was set in any of these debug runs.

(gdb) thread apply all bt

Thread 11 (Thread 0x7f752f7fc700 (LWP 11781)):
#0  __libc_read (nbytes=48, buf=0x7f752f7fbda0, fd=32) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=32, buf=0x7f752f7fbda0, nbytes=48) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f75d74f3042 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#3  0x00007f75d7525bdf in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#4  0x00007f75d7523e99 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#5  0x00007f75d73ea28e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#6  0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 10 (Thread 0x7f75d4122700 (LWP 11780)):
#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f75d4121a50, clockid=<optimized out>, expected=0, futex_word=0x7f75d00dee50) at ../sysdeps/nptl/futex-internal.h:320
#1  __pthread_cond_wait_common (abstime=0x7f75d4121a50, clockid=<optimized out>, mutex=0x7f75d00dee00, cond=0x7f75d00dee28) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=0x7f75d00dee28, mutex=0x7f75d00dee00, abstime=0x7f75d4121a50) at pthread_cond_wait.c:656
#3  0x00007f75d73ddf56 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#4  0x00007f75d73ddc21 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#5  0x00007f75d73e2602 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#6  0x00007f75d73e2911 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#7  0x00007f75d7563649 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#8  0x00007f75d757bae8 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#9  0x00007f75d757d7bd in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#10 0x00007f75d73ea28e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#11 0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7f752fffd700 (LWP 11779)):
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f75d006f070) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f75d006f020, cond=0x7f75d006f048) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7f75d006f048, mutex=0x7f75d006f020) at pthread_cond_wait.c:638
#3  0x00007f75d8b95e30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00000000008d160c in netcoredbg::ManagedCallback::CallbacksWorker (this=0x7f75d006efd0) at /home/netcoredbg/src/debugger/managedcallback.cpp:139
#5  0x00000000008f7f57 in std::__invoke_impl<void, void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> (__f=@0x7f75d000bd50: (void (netcoredbg::ManagedCallback::*)(class netcoredbg::ManagedCallback * const)) 0x8d1570 <netcoredbg::ManagedCallback::CallbacksWorker()>, __t=@0x7f75d000bd48: 0x7f75d006efd0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:73
#6  0x00000000008f7e62 in std::__invoke<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> (__fn=@0x7f75d000bd50: (void (netcoredbg::ManagedCallback::*)(class netcoredbg::ManagedCallback * const)) 0x8d1570 <netcoredbg::ManagedCallback::CallbacksWorker()>, __args=@0x7f75d000bd48: 0x7f75d006efd0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95
#7  0x00000000008f7e25 in std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> >::_M_invoke<0ul, 1ul> (this=0x7f75d000bd48) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244
#8  0x00000000008f7dd5 in std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> >::operator() (this=0x7f75d000bd48) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251
#9  0x00000000008f7b9e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (netcoredbg::ManagedCallback::*)(), netcoredbg::ManagedCallback*> > >::_M_run (this=0x7f75d000bd40) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195
#10 0x00007f75d8b9bde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7f75c7b5e700 (LWP 11777)):
#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f75c7b5d9d0, clockid=<optimized out>, expected=0, futex_word=0x7f75d006b4f4) at ../sysdeps/nptl/futex-internal.h:320
#1  __pthread_cond_wait_common (abstime=0x7f75c7b5d9d0, clockid=<optimized out>, mutex=0x7f75d006b4a0, cond=0x7f75d006b4c8) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=0x7f75d006b4c8, mutex=0x7f75d006b4a0, abstime=0x7f75c7b5d9d0) at pthread_cond_wait.c:656
#3  0x00007f75d67f61e6 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#4  0x00007f75d67f5eb1 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#5  0x00007f75d67fa892 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#6  0x00007f75d67faba1 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#7  0x00007f75d64edd51 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#8  0x00007f75d64ede7c in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#9  0x00007f75d64696ca in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#10 0x00007f75d6469d6d in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#11 0x00007f75d64ee0a8 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#12 0x00007f75d680253e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#13 0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7f75d4ab3700 (LWP 11776)):
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f75d005a340) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f75d005a2f0, cond=0x7f75d005a318) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7f75d005a318, mutex=0x7f75d005a2f0) at pthread_cond_wait.c:638
#3  0x00007f75d67f61fb in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#4  0x00007f75d67f5eb1 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#5  0x00007f75d67fa892 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#6  0x00007f75d67faba1 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#7  0x00007f75d66d4969 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#8  0x00007f75d66d47d5 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#9  0x00007f75d66d451d in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#10 0x00007f75d680253e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#11 0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f75d52b8700 (LWP 11775)):
#0  0x00007f75d8cbaad4 in __libc_open64 (file=0x7f75d004ee04 "/tmp/clr-debug-pipe-11764-129861285-in", oflag=0) at ../sysdeps/unix/sysv/linux/open64.c:48
#1  0x00007f75d66de4af in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#2  0x00007f75d66d70a8 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#3  0x00007f75d66d61c9 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#4  0x00007f75d680253e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#5  0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f75d5add700 (LWP 11774)):
#0  0x00007f75d887b9cf in __GI___poll (fds=0x7f7558000c60, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f75d66dda2c in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#2  0x00007f75d67a49ed in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#3  0x00007f75d67a27e7 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#4  0x00007f75d680253e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#5  0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f75d62eb700 (LWP 11773)):
#0  0x00007f75d887b9cf in __GI___poll (fds=0x7f75d62ead98, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f75d67f8540 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#2  0x00007f75d67f7ba3 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#3  0x00007f75d680253e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libcoreclr.so
#4  0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f75d7223700 (LWP 11772)):
#0  futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f75d7222750, clockid=<optimized out>, expected=0, futex_word=0x7f75d00099e4) at ../sysdeps/nptl/futex-internal.h:320
#1  __pthread_cond_wait_common (abstime=0x7f75d7222750, clockid=<optimized out>, mutex=0x7f75d0009990, cond=0x7f75d00099b8) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=0x7f75d00099b8, mutex=0x7f75d0009990, abstime=0x7f75d7222750) at pthread_cond_wait.c:656
#3  0x00007f75d73ddf56 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#4  0x00007f75d73ddc21 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#5  0x00007f75d73e2602 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#6  0x00007f75d73e2911 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#7  0x00007f75d757adfb in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#8  0x00007f75d757b319 in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordbi.so
#9  0x00007f75d73ea28e in ?? () from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.4/libmscordaccore.so
#10 0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f75d867c700 (LWP 11765)):
#0  0x00007f75d887dffb in __GI___select (nfds=10, readfds=0x7f75d867b7c8, writefds=0x7f75d867b748, exceptfds=0x7f75d867b6c8, timeout=0x7f75d867b580) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x0000000000ab4093 in netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::async_wait (begin=..., end=..., timeout=...) at /home/netcoredbg/src/utils/iosystem_unix.cpp:303
#2  0x0000000000ab1680 in netcoredbg::IOSystemImpl<netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag> >::async_wait (begin=..., end=..., timeout=...) at /home/netcoredbg/src/utils/iosystem.h:159
#3  0x0000000000aad1ee in netcoredbg::IORedirectHelper::worker (this=0x1942400) at /home/netcoredbg/src/utils/ioredirect.cpp:206
#4  0x0000000000ab33e7 in std::__invoke_impl<void, void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> (__f=@0x193d370: (void (netcoredbg::IORedirectHelper::*)(class netcoredbg::IORedirectHelper * const)) 0xaac520 <netcoredbg::IORedirectHelper::worker()>, __t=@0x193d368: 0x1942400) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:73
#5  0x0000000000ab32f2 in std::__invoke<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> (__fn=@0x193d370: (void (netcoredbg::IORedirectHelper::*)(class netcoredbg::IORedirectHelper * const)) 0xaac520 <netcoredbg::IORedirectHelper::worker()>, __args=@0x193d368: 0x1942400) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95
#6  0x0000000000ab32b5 in std::thread::_Invoker<std::tuple<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> >::_M_invoke<0ul, 1ul> (this=0x193d368) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:244
#7  0x0000000000ab3265 in std::thread::_Invoker<std::tuple<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> >::operator() (this=0x193d368) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:251
#8  0x0000000000ab302e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (netcoredbg::IORedirectHelper::*)(), netcoredbg::IORedirectHelper*> > >::_M_run (this=0x193d360) at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/thread:195
#9  0x00007f75d8b9bde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007f75d8caf609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007f75d8888163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f75d8764740 (LWP 11764)):
#0  0x00007f75d887dffb in __GI___select (nfds=12, readfds=0x7ffdbce73048, writefds=0x7ffdbce72fc8, exceptfds=0x7ffdbce72f48, timeout=0x7ffdbce72e00) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x0000000000ab4093 in netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag>::async_wait (begin=..., end=..., timeout=...) at /home/netcoredbg/src/utils/iosystem_unix.cpp:303
#2  0x0000000000ab1680 in netcoredbg::IOSystemImpl<netcoredbg::IOSystemTraits<netcoredbg::UnixPlatformTag> >::async_wait (begin=..., end=..., timeout=...) at /home/netcoredbg/src/utils/iosystem.h:159
#3  0x0000000000aaf797 in netcoredbg::IORedirectHelper::async_input (this=0x1942400, in=...) at /home/netcoredbg/src/utils/ioredirect.cpp:432
#4  0x00000000008ea089 in netcoredbg::ManagedDebugger::ProcessStdin (this=0x19420a0, stream=...) at /home/netcoredbg/src/debugger/manageddebugger.cpp:1191
#5  0x00000000009a4cd1 in netcoredbg::CLIProtocol::execCommands (this=0x1941d50, lr=..., printCommands=false) at /home/netcoredbg/src/protocols/cliprotocol.cpp:2310
#6  0x00000000009a7687 in netcoredbg::CLIProtocol::CommandLoop (this=0x1941d50) at /home/netcoredbg/src/protocols/cliprotocol.cpp:2499
#7  0x0000000000a99f0b in main (argc=5, argv=0x7ffdbce74768) at /home/netcoredbg/src/main.cpp:485

viewizard commented 2 years ago

@Martin-Molinero here is a new patch (make sure you revert the previous patch before applying this one): 0001-Fix-callbacks-return-code-check.txt

I need more time to analyze the second backtrace in https://github.com/Samsung/netcoredbg/issues/89#issuecomment-1118007059. It looks like select() returns an error code at quit (possibly related to an FD that was closed in another place), which makes the code throw std::runtime_error(). On Linux, if that exception is not caught, it ends up as a SIGABRT sent to the process. That signal is handled by the CoreCLR signal handler code (our managed part uses CoreCLR, so CoreCLR is part of the debugger process), but CoreCLR is probably already in its "shutdown" process, and we get a SIGSEGV here... I was not able to reproduce this: even when I put throw std::runtime_error() into the debugger code, during debugging I see SIGABRT from the debugger's native code (not SIGSEGV from the CoreCLR signal handler). Anyway, this should be investigated to understand why the select() call fails at quit at all.
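
A minimal sketch of the suspected failure mode, assuming Linux/POSIX: if a descriptor passed to select() has already been closed elsewhere, select() returns -1 with errno set to EBADF, and a wait helper that converts that into an exception ends in std::terminate() and abort() (SIGABRT) when nothing catches it. The names wait_readable, SelectError and errno_after_waiting_on_closed_fd are hypothetical; this is not netcoredbg's actual IOSystem/IORedirectHelper code.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical exception type that carries the errno of a failed select() call.
struct SelectError : std::runtime_error
{
    int code;
    explicit SelectError(int err)
        : std::runtime_error(std::string("select() failed: ") + std::strerror(err)), code(err)
    {
    }
};

// Hypothetical helper, loosely modelled on the async_wait() frames in the backtrace.
// Returns true if `fd` becomes readable, false on timeout, throws SelectError if
// select() itself fails (EINTR is not retried in this sketch).
bool wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &readfds, nullptr, nullptr, &tv);
    if (rc < 0)
    {
        // If nothing catches this, std::terminate() calls abort(), i.e. SIGABRT.
        throw SelectError(errno);
    }
    return rc > 0 && FD_ISSET(fd, &readfds);
}

// Simulates "FD closed in another place": close the read end of a pipe, then wait
// on it. Returns the errno reported by select() (0 if no error, -1 if pipe() failed).
int errno_after_waiting_on_closed_fd()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    close(fds[0]);  // the descriptor number is now invalid
    int observed = 0;
    try
    {
        wait_readable(fds[0], 100);
    }
    catch (const SelectError& e)
    {
        observed = e.code;  // EBADF on Linux
    }
    close(fds[1]);
    return observed;
}
```

In a real shutdown path, one option would be to treat EBADF during quit as "the other side is already gone" instead of throwing, but whether that is appropriate here depends on why the FD gets closed in the first place.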

About the debugger hang: I analyzed the backtraces and found that the debugger code works fine: the CLI protocol part is waiting for input, and the callbacks part is waiting for a callback from the debuggee process. Could you please check whether it really hangs, or whether it just doesn't print the "prompt" and you can still type a command? Another point: when it hangs, please wait 6+ minutes (we have already faced deadlock issues in the debuggee process runtime / debug API; usually the debuggee process runtime returns error code 0x80131c08, CORDBG_E_TIMEOUT, after 6+ minutes).
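
For reference, 0x80131c08 is a failure HRESULT from the CLR's own error space: the high bit marks failure, (hr >> 16) & 0x1FFF gives the facility (0x13, FACILITY_URT, which CLR error codes use), and the low 16 bits give the code (0x1c08). The sketch below decodes it; the helper names are hypothetical, and it uses std::uint32_t for simplicity, whereas the debug API declares HRESULT as a signed 32-bit value.

```cpp
#include <cassert>
#include <cstdint>

// CORDBG_E_TIMEOUT, value as quoted in the comment above.
constexpr std::uint32_t kCordbgETimeout = 0x80131C08u;
// FACILITY_URT, the facility used by CLR error HRESULTs.
constexpr std::uint32_t kFacilityUrt = 0x13u;

// High bit set means failure (the FAILED() check for a signed HRESULT is hr < 0).
constexpr bool hr_failed(std::uint32_t hr) { return (hr & 0x80000000u) != 0; }
// Same extraction as the HRESULT_FACILITY macro: (hr >> 16) & 0x1FFF.
constexpr std::uint32_t hr_facility(std::uint32_t hr) { return (hr >> 16) & 0x1FFFu; }
// Low 16 bits: the error code within the facility.
constexpr std::uint32_t hr_code(std::uint32_t hr) { return hr & 0xFFFFu; }

// True if a debug API call failed with the runtime's timeout error
// (the "wait 6+ minutes" case) rather than with some other failure.
constexpr bool is_debugger_timeout(std::uint32_t hr) { return hr == kCordbgETimeout; }

static_assert(hr_facility(kCordbgETimeout) == kFacilityUrt, "CORDBG_E_TIMEOUT is a CLR HRESULT");
```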

Martin-Molinero commented 2 years ago

I can confirm I'm not seeing any seg fault mid-debugging, nor any exception causing a break 🎉 thanks @viewizard! I still occasionally reproduce the seg fault at the end, but I haven't seen the 'hang' again 👍

Note: I think the managed dll being used is in a debug build

viewizard commented 2 years ago

Note: I think the managed dll being used is in a debug build

Hmm, https://github.com/Samsung/netcoredbg/blob/a8bd3b95328f19dfe5519973b8176f40d3b4f509/src/CMakeLists.txt#L207 looks like it doesn't respect the CMake build type; I will check this at work.

viewizard commented 2 years ago

Here is a fix for the managed part build: 0001-Fix-managed-part-build-type.txt. Note: this is a separate patch that doesn't include the previous fixes.

viewizard commented 2 years ago

Fixed in upstream now.