dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.16k stars 4.71k forks source link

[mono] MonoAOT-llvm x64 crash inside `Perf_Regex_Common` microbenchmarks #108180

Open matouskozak opened 3 weeks ago

matouskozak commented 3 weeks ago

I've noticed several crashes in Perf_Regex_Common microbenchmark group with common crash pattern.

Build: https://dev.azure.com/dnceng/internal/_build/results?buildId=2545451

Affected microbenchmarks:

Crash log:

[2024/09/24 00:12:00][INFO] =================================================================
[2024/09/24 00:12:00][INFO]     Native Crash Reporting
[2024/09/24 00:12:00][INFO] =================================================================
[2024/09/24 00:12:00][INFO] Got a SIGSEGV while executing native code. This usually indicates
[2024/09/24 00:12:00][INFO] a fatal error in the mono runtime or one of the native libraries
[2024/09/24 00:12:00][INFO] used by your application.
[2024/09/24 00:12:00][INFO] =================================================================
[2024/09/24 00:12:00][INFO] =================================================================
[2024/09/24 00:12:00][INFO]     Native stacktrace:
[2024/09/24 00:12:00][INFO] =================================================================
[2024/09/24 00:12:00][INFO]     0x781b84e94faf - Unknown
[2024/09/24 00:12:00][INFO]     0x781b84e34cde - Unknown
[2024/09/24 00:12:00][INFO]     0x781b84d98a41 - Unknown
[2024/09/24 00:12:00][INFO]     0x781b85a42520 - Unknown
[2024/09/24 00:12:00][INFO]     0x781b85077f7b - Unknown
[2024/09/24 00:12:00][INFO]     0x781b85063ed8 - Unknown
[2024/09/24 00:12:00][INFO]     0x781b85064ab8 - Unknown
[2024/09/24 00:12:00][INFO]     0x781b8506375c - Unknown
[2024/09/24 00:12:00][INFO]     0x781b8505f4b9 - Unknown
[2024/09/24 00:12:00][INFO]     0x781b850604ef - Unknown
[2024/09/24 00:12:00][INFO]     0x781b8503e700 - Unknown
[2024/09/24 00:12:00][INFO]     0x413637ba - Unknown
[2024/09/24 00:12:00][INFO] 
[2024/09/24 00:12:00][INFO] =================================================================
[2024/09/24 00:12:00][INFO]     External Debugger Dump:
[2024/09/24 00:12:00][INFO] =================================================================
[2024/09/24 00:12:01][INFO] [New LWP 222807]
[2024/09/24 00:12:01][INFO] [New LWP 222808]
[2024/09/24 00:12:01][INFO] [New LWP 222809]
[2024/09/24 00:12:03][INFO] [Thread debugging using libthread_db enabled]
[2024/09/24 00:12:03][INFO] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[2024/09/24 00:12:03][INFO] 0x0000781b85aea42f in __GI___wait4 (pid=pid@entry=222810, stat_loc=stat_loc@entry=0x7fff0e031380, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
[2024/09/24 00:12:03][INFO] 30  ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
[2024/09/24 00:12:03][INFO]   Id   Target Id                                           Frame
[2024/09/24 00:12:03][INFO] * 1    Thread 0x781b85d16740 (LWP 222806) "Job-XBUBRZ"     0x0000781b85aea42f in __GI___wait4 (pid=pid@entry=222810, stat_loc=stat_loc@entry=0x7fff0e031380, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
[2024/09/24 00:12:04][INFO]   2    Thread 0x781b83600640 (LWP 222807) "SGen worker"    __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x781b85968d58 <work_cond+40>) at ./nptl/futex-internal.c:57
[2024/09/24 00:12:04][INFO]   3    Thread 0x781b80e00640 (LWP 222808) ".NET EventPipe" 0x0000781b85a4280a in __GI___sigsuspend (set=set@entry=0x781b859546d0 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
[2024/09/24 00:12:04][INFO]   4    Thread 0x781b80a00640 (LWP 222809) "Finalizer"      __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x781b8595a2a0 <finalizer_sem>) at ./nptl/futex-internal.c:57
[2024/09/24 00:12:04][INFO] 
[2024/09/24 00:12:04][INFO] Thread 4 (Thread 0x781b80a00640 (LWP 222809) "Finalizer"):
[2024/09/24 00:12:04][INFO] #0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x781b8595a2a0 <finalizer_sem>) at ./nptl/futex-internal.c:57
[2024/09/24 00:12:04][INFO] #1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x781b8595a2a0 <finalizer_sem>) at ./nptl/futex-internal.c:87
[2024/09/24 00:12:04][INFO] #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x781b8595a2a0 <finalizer_sem>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
[2024/09/24 00:12:04][INFO] #3  0x0000781b85a9cbdf in do_futex_wait (sem=sem@entry=0x781b8595a2a0 <finalizer_sem>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
[2024/09/24 00:12:04][INFO] #4  0x0000781b85a9cc78 in __new_sem_wait_slow64 (sem=0x781b8595a2a0 <finalizer_sem>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
[2024/09/24 00:12:04][INFO] #5  0x0000781b85a9ccf1 in __new_sem_wait (sem=<optimized out>) at ./nptl/sem_wait.c:42
[2024/09/24 00:12:04][INFO] #6  0x0000781b8502e858 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=<optimized out>) at /__w/1/s/src/mono/mono/metadata/../../mono/utils/mono-os-semaphore.h:204
[2024/09/24 00:12:04][INFO] #7  mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=<optimized out>) at /__w/1/s/src/mono/mono/metadata/../../mono/utils/mono-coop-semaphore.h:41
[2024/09/24 00:12:04][INFO] #8  finalizer_thread (unused=<optimized out>) at /__w/1/s/src/mono/mono/metadata/gc.c:868
[2024/09/24 00:12:04][INFO] #9  0x0000781b8500723e in start_wrapper_internal (start_info=0x0, stack_ptr=<optimized out>) at /__w/1/s/src/mono/mono/metadata/threads.c:1208
[2024/09/24 00:12:04][INFO] #10 start_wrapper (data=<optimized out>) at /__w/1/s/src/mono/mono/metadata/threads.c:1276
[2024/09/24 00:12:04][INFO] #11 0x0000781b85a94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
[2024/09/24 00:12:04][INFO] #12 0x0000781b85b26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
[2024/09/24 00:12:04][INFO] 
[2024/09/24 00:12:04][INFO] Thread 3 (Thread 0x781b80e00640 (LWP 222808) ".NET EventPipe"):
[2024/09/24 00:12:04][INFO] #0  0x0000781b85a4280a in __GI___sigsuspend (set=set@entry=0x781b859546d0 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
[2024/09/24 00:12:04][INFO] #1  0x0000781b84f79d03 in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0x781b80dff580) at /__w/1/s/src/mono/mono/utils/mono-threads-posix-signals.c:200
[2024/09/24 00:12:04][INFO] #2  <signal handler called>
[2024/09/24 00:12:04][INFO] #3  0x0000781b85b18bcf in __GI___poll (fds=fds@entry=0x781b7c002970, nfds=nfds@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
[2024/09/24 00:12:04][INFO] #4  0x0000781b84f4381d in ipc_poll_fds (fds=0x781b7c002970, nfds=1, timeout=4294967295) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:470
[2024/09/24 00:12:04][INFO] #5  ds_ipc_poll (poll_handles_data=0x781b7c002760, poll_handles_data_len=poll_handles_data_len@entry=1, timeout_ms=timeout_ms@entry=4294967295, callback=callback@entry=0x781b84f49a70 <server_warning_callback>) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:1105
[2024/09/24 00:12:04][INFO] #6  0x0000781b84f434d8 in ds_ipc_stream_factory_get_next_available_stream (callback=callback@entry=0x781b84f49a70 <server_warning_callback>) at /__w/1/s/src/native/eventpipe/ds-ipc.c:393
[2024/09/24 00:12:04][INFO] #7  0x0000781b84f473cb in server_thread (data=<optimized out>) at /__w/1/s/src/native/eventpipe/ds-server.c:129
[2024/09/24 00:12:04][INFO] #8  0x0000781b84f49a51 in ep_rt_thread_mono_start_func (data=0x640c9a6d7320) at /__w/1/s/src/mono/mono/mini/../../mono/eventpipe/ep-rt-mono.h:878
[2024/09/24 00:12:04][INFO] #9  0x0000781b85a94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
[2024/09/24 00:12:04][INFO] #10 0x0000781b85b26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
[2024/09/24 00:12:04][INFO] 
[2024/09/24 00:12:04][INFO] Thread 2 (Thread 0x781b83600640 (LWP 222807) "SGen worker"):
[2024/09/24 00:12:04][INFO] #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x781b85968d58 <work_cond+40>) at ./nptl/futex-internal.c:57
[2024/09/24 00:12:04][INFO] #1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x781b85968d58 <work_cond+40>) at ./nptl/futex-internal.c:87
[2024/09/24 00:12:04][INFO] #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x781b85968d58 <work_cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
[2024/09/24 00:12:04][INFO] #3  0x0000781b85a93a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x781b85968d08 <lock>, cond=0x781b85968d30 <work_cond>) at ./nptl/pthread_cond_wait.c:503
[2024/09/24 00:12:04][INFO] #4  ___pthread_cond_wait (cond=0x781b85968d30 <work_cond>, mutex=0x781b85968d08 <lock>) at ./nptl/pthread_cond_wait.c:627
[2024/09/24 00:12:04][INFO] #5  0x0000781b8509ca83 in mono_os_cond_wait (cond=<optimized out>, mutex=<optimized out>) at /__w/1/s/src/mono/mono/sgen/../../mono/utils/mono-os-mutex.h:219
[2024/09/24 00:12:04][INFO] #6  get_work (worker_index=0, work_context=<optimized out>, do_idle=<optimized out>, job=<optimized out>) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:164
[2024/09/24 00:12:04][INFO] #7  thread_func (data=0x0) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:195
[2024/09/24 00:12:04][INFO] #8  0x0000781b85a94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
[2024/09/24 00:12:04][INFO] #9  0x0000781b85b26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
[2024/09/24 00:12:04][INFO] 
[2024/09/24 00:12:04][INFO] Thread 1 (Thread 0x781b85d16740 (LWP 222806) "Job-XBUBRZ"):
[2024/09/24 00:12:04][INFO] #0  0x0000781b85aea42f in __GI___wait4 (pid=pid@entry=222810, stat_loc=stat_loc@entry=0x7fff0e031380, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
[2024/09/24 00:12:04][INFO] #1  0x0000781b85aea3ab in __GI___waitpid (pid=pid@entry=222810, stat_loc=stat_loc@entry=0x7fff0e031380, options=options@entry=0) at ./posix/waitpid.c:38
[2024/09/24 00:12:04][INFO] #2  0x0000781b84e950e0 in dump_native_stacktrace (signal=<optimized out>, mctx=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:846
[2024/09/24 00:12:04][INFO] #3  mono_dump_native_crash_info (signal=<optimized out>, mctx=mctx@entry=0x7fff0e031ef0, info=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:868
[2024/09/24 00:12:04][INFO] #4  0x0000781b84e34cde in mono_handle_native_crash (signal=0x781b84ae3c80 "SIGSEGV", mctx=0x7fff0e031ef0, info=0x7fff0e0321b0) at /__w/1/s/src/mono/mono/mini/mini-exceptions.c:2967
[2024/09/24 00:12:04][INFO] #5  0x0000781b84d98a41 in mono_sigsegv_signal_handler_debug (_dummy=11, _info=0x7fff0e0321b0, context=0x7fff0e032080, debug_fault_addr=0x204000000) at /__w/1/s/src/mono/mono/mini/mini-runtime.c:3908
[2024/09/24 00:12:04][INFO] #6  <signal handler called>
[2024/09/24 00:12:04][INFO] #7  0x0000781b85077f7b in major_copy_or_mark_object_no_evacuation (ptr=0x781b6f5ef1a0, obj=0x204000000, queue=0x7fff0e032940) at /__w/1/s/src/mono/mono/sgen/../../mono/sgen/sgen-gc.h:206
[2024/09/24 00:12:04][INFO] #8  major_scan_object_no_evacuation (full_object=0x781b6f5ef130, desc=<optimized out>, queue=0x7fff0e032940) at /__w/1/s/src/mono/mono/sgen/sgen-scan-object.h:66
[2024/09/24 00:12:04][INFO] #9  drain_gray_stack_no_evacuation (queue=0x7fff0e032940) at /__w/1/s/src/mono/mono/sgen/sgen-marksweep-drain-gray-stack.h:347
[2024/09/24 00:12:04][INFO] #10 drain_gray_stack (queue=0x7fff0e032940) at /__w/1/s/src/mono/mono/sgen/sgen-marksweep.c:1288
[2024/09/24 00:12:04][INFO] #11 0x0000781b85063ed8 in sgen_drain_gray_stack (ctx=...) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:578
[2024/09/24 00:12:04][INFO] #12 finish_gray_stack (generation=generation@entry=1, ctx=...) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:1140
[2024/09/24 00:12:04][INFO] #13 0x0000781b85064ab8 in major_finish_collection (gc_thread_gray_queue=gc_thread_gray_queue@entry=0x7fff0e032940, reason=reason@entry=0x781b84ad583e "user request", is_overflow=is_overflow@entry=0, old_next_pin_slot=84, forced=forced@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2323
[2024/09/24 00:12:04][INFO] #14 0x0000781b8506375c in major_do_collection (reason=reason@entry=0x781b84ad583e "user request", is_overflow=is_overflow@entry=0, forced=forced@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2465
[2024/09/24 00:12:04][INFO] #15 0x0000781b8505f4b9 in sgen_perform_collection_inner (requested_size=<optimized out>, generation_to_collect=<optimized out>, reason=<optimized out>, forced_serial=<optimized out>, stw=<optimized out>) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2665
[2024/09/24 00:12:04][INFO] #16 sgen_perform_collection (requested_size=requested_size@entry=0, generation_to_collect=1, reason=0x781b84ad583e "user request", forced_serial=forced_serial@entry=1, stw=stw@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2762
[2024/09/24 00:12:04][INFO] #17 0x0000781b850604ef in sgen_gc_collect (generation=67108864, generation@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:3228
[2024/09/24 00:12:04][INFO] #18 0x0000781b8503e700 in mono_gc_collect (generation=1) at /__w/1/s/src/mono/mono/metadata/sgen-mono.c:2359
[2024/09/24 00:12:04][INFO] #19 0x00000000413637ba in ?? ()
[2024/09/24 00:12:04][INFO] #20 0x0000000000000001 in ?? ()
[2024/09/24 00:12:04][INFO] #21 0x0000781b83810fb0 in ?? ()
[2024/09/24 00:12:04][INFO] #22 0x0000000000000001 in ?? ()
[2024/09/24 00:12:04][INFO] #23 0x0000781b8386b618 in ?? ()
[2024/09/24 00:12:04][INFO] #24 0x0000000000000001 in ?? ()
[2024/09/24 00:12:04][INFO] #25 0x00007fff0e033548 in ?? ()
[2024/09/24 00:12:04][INFO] #26 0x0000000000000000 in ?? ()
[2024/09/24 00:12:04][INFO] [Inferior 1 (process 222806) detached]
[2024/09/24 00:12:04][INFO] =================================================================
[2024/09/24 00:12:04][INFO]     Basic Fault Address Reporting
[2024/09/24 00:12:04][INFO] =================================================================
[2024/09/24 00:12:04][INFO] Memory around native instruction pointer (0x781b85077f7b):0x781b85077f6b  48 c7 c2 ff ff ff ff 48 d3 e2 48 89 d6 48 21 ee  H......H..H..H!.
[2024/09/24 00:12:04][INFO] 0x781b85077f7b  48 8b 45 00 49 3b 32 75 3c a8 02 0f 85 c4 01 00  H.E.I;2u<.......
[2024/09/24 00:12:04][INFO] 0x781b85077f8b  00 a8 01 0f 84 3f 01 00 00 48 83 e0 f8 0f 84 35  .....?...H.....5
[2024/09/24 00:12:04][INFO] 0x781b85077f9b  01 00 00 49 89 45 00 41 0f b6 09 48 c7 c2 ff ff  ...I.E.A...H....
[2024/09/24 00:12:04][INFO] =================================================================
[2024/09/24 00:12:04][INFO]     Managed Stacktrace:
[2024/09/24 00:12:04][INFO] =================================================================
[2024/09/24 00:12:04][INFO]       at <unknown> <0xffffffff>
[2024/09/24 00:12:04][INFO]       at System.GC:InternalCollect <0x00079>
[2024/09/24 00:12:04][INFO]       at System.GC:Collect <0x00024>
[2024/09/24 00:12:04][INFO]       at BenchmarkDotNet.Engines.Engine:RunIteration <0x000b5>
[2024/09/24 00:12:04][INFO]       at BenchmarkDotNet.Engines.EngineFactory:Jit <0x0009c>
[2024/09/24 00:12:04][INFO]       at BenchmarkDotNet.Engines.EngineFactory:CreateReadyToRun <0x0016b>
[2024/09/24 00:12:04][INFO]       at BenchmarkDotNet.Autogenerated.Runnable_327:Run <0x00c7f>
[2024/09/24 00:12:04][INFO]       at <Module>:runtime_invoke_void_object_object <0x00095>
[2024/09/24 00:12:04][INFO]       at <unknown> <0xffffffff>
[2024/09/24 00:12:04][INFO]       at System.Reflection.RuntimeMethodInfo:InternalInvoke <0x000b9>
[2024/09/24 00:12:04][INFO]       at System.Reflection.MethodBaseInvoker:InterpretedInvoke_Method <0x0003d>
[2024/09/24 00:12:04][INFO]       at System.Reflection.MethodBaseInvoker:InvokeDirectByRefWithFewArgs <0x001b2>
[2024/09/24 00:12:04][INFO]       at System.Reflection.MethodBaseInvoker:InvokeWithFewArgs <0x0080f>
[2024/09/24 00:12:04][INFO]       at System.Reflection.RuntimeMethodInfo:Invoke <0x001a4>
[2024/09/24 00:12:04][INFO]       at System.Reflection.MethodBase:Invoke <0x00020>
[2024/09/24 00:12:04][INFO]       at BenchmarkDotNet.Autogenerated.UniqueProgramName:AfterAssemblyLoadingAttached <0x005c4>
[2024/09/24 00:12:04][INFO]       at BenchmarkDotNet.Autogenerated.UniqueProgramName:Main <0x00012>
[2024/09/24 00:12:04][INFO]       at <Module>:runtime_invoke_int_object <0x00091>
[2024/09/24 00:12:04][INFO] =================================================================
dotnet-policy-service[bot] commented 3 weeks ago

Tagging subscribers to this area: @brzvlad See info in area-owners.md if you want to be subscribed.

matouskozak commented 3 weeks ago

Looking at the managed stack trace, this issue could be possibly related to GC. What do you think @BrzVlad ?

I also see ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory. in the crash, wonder if something related to threading could be playing part here.

BrzVlad commented 3 weeks ago

This is definitely GC crash, related to a live object being collected. I'll try to reproduce in the future.

vitek-karas commented 3 weeks ago

@matouskozak is this something new? Or it's possible we've had this previously and didn't see it?

matouskozak commented 3 weeks ago

@matouskozak is this something new? Or it's possible we've had this previously and didn't see it?

Not sure, we had a long standing AOT failures https://github.com/dotnet/runtime/issues/106914 that were taking a large portion of failures. These got fixed recently so it's possible that we just didn't notice the failures before.