JuliaLang / libuv

Cross-platform asynchronous I/O
http://libuv.org/
MIT License

uv_spawn probably blocks too many signals #5

Open vtjnash opened 4 years ago

vtjnash commented 4 years ago

We briefly block all signals during fork, because Unix is often really bad at handling them during this call. However, we're blocking everything, and that's perhaps too much, since it means buggy OS code (probably an atfork handler) can get us wedged. For example, I recorded us getting stuck here on my macOS 10.14.6 laptop:

Darwin Jameson.local 18.7.0 Darwin Kernel Version 18.7.0: Sun Dec 1 18:59:03 PST 2019; root:xnu-4903.278.19~1/RELEASE_X86_64 x86_64

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00007fff5f80a9e2 libdispatch.dylib`_firehose_task_buffer_init + 32
    frame #1: 0x00007fff5f7ea63d libdispatch.dylib`_dispatch_client_callout + 8
    frame #2: 0x00007fff5f7ebd4b libdispatch.dylib`_dispatch_once_callout + 20
    frame #3: 0x00007fff5f808b77 libdispatch.dylib`voucher_activity_get_metadata_buffer + 100
    frame #4: 0x00007fff5fa44163 libsystem_trace.dylib`_os_trace_init_slow + 94
    frame #5: 0x00007fff5f7ea63d libdispatch.dylib`_dispatch_client_callout + 8
    frame #6: 0x00007fff5f7ebd4b libdispatch.dylib`_dispatch_once_callout + 20
    frame #7: 0x00007fff5fa45f17 libsystem_trace.dylib`os_log_type_enabled + 463
    frame #8: 0x00007fff5deddf00 libnetwork.dylib`nw_path_close_fd + 144

We should probably avoid blocking any signal that shouldn't be deferred (esp. SIGILL, SIGABRT, SIGSYS, SIGSEGV, SIGFPE, etc.), although nothing we do (except perhaps avoiding fork altogether by using vfork and/or posix_spawn+VFORK) is going to make this entirely reliable.

Note: before this hang, I had been seeing messages like the following spewed many times during earlier (successful) uv_spawn calls, although they came from an earlier process:

2019-12-21 00:01:51.006900-0500 julia-debug[17119:785417] nw_path_close_fd Failed to close guarded necp fd 43 [9: Bad file descriptor]

Hopefully this info helps someone searching on Google, since this appears to have been a fairly common issue over the past 2 years.