dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.27k stars 1.58k forks source link

Deadlock on MacOS during fork() due to memory allocation in child #35157

Open sstrickl opened 6 years ago

sstrickl commented 6 years ago

From the stdout logs:

-----

FAILED: dartk-vm release_simdbc64 service/next_through_operator_bracket_on_super_test
Expected: Pass
Actual: Timeout

--- Command "vm_compile_to_kernel" (took 05.000246s):
DART_CONFIGURATION=ReleaseSIMDBC64 /b/s/w/ir/pkg/vm/tool/gen_kernel --no-aot --platform=xcodebuild/ReleaseSIMDBC64/vm_platform_strong.dill -o /b/s/w/ir/xcodebuild/ReleaseSIMDBC64/generated_compilations/dartk/runtime_observatory_tests_service_next_through_operator_bracket_on_super_test/out.dill /b/s/w/ir/runtime/observatory/tests/service/next_through_operator_bracket_on_super_test.dart --packages=/b/s/w/ir/.packages -Ddart.developer.causal_async_stacks=true

exit code:
0

--- Command "vm" (took 04:38.000691s):
DART_CONFIGURATION=ReleaseSIMDBC64 xcodebuild/ReleaseSIMDBC64/dart --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/xcodebuild/ReleaseSIMDBC64/generated_compilations/dartk/runtime_observatory_tests_service_next_through_operator_bracket_on_super_test/out.dill

exit code:
-15

diagnostics:
Process list including children: [10173, 10174]
Trying to capture stack trace for pid 10173
... < stack trace dumps, see linked log> ...

stdout:
** Launching SERVICE_TEST_TESTEE=true xcodebuild/ReleaseSIMDBC64/dart -DDART_SERVICE_USE_AUTH=false --pause-isolates-on-start --pause-isolates-on-exit --profiler --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages --enable-vm-service:0 /b/s/w/ir/xcodebuild/ReleaseSIMDBC64/generated_compilations/dartk/runtime_observatory_tests_service_next_through_operator_bracket_on_super_test/out.dill

--- Re-run this test:
python tools/test.py -m release -a simdbc64 -n dartk-mac-release-simdbc64 --write-logs service/next_through_operator_bracket_on_super_test

===
=== 1 test failed
===

--- Total time: 05:03 ---
0:04:38.691460 - vm - dartk-vm release_simdbc64/service/next_through_operator_bracket_on_super_test
0:00:30.262702 - vm - dartk-vm release_simdbc64/standalone_2/io/skipping_dart2js_compilations_test
0:00:17.568748 - vm_compile_to_kernel - dartk-vm release_simdbc64/service/break_on_activation_test
... < other previously passed tests > ...
dcharkes commented 6 years ago

This test takes 5 seconds on my MBP, so 5 minutes should be more than enough.

100 repeats do not reproduce this:

python tools/test.py -m release -a simdbc64 -n dartk-mac-release-simdbc64 --write-logs --repeat 100 service/next_through_operator_bracket_on_super_test
Test configuration:
    dartk-mac-release-simdbc64(architecture: simdbc64, compiler: dartk, mode: release, runtime: vm, system: mac, vm-options: [], timeout: null, preview-dart-2)
Suites tested: service
[02:45 | 100% | +  100 | -    0]

Looking at the call graph there is 1 thread not waiting:

    + 1 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff5bd01bf9]
    +   1 _pthread_start  (in libsystem_pthread.dylib) + 377  [0x7fff5bd0250d]
    +     1 _pthread_body  (in libsystem_pthread.dylib) + 340  [0x7fff5bd02661]
    +       1 dart::ThreadStart(void*)  (in dart) + 71  [0x1023801a7]
    +         1 dart::ThreadPool::Worker::Main(unsigned long)  (in dart) + 136  [0x10241b618]
    +           1 dart::ThreadPool::Worker::Loop()  (in dart) + 61  [0x10241b78d]
    +             1 dart::MessageHandler::TaskCallback()  (in dart) + 534  [0x1022e5e66]
    +               1 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)  (in dart) + 218  [0x1022e57fa]
    +                 1 dart::IsolateMessageHandler::HandleMessage(dart::Message*)  (in dart) + 1313  [0x1022b5ea1]
    +                   1 dart::DartLibraryCalls::HandleMessage(dart::Object const&, dart::Instance const&)  (in dart) + 742  [0x1022857b6]
    +                     1 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)  (in dart) + 972  [0x10228281c]
    +                       1 dart::Simulator::Call(dart::Code const&, dart::Array const&, dart::Array const&, dart::Thread*)  (in dart) + 8294  [0x1023fcc36]
    +                         1 dart::InvokeNative(dart::Thread*, dart::Simulator*, void (*)(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*)), void (*)(_Dart_NativeArguments*), _Dart_NativeArguments*)  (in dart) + 90  [0x10240198a]
    +                           1 dart::NativeEntry::AutoScopeNativeCallWrapperNoStackCheck(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*))  (in dart) + 135  [0x1022e7dc7]
    +                             1 dart::bin::Builtin_Process_Start(_Dart_NativeArguments*)  (in dart) + 646  [0x102074af6]
    +                               1 dart::bin::Process::Start(dart::bin::Namespace*, char const*, char**, long, char const*, char**, long, dart::bin::ProcessStartMode, long*, long*, long*, long*, long*, char**)  (in dart) + 504  [0x1020756a8]
    +                                 1 dart::bin::ProcessStarter::Start()  (in dart) + 451  [0x102075883]
    +                                   1 dart::bin::FDUtils::ReadFromBlocking(int, void*, unsigned long)  (in dart) + 94  [0x102089a9e]
    +                                     1 read  (in libsystem_kernel.dylib) + 10  [0x7fff5bb3c142]

This process is waiting for a subprocess:

    // Notify child process to start. This is done to delay the call to exec
    // until the process is registered above, and we are ready to receive the
    // exit code.
    char msg = '1';
    int bytes_written =
        FDUtils::WriteToBlocking(read_in_[1], &msg, sizeof(msg));

The forked process:

                                      1 dart::bin::ProcessStarter::Start()  (in dart) + 85  [0x102075715]
                                        1 fork  (in libsystem_c.dylib) + 47  [0x7fff5ba49683]
                                          1 libSystem_atfork_child  (in libSystem.B.dylib) + 39  [0x7fff59685b09]
                                            1 _os_once  (in libsystem_platform.dylib) + 33  [0x7fff5bcf7f00]
                                              1 _dirhelper_init  (in libsystem_coreservices.dylib) + 74  [0x7fff5bac8cb7]
                                                1 calloc  (in libsystem_malloc.dylib) + 30  [0x7fff5bb9558e]
                                                  1 malloc_zone_calloc  (in libsystem_malloc.dylib) + 87  [0x7fff5bb94c7c]
                                                    1 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 1743  [0x7fff5bb938e8]
                                                      1 syscall_thread_switch  (in libsystem_kernel.dylib) + 10  [0x7fff5bb3131e]

@mkustermann thinks that if the original process had a lock, the new process can never release that, so it deadlocks.

mkustermann commented 6 years ago

It does suspiciously look like after forking the MacOS implementation of fork() tries to allocate memory and deadlocks, possibly because another thread on the spawning process had the lock held at forking time and in the subprocess that thread is gone and will not free it.

Normally these things should be handled by correctly using pthread_atfork. Though it's possible that the macos implementation does not call it.

One possible mitigation would be using posix_spawn instead of fork+exec, but I'm not sure if the underlying implementation is not based on fork+exec as well.

/cc @mraleph

mraleph commented 5 years ago

Another failure with a similar stack trace:

    1 Thread_243660: Main Thread   DispatchQueue_<multiple>
      1 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff70818bf9]
        1 _pthread_start  (in libsystem_pthread.dylib) + 377  [0x7fff7081950d]
          1 _pthread_body  (in libsystem_pthread.dylib) + 340  [0x7fff70819661]
            1 dart::ThreadStart(void*)  (in dart) + 71  [0x10c25d7a7]
              1 dart::ThreadPool::Worker::Main(unsigned long)  (in dart) + 136  [0x10c2f9038]
                1 dart::ThreadPool::Worker::Loop()  (in dart) + 61  [0x10c2f91ad]
                  1 dart::MessageHandler::TaskCallback()  (in dart) + 534  [0x10c1c3886]
                    1 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)  (in dart) + 218  [0x10c1c31ea]
                      1 dart::IsolateMessageHandler::HandleMessage(dart::Message*)  (in dart) + 1313  [0x10c193751]
                        1 dart::DartLibraryCalls::HandleMessage(dart::Object const&, dart::Instance const&)  (in dart) + 742  [0x10c162d36]
                          1 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)  (in dart) + 959  [0x10c15fd8f]
                            1 dart::Simulator::Call(dart::Code const&, dart::Array const&, dart::Array const&, dart::Thread*)  (in dart) + 8326  [0x10c2da296]
                              1 dart::InvokeNative(dart::Thread*, dart::Simulator*, void (*)(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*)), void (*)(_Dart_NativeArguments*), _Dart_NativeArguments*)  (in dart) + 90  [0x10c2df01a]
                                1 dart::NativeEntry::AutoScopeNativeCallWrapperNoStackCheck(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*))  (in dart) + 135  [0x10c1c57e7]
                                  1 dart::bin::Builtin_Process_Start(_Dart_NativeArguments*)  (in dart) + 646  [0x10bf50d26]
                                    1 dart::bin::Process::Start(dart::bin::Namespace*, char const*, char**, long, char const*, char**, long, dart::bin::ProcessStartMode, long*, long*, long*, long*, long*, char**)  (in dart) + 504  [0x10bf518d8]
                                      1 dart::bin::ProcessStarter::Start()  (in dart) + 85  [0x10bf51945]
                                        1 fork  (in libsystem_c.dylib) + 47  [0x7fff70560683]
                                          1 libSystem_atfork_child  (in libSystem.B.dylib) + 39  [0x7fff6e19cb09]
                                            1 _os_once  (in libsystem_platform.dylib) + 33  [0x7fff7080ef00]
                                              1 _dirhelper_init  (in libsystem_coreservices.dylib) + 74  [0x7fff705dfcb7]
                                                1 calloc  (in libsystem_malloc.dylib) + 30  [0x7fff706ac58e]
                                                  1 malloc_zone_calloc  (in libsystem_malloc.dylib) + 87  [0x7fff706abc7c]
                                                    1 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 1743  [0x7fff706aa8e8]
                                                      1 syscall_thread_switch  (in libsystem_kernel.dylib) + 10  [0x7fff7064831e]

Log file here:

https://logs.chromium.org/logs/dart/buildbucket/cr-buildbucket.appspot.com/8928758433485074512/+/steps/vm_tests_shard_9/0/stdout

sstrickl commented 5 years ago

I also saw a similar issue yesterday, which I realized after you mentioned this today:

Call graph:
    1 Thread_326667: Main Thread   DispatchQueue_<multiple>
      1 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff5911ebf9]
        1 _pthread_start  (in libsystem_pthread.dylib) + 377  [0x7fff5911f50d]
          1 _pthread_body  (in libsystem_pthread.dylib) + 340  [0x7fff5911f661]
            1 dart::ThreadStart(void*)  (in dart) + 71  [0x10b1585e7]
              1 dart::ThreadPool::Worker::Main(unsigned long)  (in dart) + 136  [0x10b1f3e78]
                1 dart::ThreadPool::Worker::Loop()  (in dart) + 61  [0x10b1f3fed]
                  1 dart::MessageHandler::TaskCallback()  (in dart) + 534  [0x10b0be6c6]
                    1 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)  (in dart) + 218  [0x10b0be02a]
                      1 dart::IsolateMessageHandler::HandleMessage(dart::Message*)  (in dart) + 1313  [0x10b08e591]
                        1 dart::DartLibraryCalls::HandleMessage(dart::Object const&, dart::Instance const&)  (in dart) + 742  [0x10b05db76]
                          1 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)  (in dart) + 959  [0x10b05abcf]
                            1 dart::Simulator::Call(dart::Code const&, dart::Array const&, dart::Array const&, dart::Thread*)  (in dart) + 8326  [0x10b1d50d6]
                              1 dart::InvokeNative(dart::Thread*, dart::Simulator*, void (*)(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*)), void (*)(_Dart_NativeArguments*), _Dart_NativeArguments*)  (in dart) + 90  [0x10b1d9e5a]
                                1 dart::NativeEntry::AutoScopeNativeCallWrapperNoStackCheck(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*))  (in dart) + 135  [0x10b0c0627]
                                  1 dart::bin::Builtin_Process_Start(_Dart_NativeArguments*)  (in dart) + 646  [0x10ae4bae6]
                                    1 dart::bin::Process::Start(dart::bin::Namespace*, char const*, char**, long, char const*, char**, long, dart::bin::ProcessStartMode, long*, long*, long*, long*, long*, char**)  (in dart) + 504  [0x10ae4c698]
                                      1 dart::bin::ProcessStarter::Start()  (in dart) + 85  [0x10ae4c705]
                                        1 fork  (in libsystem_c.dylib) + 47  [0x7fff58e66683]
                                          1 libSystem_atfork_child  (in libSystem.B.dylib) + 39  [0x7fff56aa2b09]
                                            1 _os_once  (in libsystem_platform.dylib) + 33  [0x7fff59114f00]
                                              1 _dirhelper_init  (in libsystem_coreservices.dylib) + 74  [0x7fff58ee5cb7]
                                                1 calloc  (in libsystem_malloc.dylib) + 30  [0x7fff58fb258e]
                                                  1 malloc_zone_calloc  (in libsystem_malloc.dylib) + 87  [0x7fff58fb1c7c]
                                                    1 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 1743  [0x7fff58fb08e8]
                                                      1 syscall_thread_switch  (in libsystem_kernel.dylib) + 10  [0x7fff58f4e31e]

https://logs.chromium.org/logs/dart/buildbucket/cr-buildbucket.appspot.com/8928800704855660816/+/steps/vm_co19_tests_shard_7/0/stdout

sstrickl commented 5 years ago

Another instance of this happening with a different test:

Call graph:
    1 Thread_751754: Main Thread   DispatchQueue_<multiple>
      1 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fff7f867bf9]
        1 _pthread_start  (in libsystem_pthread.dylib) + 377  [0x7fff7f86850d]
          1 _pthread_body  (in libsystem_pthread.dylib) + 340  [0x7fff7f868661]
            1 dart::ThreadStart(void*)  (in dart) + 71  [0x10db0e1c7]
              1 dart::ThreadPool::Worker::Main(unsigned long)  (in dart) + 135  [0x10dba86a7]
                1 dart::ThreadPool::Worker::Loop()  (in dart) + 47  [0x10dba881f]
                  1 dart::MessageHandler::TaskCallback()  (in dart) + 534  [0x10da7ed56]
                    1 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)  (in dart) + 222  [0x10da7e6de]
                      1 dart::IsolateMessageHandler::HandleMessage(dart::Message*)  (in dart) + 1307  [0x10da4ed9b]
                        1 dart::DartLibraryCalls::HandleMessage(dart::Object const&, dart::Instance const&)  (in dart) + 735  [0x10da205bf]
                          1 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)  (in dart) + 957  [0x10da1d60d]
                            1 dart::Simulator::Call(dart::Code const&, dart::Array const&, dart::Array const&, dart::Thread*)  (in dart) + 8422  [0x10db89296]
                              1 dart::InvokeNative(dart::Thread*, dart::Simulator*, void (*)(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*)), void (*)(_Dart_NativeArguments*), _Dart_NativeArguments*)  (in dart) + 87  [0x10db8e3d7]
                                1 dart::NativeEntry::AutoScopeNativeCallWrapperNoStackCheck(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*))  (in dart) + 135  [0x10da80c17]
                                  1 dart::bin::Builtin_Process_Start(_Dart_NativeArguments*)  (in dart) + 646  [0x10d819106]
                                    1 dart::bin::Process::Start(dart::bin::Namespace*, char const*, char**, long, char const*, char**, long, dart::bin::ProcessStartMode, long*, long*, long*, long*, long*, char**)  (in dart) + 504  [0x10d819ca8]
                                      1 dart::bin::ProcessStarter::Start()  (in dart) + 85  [0x10d819d15]
                                        1 fork  (in libsystem_c.dylib) + 47  [0x7fff7f5af683]
                                          1 libSystem_atfork_child  (in libSystem.B.dylib) + 39  [0x7fff7d1ebb09]
                                            1 _os_once  (in libsystem_platform.dylib) + 33  [0x7fff7f85df00]
                                              1 _dirhelper_init  (in libsystem_coreservices.dylib) + 74  [0x7fff7f62ecb7]
                                                1 calloc  (in libsystem_malloc.dylib) + 30  [0x7fff7f6fb58e]
                                                  1 malloc_zone_calloc  (in libsystem_malloc.dylib) + 87  [0x7fff7f6fac7c]
                                                    1 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 1743  [0x7fff7f6f98e8]
                                                      1 syscall_thread_switch  (in libsystem_kernel.dylib) + 10  [0x7fff7f69731e]

Logs here