dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Flaky crashes in vm/dart/trigger_gc_in_native_test on dartk-reload-rollback-linux-release-x64 #37339

Open sstrickl opened 5 years ago

sstrickl commented 5 years ago

We're seeing flaky crashes in vm/dart/trigger_gc_in_native_test on one of the reload/rollback bots when the test runs with both the --no_concurrent_mark and --use_compactor flags. The crash first appeared in the ignored flaky test failure logs at commit da8cb470cc9 and showed up there consistently until commit b32d196f. I'm not sure whether the latter commit fixed the issue that was causing the crash, or whether the issue still exists and is just masked for some reason now.

Failures from example logs:

  /=========================================================================\
  | vm/dart/trigger_gc_in_native_test/2 failed again (Crash, expected Pass) |
  \=========================================================================/
  --- Command "vm" (took 04.000131s):
  DART_CONFIGURATION=ReleaseX64 out/ReleaseX64/dart --hot-reload-rollback-test-mode --no_concurrent_mark --use_compactor --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/runtime/tests/vm/dart/trigger_gc_in_native_test.dart
  exit code:
  -6
  stderr:
  ===== CRASH =====
  si_signo=Segmentation fault(11), si_code=1, si_addr=0x7f7e1a3bfed0
  version=2.3.3-edge.639cff0ec02b25516103294f45828a6cb52fc37c (Fri Jun 21 17:07:57 2019 +0000) on "linux_x64"
  thread=13130, isolate=main(0x561a933fe100)
    pc 0x0000561a90cf60dc fp 0x00007f7e19afdb90 dart::Library::InitClassDictionary() const
    pc 0x0000561a90cba3af fp 0x00007f7e19afdbf0 dart::Library::NewLibraryHelper(dart::String const&, bool)
    pc 0x0000561a90c94981 fp 0x00007f7e19afdc30 dart::kernel::KernelLoader::LookupLibrary(dart::kernel::NameIndex)
    pc 0x0000561a90c998dd fp 0x00007f7e19afdd50 dart::kernel::KernelLoader::LoadLibrary(long)
    pc 0x0000561a90c9695b fp 0x00007f7e19afdea0 dart::kernel::KernelLoader::LoadProgram(bool)
    pc 0x0000561a90c960a0 fp 0x00007f7e19afe340 dart::kernel::KernelLoader::LoadEntireProgram(dart::kernel::Program*, bool)
    pc 0x0000561a90c7d704 fp 0x00007f7e19afe4c0 dart::IsolateReloadContext::Reload(bool, char const*, char const*, unsigned char const*, long)
    pc 0x0000561a90c74ce6 fp 0x00007f7e19afe520 dart::Isolate::ReloadSources(dart::JSONStream*, bool, char const*, char const*, bool)
    pc 0x0000561a90da3e30 fp 0x00007f7e19afe720 dart::DRT_StackOverflow(dart::NativeArguments)
    pc 0x00007f7e1eb01108 fp 0x00007f7e19afe760 Unknown symbol
    pc 0x00007f7e1ade0168 fp 0x00007f7e19afe7a0 Unknown symbol
    pc 0x00007f7e1adc2b23 fp 0x00007f7e19afe7e8 Unknown symbol
    pc 0x00007f7e1adf03c9 fp 0x00007f7e19afe830 Unknown symbol
    pc 0x00007f7e1adf01d0 fp 0x00007f7e19afe870 Unknown symbol
    pc 0x00007f7e1ade979c fp 0x00007f7e19afe8a8 Unknown symbol
    pc 0x00007f7e1adef304 fp 0x00007f7e19afe8f0 Unknown symbol
    pc 0x00007f7e1adca5bf fp 0x00007f7e19afe930 Unknown symbol
    pc 0x00007f7e1adeefd3 fp 0x00007f7e19afe968 Unknown symbol
    pc 0x00007f7e1eb0166c fp 0x00007f7e19afe9d8 Unknown symbol
    pc 0x0000561a90c34556 fp 0x00007f7e19afea80 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)
    pc 0x0000561a90c37a06 fp 0x00007f7e19afeae0 dart::DartLibraryCalls::HandleMessage(dart::Object const&, dart::Instance const&)
    pc 0x0000561a90c71a02 fp 0x00007f7e19afece0 dart::IsolateMessageHandler::HandleMessage(std::__2::unique_ptr<dart::Message, std::__2::default_delete<dart::Message> >)
    pc 0x0000561a90ca522d fp 0x00007f7e19afed50 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)
    pc 0x0000561a90ca59b6 fp 0x00007f7e19afedb0 dart::MessageHandler::TaskCallback()
    pc 0x0000561a90ddc247 fp 0x00007f7e19afede0 dart::ThreadPool::Worker::Loop()
    pc 0x0000561a90ddc0e5 fp 0x00007f7e19afee20 dart::ThreadPool::Worker::Main(unsigned long)
    pc 0x0000561a90d446c9 fp 0x00007f7e19afeed0 out/ReleaseX64/dart+0x17fe6c9
  -- End of DumpStackTrace
  --- Re-run this test:
  python tools/test.py -n dartk-reload-rollback-linux-release-x64 vm/dart/trigger_gc_in_native_test/2
  /=========================================================================\
  | vm/dart/trigger_gc_in_native_test/3 failed again (Crash, expected Pass) |
  \=========================================================================/
  --- Command "vm" (took 04.000129s):
  DART_CONFIGURATION=ReleaseX64 out/ReleaseX64/dart --hot-reload-rollback-test-mode --no_concurrent_mark --use_compactor --force_evacuation --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/runtime/tests/vm/dart/trigger_gc_in_native_test.dart
  exit code:
  -6
  stderr:
  ===== CRASH =====
  si_signo=Segmentation fault(11), si_code=1, si_addr=0x7f675f97fed0
  version=2.3.3-edge.639cff0ec02b25516103294f45828a6cb52fc37c (Fri Jun 21 17:07:57 2019 +0000) on "linux_x64"
  thread=13138, isolate=main(0x55dc2c140100)
    pc 0x000055dc2a6720dc fp 0x00007f675cdbdb90 dart::Library::InitClassDictionary() const
    pc 0x000055dc2a6363af fp 0x00007f675cdbdbf0 dart::Library::NewLibraryHelper(dart::String const&, bool)
    pc 0x000055dc2a610981 fp 0x00007f675cdbdc30 dart::kernel::KernelLoader::LookupLibrary(dart::kernel::NameIndex)
    pc 0x000055dc2a6158dd fp 0x00007f675cdbdd50 dart::kernel::KernelLoader::LoadLibrary(long)
    pc 0x000055dc2a61295b fp 0x00007f675cdbdea0 dart::kernel::KernelLoader::LoadProgram(bool)
    pc 0x000055dc2a6120a0 fp 0x00007f675cdbe340 dart::kernel::KernelLoader::LoadEntireProgram(dart::kernel::Program*, bool)
    pc 0x000055dc2a5f9704 fp 0x00007f675cdbe4c0 dart::IsolateReloadContext::Reload(bool, char const*, char const*, unsigned char const*, long)
    pc 0x000055dc2a5f0ce6 fp 0x00007f675cdbe520 dart::Isolate::ReloadSources(dart::JSONStream*, bool, char const*, char const*, bool)
    pc 0x000055dc2a71fe30 fp 0x00007f675cdbe720 dart::DRT_StackOverflow(dart::NativeArguments)
    pc 0x00007f6764801108 fp 0x00007f675cdbe760 Unknown symbol
    pc 0x00007f67640a0168 fp 0x00007f675cdbe7a0 Unknown symbol
    pc 0x00007f6764082b23 fp 0x00007f675cdbe7e8 Unknown symbol
    pc 0x00007f67640b03c9 fp 0x00007f675cdbe830 Unknown symbol
    pc 0x00007f67640b01d0 fp 0x00007f675cdbe870 Unknown symbol
    pc 0x00007f67640a979c fp 0x00007f675cdbe8a8 Unknown symbol
    pc 0x00007f67640af304 fp 0x00007f675cdbe8f0 Unknown symbol
    pc 0x00007f676408a5bf fp 0x00007f675cdbe930 Unknown symbol
    pc 0x00007f67640aefd3 fp 0x00007f675cdbe968 Unknown symbol
    pc 0x00007f676480166c fp 0x00007f675cdbe9d8 Unknown symbol
    pc 0x000055dc2a5b0556 fp 0x00007f675cdbea80 dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)
    pc 0x000055dc2a5b3a06 fp 0x00007f675cdbeae0 dart::DartLibraryCalls::HandleMessage(dart::Object const&, dart::Instance const&)
    pc 0x000055dc2a5eda02 fp 0x00007f675cdbece0 dart::IsolateMessageHandler::HandleMessage(std::__2::unique_ptr<dart::Message, std::__2::default_delete<dart::Message> >)
    pc 0x000055dc2a62122d fp 0x00007f675cdbed50 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)
    pc 0x000055dc2a6219b6 fp 0x00007f675cdbedb0 dart::MessageHandler::TaskCallback()
    pc 0x000055dc2a758247 fp 0x00007f675cdbede0 dart::ThreadPool::Worker::Loop()
    pc 0x000055dc2a7580e5 fp 0x00007f675cdbee20 dart::ThreadPool::Worker::Main(unsigned long)
    pc 0x000055dc2a6c06c9 fp 0x00007f675cdbeed0 out/ReleaseX64/dart+0x17fe6c9
  -- End of DumpStackTrace
  --- Re-run this test:
  python tools/test.py -n dartk-reload-rollback-linux-release-x64 vm/dart/trigger_gc_in_native_test/3

There are also crashes on Android; see #37185. That issue has no comments explaining why it was closed, but from the logs I expect those crashes come from the OOM killer and are unrelated.

sstrickl commented 5 years ago

I tried reproducing this locally at the start and end of the affected commit range, but the crash didn't trigger in 100 runs at either commit. However, I expect my dev machine has more memory than the builder bots, so the crash is likely something exposed by memory pressure. If lazy constant evaluation shifted the memory use during reload so that memory was less constrained at that point, that could explain why the crash stopped happening after that commit.
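
For anyone who wants to repeat the local experiment, here is a minimal sketch of a repeat-run loop. The helper name `count_failures` is hypothetical, the command is the re-run line from the logs above, and the sketch assumes that tools/test.py exits with a non-zero status when the test fails. It does nothing to simulate the memory pressure on the bots, so a clean result here does not rule the crash out.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited non-zero.

    Hypothetical helper for illustration only; it is not part of the SDK tooling.
    """
    failures = 0
    for _ in range(runs):
        # Discard output; only the exit status is inspected.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Re-run command copied from the failure log; run from the SDK checkout root.
    rerun_cmd = [
        "python",
        "tools/test.py",
        "-n",
        "dartk-reload-rollback-linux-release-x64",
        "vm/dart/trigger_gc_in_native_test/2",
    ]
    runs = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    failed = count_failures(rerun_cmd, runs)
    print(f"{failed} of {runs} runs failed")
```

Running the script once at each end of the commit range (after building that commit) gives a failure count per commit that can be compared directly.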