cypress-io / cypress

Fast, easy and reliable testing for anything that runs in a browser.
https://cypress.io
MIT License

Cypress in docker crashing randomly with SIGTRAP #27564

Open DGuentherTV opened 1 year ago

DGuentherTV commented 1 year ago

Current behavior

When running in our CI environment, Cypress (verify, info, run) randomly crashes with this error message:

cypress:cli Smoke test failed: Error: Command was killed with SIGTRAP (Debugger breakpoint): /root/.cache/Cypress/12.17.3/Cypress/Cypress --no-sandbox --smoke-test --ping=877

The weird thing is that this happens seemingly at random: the same container may both work and fail. For example, verify works but then run crashes, and running verify twice may succeed once, twice, or not at all.

Even weirder: we didn't change anything in the Docker image, nor did we update Cypress, yet the problem somehow got worse.
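Since the failure is intermittent, it can help to put a number on it before and after each change. Below is a minimal Python sketch that runs a command repeatedly and tallies how each attempt ended. The helper names `describe_exit` and `tally_runs` are made up for illustration and are not part of Cypress. It assumes a POSIX host, where Python reports a process killed by a signal as a negative return code.

```python
import signal
import subprocess
from collections import Counter


def describe_exit(returncode: int) -> str:
    """Map a subprocess return code to a short label.

    On POSIX, subprocess reports death-by-signal as a negative return code.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def tally_runs(command: list[str], attempts: int) -> Counter:
    """Run `command` `attempts` times and count how each attempt ended."""
    outcomes: Counter = Counter()
    for _ in range(attempts):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        outcomes[describe_exit(result.returncode)] += 1
    return outcomes
```

For example, `tally_runs(["/root/.cache/Cypress/12.17.3/Cypress/Cypress", "--no-sandbox", "--smoke-test", "--ping=877"], 20)` would repeat the smoke test from the error message above. Calling the binary directly matters here: when going through the `cypress` CLI, the signal is only visible in the error text, not in the CLI's own exit status.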

Running the cypress binary directly with an attached gdb results in the following backtrace when this crash happens:


0x00005645553e8242 in partition_alloc::internal::(anonymous namespace)::FreelistCorruptionDetected (extra=<optimized out>) at ../../base/allocator/partition_allocator/partition_freelist_entry.h:31

31  ../../base/allocator/partition_allocator/partition_freelist_entry.h: No such file or directory.

#0  0x00005645553e8242 in partition_alloc::internal::(anonymous namespace)::FreelistCorruptionDetected(unsigned long) (extra=<optimized out>) at ../../base/allocator/partition_allocator/partition_freelist_entry.h:31

#1  0x00005645585830cf in partition_alloc::internal::PartitionFreelistEntry::GetNextInternal<true>(unsigned long, bool) const (this=<optimized out>, for_thread_cache=false, extra=<optimized out>) at ../../base/allocator/partition_allocator/partition_freelist_entry.h:303

#2  partition_alloc::internal::PartitionFreelistEntry::GetNext(unsigned long) const (this=<optimized out>, extra=<optimized out>) at ../../base/allocator/partition_allocator/partition_freelist_entry.h:328

#3  partition_alloc::internal::SlotSpanMetadata<true>::PopForAlloc(unsigned long) (this=<optimized out>, size=<optimized out>) at ../../base/allocator/partition_allocator/partition_page.h:739

#4  partition_alloc::PartitionRoot<true>::AllocFromBucket(partition_alloc::internal::PartitionBucket<true>*, unsigned int, unsigned long, unsigned long, unsigned long*, bool*) (this=<optimized out>, bucket=<optimized out>, flags=33, raw_size=<optimized out>, slot_span_alignment=16384, is_already_zeroed=0x7ffe9ec71d9f, usable_size=<optimized out>) at ../../base/allocator/partition_allocator/partition_root.h:1071

#5  partition_alloc::ThreadCache::FillBucket(unsigned long) (this=0x36500034c000, bucket_index=<optimized out>) at ../../base/allocator/partition_allocator/thread_cache.cc:607

#6  0x00005645584867d6 in partition_alloc::ThreadCache::GetFromCache(unsigned long, unsigned long*) (this=0x36500034c000, bucket_index=11, slot_size=0x7ffe9ec71df0) at ../../base/allocator/partition_allocator/thread_cache.h:525

#7  partition_alloc::PartitionRoot<true>::AllocWithFlagsNoHooks(unsigned int, unsigned long, unsigned long) (this=0x56455c841a00 <(anonymous namespace)::g_root+64>, flags=0, requested_size=<optimized out>, slot_span_alignment=16384) at ../../base/allocator/partition_allocator/partition_root.h:1742

#8  base::internal::PartitionMalloc(base::allocator::AllocatorDispatch const*, unsigned long, void*) (size=<optimized out>, context=<optimized out>) at ../../base/allocator/allocator_shim_default_dispatch_to_partition_alloc.cc:304

#9  0x000056455848617b in ShimMalloc (size=232, context=0x0) at ../../base/allocator/allocator_shim.cc:201

#10 malloc(size_t) (size=232) at ../../base/allocator/allocator_shim_override_libc_symbols.h:35

#11 0x00007fe4c985d8e4 in  () at /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0

#12 0x00007fe4c985f0c4 in  () at /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0

#13 0x00007fe4c9865240 in  () at /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0

#14 0x00007fe4c98616f1 in  () at /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0

#15 0x00007fe527d69bb9 in epoxy_glx_version () at /usr/lib/x86_64-linux-gnu/libepoxy.so.0

#16 0x00007fe527e9d005 in  () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0

#17 0x00007fe527e9d39c in  () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0

#18 0x00007fe527ea6428 in  () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0

#19 0x00007fe527ea32bb in  () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0

#20 0x00007fe527e92db0 in  () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0

#21 0x00007fe527e64e28 in gdk_display_manager_open_display () at /usr/lib/x86_64-linux-gnu/libgdk-3.so.0

#22 0x00007fe5290026d3 in gtk_init_check () at /usr/lib/x86_64-linux-gnu/libgtk-3.so.0

#23 0x000056455c186578 in gtk::GtkInitCheck(int*, char**) (argc=0x7ffe9ec72480, argv=0x365000211b00) at ../../ui/gtk/gtk_compat.cc:162

#24 0x000056455c18d091 in gtk::GtkInitFromCommandLine(int*, char**) (argc=0x7ffe9ec71d50, argv=0x564554216265) at ../../ui/gtk/gtk_util.cc:151

#25 0x000056455c187edd in gtk::GtkUi::Initialize() (this=0x365001978700) at ../../ui/gtk/gtk_ui.cc:237

#26 0x00005645581c0683 in ui::CreateLinuxUi() () at ../../ui/linux/linux_ui_factory.cc:40

#27 0x00005645554e0806 in electron::ElectronBrowserMainParts::ToolkitInitialized() (this=0x36500024d740) at ../../electron/shell/browser/electron_browser_main_parts.cc:386

#28 0x00005645575afcb8 in content::BrowserMainLoop::InitializeToolkit() (this=0x3650002708c0) at ../../content/browser/browser_main_loop.cc:1431

#29 0x00005645575b08ae in content::BrowserMainRunnerImpl::Initialize(content::MainFunctionParams) (this=0x365000330e10, parameters=...) at ../../content/browser/browser_main_runner_impl.cc:118

#30 0x00005645575ac3b7 in content::BrowserMain(content::MainFunctionParams) (parameters=...) at ../../content/browser/browser_main.cc:26

#31 0x00005645556aa391 in content::RunBrowserProcessMain(content::MainFunctionParams, content::ContentMainDelegate*) (main_function_params=..., delegate=<optimized out>) at ../../content/app/content_main_runner_impl.cc:684

#32 0x00005645556ab67c in content::ContentMainRunnerImpl::RunBrowser(content::MainFunctionParams, bool) (this=0x365000248000, main_params=..., start_minimal_browser=false) at ../../content/app/content_main_runner_impl.cc:1211

#33 0x00005645556ab471 in content::ContentMainRunnerImpl::Run() (this=0x365000248000) at ../../content/app/content_main_runner_impl.cc:1073

#34 0x00005645556a8def in content::RunContentProcess(content::ContentMainParams, content::ContentMainRunner*) (params=..., content_main_runner=0x365000248000) at ../../content/app/content_main.cc:437

#35 0x00005645556a8f04 in content::ContentMain(content::ContentMainParams) (params=...) at ../../content/app/content_main.cc:465

#36 0x00005645554215cb in main(int, char**) (argc=4, argv=<optimized out>) at ../../electron/shell/app/electron_main_linux.cc:44
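For anyone who wants to collect a comparable backtrace without an interactive session, here is a rough Python sketch that builds a non-interactive gdb command line. The binary path and arguments are copied from the error message above. The helper name `gdb_backtrace_command` is hypothetical. It assumes gdb is installed in the container; symbol names like the ones shown above additionally require debug symbols.

```python
import shlex


def gdb_backtrace_command(binary: str, binary_args: list[str]) -> list[str]:
    """Build a gdb invocation that runs `binary` and prints a backtrace when it stops."""
    return [
        "gdb",
        "-batch",       # exit gdb once the listed commands have run
        "-ex", "run",   # start the program under gdb
        "-ex", "bt",    # print the backtrace after the program stops
        "--args", binary, *binary_args,
    ]


command = gdb_backtrace_command(
    "/root/.cache/Cypress/12.17.3/Cypress/Cypress",
    ["--no-sandbox", "--smoke-test", "--ping=877"],
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(command))
```

If the program exits normally, the `bt` step has no stack to print, so only crashing runs yield a backtrace.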

We have tried a lot so far:

Maybe someone here has an idea or knows where we can find help.

Desired behavior

Cypress should verify and run reliably

Test code to reproduce

N/A

Cypress Version

12.17.3

Node version

18.17.0

Operating System

Host: Ubuntu 22.04, Container: Debian - 11.6

Debug Logs

No response

Other

No response

MikeMcC399 commented 1 year ago

@DGuentherTV

Which CI are you using?

I have seen SIGTRAP produced by cypress/included:latest when it is started from the Docker CLI using docker run with --user 1001. This works correctly in GitHub Actions, but in WSL2, for instance, it fails with SIGTRAP.

That may not be related to your issue; however, the error messages are similar to yours. It is also strange that your problems are apparently random.

DGuentherTV commented 1 year ago

We are using Jenkins as CI system

MikeMcC399 commented 1 year ago

@DGuentherTV

We are using Jenkins as CI system

Thanks for filling in that information gap!

Cypress 13.2.0 was released yesterday, together with new Docker images. These include some updates that improve GPU handling.

You might like to try this new version to see if it improves your outcome.

DGuentherTV commented 11 months ago

Hey, sorry for the long silence 😅

I have just tried the latest version of Cypress (13.3.3) with the latest Docker image (node-20.9.0-chrome-118.0.5993.88-1-ff-118.0.2-edge-118.0.2088.46-1).

And it looks like we now moved from SIGTRAP to SIGSEGV (yay, progress 😄)

Thread 1 "Cypress" received signal SIGSEGV, Segmentation fault.

0x0000000000000000 in ?? ()

#0  0x0000000000000000 in ?? ()

#1  0x000055b59081fbc2 in node::BaseObjectPtrImpl<node::BaseObject, false>::~BaseObjectPtrImpl (this=0x5b400d78c88) at ../../third_party/electron_node/src/base_object-inl.h:175

#2  std::Cr::pair<node::FastStringKey const, node::BaseObjectPtrImpl<node::BaseObject, false> >::~pair (this=0x5b400d78c70) at ../../buildtools/third_party/libc++/trunk/include/__utility/pair.h:62

#3  std::Cr::__destroy_at<std::Cr::pair<node::FastStringKey const, node::BaseObjectPtrImpl<node::BaseObject, false> >, 0> (__loc=0x5b400d78c70) at ../../buildtools/third_party/libc++/trunk/include/__memory/construct_at.h:66

#4  std::Cr::destroy_at<std::Cr::pair<node::FastStringKey const, node::BaseObjectPtrImpl<node::BaseObject, false> >, 0> (__loc=0x5b400d78c70) at ../../buildtools/third_party/libc++/trunk/include/__memory/construct_at.h:101

#5  std::Cr::allocator_traits<std::Cr::allocator<std::Cr::__hash_node<std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> >, void*> > >::destroy<std::Cr::pair<node::FastStringKey const, node::BaseObjectPtrImpl<node::BaseObject, false> >, void, void> (__p=0x5b400d78c70) at ../../buildtools/third_party/libc++/trunk/include/__memory/allocator_traits.h:323

#6  std::Cr::__hash_table<std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> >, std::Cr::__unordered_map_hasher<node::FastStringKey, std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> >, node::FastStringKey::Hash, std::Cr::equal_to<node::FastStringKey>, true>, std::Cr::__unordered_map_equal<node::FastStringKey, std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> >, std::Cr::equal_to<node::FastStringKey>, node::FastStringKey::Hash, true>, std::Cr::allocator<std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> > > >::__deallocate_node (this=0x5b4002d2da8, __np=0x5b400d78c60) at ../../buildtools/third_party/libc++/trunk/include/__hash_table:1549

#7  std::Cr::__hash_table<std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> >, std::Cr::__unordered_map_hasher<node::FastStringKey, std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> >, node::FastStringKey::Hash, std::Cr::equal_to<node::FastStringKey>, true>, std::Cr::__unordered_map_equal<node::FastStringKey, std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> >, std::Cr::equal_to<node::FastStringKey>, node::FastStringKey::Hash, true>, std::Cr::allocator<std::Cr::__hash_value_type<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false> > > >::clear (this=0x5b4002d2da8) at ../../buildtools/third_party/libc++/trunk/include/__hash_table:1777

#8  std::Cr::unordered_map<node::FastStringKey, node::BaseObjectPtrImpl<node::BaseObject, false>, node::FastStringKey::Hash, std::Cr::equal_to<node::FastStringKey>, std::Cr::allocator<std::Cr::pair<node::FastStringKey const, node::BaseObjectPtrImpl<node::BaseObject, false> > > >::clear (this=0x5b4002d2da8) at ../../buildtools/third_party/libc++/trunk/include/unordered_map:1365

#9  node::Environment::RunCleanup (this=0x5b4002d2400) at ../../third_party/electron_node/src/env.cc:1013

#10 0x000055b5907de8c9 in node::FreeEnvironment (env=0x5b4002d2400) at ../../third_party/electron_node/src/api/environment.cc:445

#11 0x000055b58975dc75 in electron::NodeEnvironment::~NodeEnvironment (this=<optimized out>) at ../../electron/shell/browser/javascript_environment.cc:346

#12 0x000055b5897475a5 in std::Cr::default_delete<electron::NodeEnvironment>::operator() (this=<optimized out>, __ptr=0x5b4003895b0) at ../../buildtools/third_party/libc++/trunk/include/__memory/unique_ptr.h:65

#13 std::Cr::unique_ptr<electron::NodeEnvironment, std::Cr::default_delete<electron::NodeEnvironment> >::reset (this=0x5b40024d9e0, __p=0x0) at ../../buildtools/third_party/libc++/trunk/include/__memory/unique_ptr.h:297

#14 electron::ElectronBrowserMainParts::PostMainMessageLoopRun (this=0x5b40024d980) at ../../electron/shell/browser/electron_browser_main_parts.cc:607

#15 0x000055b58bb58355 in content::BrowserMainLoop::ShutdownThreadsAndCleanUp (this=0x5b400290c80) at ../../content/browser/browser_main_loop.cc:1131

#16 0x000055b58bb59fae in content::BrowserMainRunnerImpl::Shutdown (this=0x5b400305320) at ../../content/browser/browser_main_runner_impl.cc:176

#17 0x000055b58bb557c8 in content::BrowserMain (parameters=...) at ../../content/browser/browser_main.cc:43

#18 0x000055b589930694 in content::RunBrowserProcessMain (main_function_params=..., delegate=0x7ffd08ac2910) at ../../content/app/content_main_runner_impl.cc:710

#19 0x000055b589931f1e in content::ContentMainRunnerImpl::RunBrowser (this=0x5b400248000, main_params=..., start_minimal_browser=<optimized out>) at ../../content/app/content_main_runner_impl.cc:1280

#20 0x000055b589931d38 in content::ContentMainRunnerImpl::Run (this=0x5b400248000) at ../../content/app/content_main_runner_impl.cc:1134

#21 0x000055b58992f675 in content::RunContentProcess (params=..., content_main_runner=0x5b400248000) at ../../content/app/content_main.cc:330

#22 0x000055b58992f765 in content::ContentMain (params=...) at ../../content/app/content_main.cc:347

#23 0x000055b58966ceed in main (argc=<optimized out>, argv=0x7ffd08ac2ae8) at ../../electron/shell/app/electron_main_linux.cc:40

MikeMcC399 commented 11 months ago

@DGuentherTV

Did I understand correctly that you are running the Cypress Docker container inside a Debian container on Ubuntu, or does the Cypress Docker container run directly under Ubuntu?

Edit: According to /etc/debian_version the Cypress Docker container cypress/browsers:node-20.9.0-chrome-118.0.5993.88-1-ff-118.0.2-edge-118.0.2088.46-1 is built on Debian 11.8.

The Cypress Docker container cypress/browsers:node-18.16.0-chrome-114.0.5735.133-1-ff-114.0.2-edge-114.0.1823.51-1 is built on Debian 11.7.

JosXa commented 9 months ago

I'm on @DGuentherTV's team.

@MikeMcC399 Correct, the image is built on Debian and runs either on bare-metal Ubuntu 22.04 machines (where it's a little better) or on Ubuntu 22.04 VMs (where it's really bad). The difference between bare metal and VMs is mostly an interesting curiosity; the problem appears on both.

This is very frustrating for our entire organization because we run Cypress as part of every CI build, and devs need to wait 16 minutes for all Cypress tests to pass successfully only for it to exit with

    ✔  All specs passed!                        16:06      225      192        -       33        -  

The Test Runner unexpectedly exited via a exit event with signal SIGSEGV

We also don't know what our best bet would be for circumventing the issue, as we haven't found anything that even remotely improves the situation. Should we try a different Docker image, maybe one not built on Debian and/or roll one ourselves? Currently, debates are starting about switching to Playwright just to not have this problem :/
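The second backtrace above includes frames such as ShutdownThreadsAndCleanUp and node::FreeEnvironment, which fits with a crash that only occurs after the specs have finished. To at least report the two cases separately in CI, the captured log could be classified with something like the Python sketch below. The function name and the returned labels are made up for illustration, the two text markers are taken from the output quoted above, and this does nothing to fix the crash itself.

```python
import re

# Markers copied from the Cypress output quoted in this thread.
PASS_MARKER = "All specs passed!"
CRASH_PATTERN = re.compile(
    r"unexpectedly exited via an? \w+ event with signal (SIG[A-Z0-9]+)"
)


def classify_cypress_output(output: str) -> str:
    """Label a captured Cypress log by whether specs passed and whether it crashed."""
    crash = CRASH_PATTERN.search(output)
    passed = PASS_MARKER in output
    if crash and passed:
        return f"specs passed, then {crash.group(1)} on exit"
    if crash:
        return f"crashed with {crash.group(1)}"
    if passed:
        return "specs passed"
    return "no pass marker found"
```

Whether a "specs passed, then crashed" run should count as a green build is a policy decision for the team; the sketch only makes the distinction visible.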

MikeMcC399 commented 9 months ago

@JosXa

DGuentherTV commented 8 months ago

FYI: We have now created a docker image based on node:18.18.2-bookworm-slim where the crash seems to be gone.

Might be the Debian update, might be some setting/env that we did not set in comparison to the official cypress images.

MikeMcC399 commented 8 months ago

@DGuentherTV

FYI: We have now created a docker image based on node:18.18.2-bookworm-slim where the crash seems to be gone.

Might be the Debian update, might be some setting/env that we did not set in comparison to the official cypress images.

Great news!

I'd suggest that you close this issue now if you have a solution.