canonical / mir

The Mir compositor
GNU General Public License v2.0

[mir:wayland] Intermittent crash #3457

Open · AlanGriffiths opened this issue 2 weeks ago

AlanGriffiths commented 2 weeks ago

This follows on from [mir:wayland] Fix handling of display unplug/replug and was originally mentioned in the comments there. Before that PR, even attempting this scenario always failed. The failure is intermittent: I can sometimes run continuous tests for hours without seeing it.

Test environment: a system running Miriway

Here are the incantations I used (both over ssh from another laptop, for ease of use):

while cmake-build-debug/bin/miral-app -demo-server -terminal glmark2-wayland --wayland-host=wayland-0; do :; done

And, in a separate shell:

while cp ~/.config/miriway-shell.display{~disabled,}; sleep 2; cp ~/.config/miriway-shell.display{~enabled,}; sleep 5; do :; done
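The brace expansion in that loop does the actual toggling: miriway-shell.display{~disabled,} expands to two words, so each cp copies a saved variant over the live miriway-shell.display. A minimal sketch of that behaviour, assuming bash. It works in a throwaway directory, and the file contents are placeholders rather than real Miriway display configuration.

```shell
#!/usr/bin/env bash
# Sketch of what the config-toggling loop's brace expansion does.
# Uses a throwaway directory; file contents are placeholders, not real
# Miriway display configuration.
set -euo pipefail

dir=$(mktemp -d)
cfg="$dir/miriway-shell.display"

printf '%s\n' "enabled layout"  > "$cfg~enabled"
printf '%s\n' "disabled layout" > "$cfg~disabled"

# "$cfg"{~disabled,} expands to two words: "$cfg~disabled" and "$cfg",
# so this is the same as: cp "$cfg~disabled" "$cfg"
echo cp "$cfg"{~disabled,}
cp "$cfg"{~disabled,}
after_disable=$(cat "$cfg")   # "disabled layout"

# Likewise, this is the same as: cp "$cfg~enabled" "$cfg"
cp "$cfg"{~enabled,}
after_enable=$(cat "$cfg")    # "enabled layout"

rm -r "$dir"
```

The ~ in the middle of a word is not tilde-expanded, so only the leading ~/ in the original command refers to the home directory.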

Clearly, the ~{en,dis}abled configs (miriway-shell.display~enabled and miriway-shell.display~disabled) need to exist and match your system.
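The first loop (while miral-app ...; do :; done) stops at the first non-zero exit, but it does not say how many runs succeeded first or how the last one ended. When a failure takes hours to appear, that can be worth recording. A minimal sketch, assuming bash and assuming miral-app passes the nested server's exit status through; repeat_until_failure, REPEAT_RUNS and REPEAT_STATUS are hypothetical names, not part of Mir or Miriway.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mir or Miriway): rerun a command until it
# exits non-zero, like `while cmd; do :; done`, but record how many runs
# succeeded first and what the failing exit status was.
repeat_until_failure() {
    REPEAT_RUNS=0
    REPEAT_STATUS=0
    while :; do
        # `$?` inside the braces is the exit status of "$@".
        "$@" || { REPEAT_STATUS=$?; break; }
        REPEAT_RUNS=$((REPEAT_RUNS + 1))
    done
    echo "succeeded ${REPEAT_RUNS} time(s), then exited with status ${REPEAT_STATUS}" >&2
    # The shell reports "killed by signal N" as status 128+N (e.g. 134 for
    # SIGABRT), although a program could also choose such an exit code itself.
    if [ "$REPEAT_STATUS" -gt 128 ]; then
        echo "probably killed by SIG$(kill -l $((REPEAT_STATUS - 128)))" >&2
    fi
    return "$REPEAT_STATUS"
}

# Usage, with the same command as the first loop:
#   repeat_until_failure cmake-build-debug/bin/miral-app -demo-server -terminal glmark2-wayland --wayland-host=wayland-0
```

The explicit `|| { ...; break; }` is needed because a bash while loop's own exit status is that of the last command run in its body (or zero), not the status of the condition that ended it.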

Expected: the nested server runs continuously.
Actual: occasional crashes of the nested server. The failure takes many forms:

terminate called after throwing an instance of 'boost::wrapexcept<std::system_error>'
  what():  Failed to create EGL surface: EGL_BAD_DISPLAY (0x3008)
Fatal glibc error: pthread_mutex_lock.c:94 (___pthread_mutex_lock): assertion failed: mutex->__data.__owner == 0
[conditionals] fragment-steps=0:vertex-steps=5:malloc(): unsorted double linked list corrupted
!!! Fatal signal received. Attempting cleanup, but deadlock may occur
Mir fatal error: Unsupported attempt to continue after a fatal signal: SIGABRT

But they all suggest some form of corrupted or unsynchronized state.

AlanGriffiths commented 2 weeks ago

Neither UBSan nor ASan (undefined-behaviour and address sanitizer) runs show anything interesting.