Igalia / cog

WPE launcher and webapp container
MIT License

Segmentation fault on Buildroot -branch next / RPi3B+ #90

Open HowardACornwell opened 5 years ago

HowardACornwell commented 5 years ago

I'm getting a segmentation fault when trying to launch cog. I built for an RPi3B+ using Buildroot's next branch. Weston (Wayland) launches successfully; the error occurs when I launch cog from a second SSH session.

SSH1

export XDG_RUNTIME_DIR=run/user
chmod 0700 run/user
weston --backend=fbdev-backend.so --tty=1 --device=/dev/fb0

# Launches Weston/Wayland successfully

SSH2

export XDG_RUNTIME_DIR=run/user
cog -P fdo http://google.com

# Fails
Wayland: Got a wl_compositor interface
Wayland: Got an xdg_shell interface
Wayland: Got a wl_shell interface
Segmentation fault

Versions are:

I've attached Buildroot's .config if that's any help.

charlie-ht commented 5 years ago

I'm hitting this as well, did some investigation here.

The backtrace looks like this:

Thread 1 (LWP 252):
#0  strlen () at ../sysdeps/aarch64/strlen.S:94
#1  0x0000007fab1aa064 in __GI___strdup (s=0x0) at strdup.c:41
#2  0x0000007fa84c434c in wayland_drm_init (display=display@entry=0x2c312720, device_name=0x0, callbacks=0x7feacc8b48, callbacks@entry=0x7feacc8b68, user_data=user_data@entry=0x2c33bea0, flags=0) at wayland-drm.c:252
#3  0x0000007fa84bde4c in dri2_bind_wayland_display_wl (drv=<optimised out>, disp=0x2c33bea0, wl_dpy=0x2c312720) at drivers/dri2/egl_dri2.c:2839
#4  0x0000007fa84b4130 in eglBindWaylandDisplayWL (dpy=<optimised out>, display=0x2c312720) at main/eglapi.c:2197
#5  0x0000007fa7a592d8 in _ZN2WS8Instance10initializeEPv (this=0x2c347eb0, eglDisplay=0x2c33bea0) at /home/cturner/buildroot/buildroot-2019.02.2/output_wpe_64/build/wpebackend-fdo-1.0.0/src/ws.cpp:239
#6  0x0000007fa7a5825c in wpe_fdo_initialize_for_egl_display (display=<optimised out>) at /home/cturner/buildroot/buildroot-2019.02.2/output_wpe_64/build/wpebackend-fdo-1.0.0/src/initialize-egl.cpp:34
#7  0x0000007fa7a7343c in cog_platform_setup (platform=<optimised out>, shell=<optimised out>, params=<optimised out>, error=0x7feacc8c98) at /home/cturner/buildroot/buildroot-2019.02.2/output_wpe_64/build/cog-063df115456a24e464d1e6f284df22a0e65aea8e/platform/cog-platform-fdo.c:1390
#8  0x0000000000402454 in ?? ()
#9  0x0000007feacc8be0 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The dri2_dpy structure's device_name is NULL, and I'm not sure why. There is fairly decent error handling for NULL device names in dri2_initialize_wayland_drm inside mesa3d, but I'm completely new to that codebase. Does the stack trace help anyone diagnose the issue? I have a core dump if more information needs to be squeezed out of it.

I didn't see any errors when passing LIBGL_DEBUG=verbose WAYLAND_DEBUG=1 either :man_shrugging:
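
For readers following the trace: frame #1 shows strdup being called with s=0x0, and the fault itself happens in strlen (frame #0). Below is a minimal sketch of that failure mode and the kind of guard that avoids it. The helper name dup_device_name is hypothetical and is not Mesa's code; it only illustrates why a NULL device_name ends in a crash.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of Mesa): duplicates a device name and
// returns nullptr instead of crashing when the input is null.
// Passing a null pointer straight to strdup() is undefined behaviour and,
// as in frames #0 and #1 of the backtrace, typically faults inside strlen().
char *dup_device_name(const char *device_name)
{
    if (device_name == nullptr)
        return nullptr;          // caller should treat this as "no DRM device"
    return strdup(device_name);  // heap copy; release with free()
}
```

A caller would check the result and report an error (or fall back to another path) rather than continuing with a null name.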

charlie-ht commented 5 years ago

So, according to https://bugs.webkit.org/show_bug.cgi?id=182490#c0, "swrast drivers do not provide BindWaylandDisplay (as they rely on wl_shm instead of Mesa extensions), and newer Mesa asserts that the extension is enabled when the entrypoint is called". I'm definitely using swrast here. However, we are correctly checking the extension string and the return values from eglGetProcAddress in the fdo backend:

    if (isEGLExtensionSupported(extensions, "EGL_WL_bind_wayland_display")) {
        eglBindWaylandDisplayWL = reinterpret_cast<PFNEGLBINDWAYLANDDISPLAYWL>(eglGetProcAddress("eglBindWaylandDisplayWL"));
        eglQueryWaylandBufferWL = reinterpret_cast<PFNEGLQUERYWAYLANDBUFFERWL>(eglGetProcAddress("eglQueryWaylandBufferWL"));
    }
    if (!eglBindWaylandDisplayWL || !eglQueryWaylandBufferWL)
        return false;

So it's not the same issue as in the WebKit bug. I would expect the extension lookup to fail with swrast. Perhaps it doesn't, and we instead need to avoid this path and use wl_shm?
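
One caveat with the check above: a name appearing in the extension string and a non-NULL pointer from eglGetProcAddress only show that the extension is advertised. They do not show that the driver can service the call. For reference, here is a minimal sketch of whole-token matching on a space-separated extension list. The helper has_extension_token is hypothetical; wpebackend-fdo's isEGLExtensionSupported may well be implemented differently.

```cpp
#include <cstring>

// Hypothetical helper: returns true only when `name` appears as a complete,
// space-delimited token in `extension_list`. A plain substring search would
// also match longer extension names that merely start with `name`.
bool has_extension_token(const char *extension_list, const char *name)
{
    if (!extension_list || !name || !*name)
        return false;

    const std::size_t name_len = std::strlen(name);
    const char *cursor = extension_list;
    while ((cursor = std::strstr(cursor, name)) != nullptr) {
        // The match must begin at the start of the list or right after a space...
        const bool starts_token = (cursor == extension_list) || (cursor[-1] == ' ');
        // ...and must be followed by a space or the end of the list.
        const char after = cursor[name_len];
        const bool ends_token = (after == ' ') || (after == '\0');
        if (starts_token && ends_token)
            return true;
        ++cursor;  // keep scanning past this partial match
    }
    return false;
}
```

With a check like this passing on swrast, the remaining question is why the extension is advertised at all when no DRM device name is available.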

Another experiment I did in init_egl was to try getting an EGL display using eglGetPlatformDisplay or eglGetPlatformDisplayEXT before falling back to eglGetDisplay. This is what WebKit does, but the device name still ended up NULL, in case someone else is thinking about trying that...

charlie-ht commented 5 years ago

Some more debug logging:

# G_MESSAGES_DEBUG=all MESA_DEBUG=1 EGL_LOG_LEVEL=debug LIBGL_DEBUG=verbose WAYLAND_DEBUG=1 cog -P fdo http://google.com
(cog:500): Cog-DEBUG: 11:10:23.889: platform_setup: Platform name: fdo
(cog:500): Cog-DEBUG: 11:10:23.890: platform_setup: Platform plugin: libcogplatform-fdo.so
[1569196.740]  -> wl_display@1.get_registry(new id wl_registry@2)
[1569197.520]  -> wl_display@1.sync(new id wl_callback@3)
[1569198.817] wl_display@1.delete_id(3)
[1569199.076] wl_registry@2.global(1, "wl_compositor", 4)
Wayland: Got a wl_compositor interface
[1569199.437]  -> wl_registry@2.bind(1, "wl_compositor", 4, new id [unknown]@4)
[1569199.618] wl_registry@2.global(2, "wl_subcompositor", 1)
[1569199.734] wl_registry@2.global(3, "wp_viewporter", 1)
[1569199.830] wl_registry@2.global(4, "wp_presentation", 1)
[1569199.949] wl_registry@2.global(5, "zwp_relative_pointer_manager_v1", 1)
[1569200.078] wl_registry@2.global(6, "zwp_pointer_constraints_v1", 1)
[1569200.280] wl_registry@2.global(7, "zwp_input_timestamps_manager_v1", 1)
[1569200.425] wl_registry@2.global(8, "wl_data_device_manager", 3)
[1569200.536] wl_registry@2.global(9, "wl_shm", 1)
[1569200.644] wl_registry@2.global(10, "wl_output", 3)
[1569200.745] wl_registry@2.global(11, "zwp_input_panel_v1", 1)
[1569200.962] wl_registry@2.global(12, "zwp_text_input_manager_v1", 1)
[1569201.062] wl_registry@2.global(13, "zxdg_shell_v6", 1)
Wayland: Got an xdg_shell interface
[1569201.311]  -> wl_registry@2.bind(13, "zxdg_shell_v6", 1, new id [unknown]@5)
[1569201.751] wl_registry@2.global(14, "wl_shell", 1)
Wayland: Got a wl_shell interface
[1569202.085]  -> wl_registry@2.bind(14, "wl_shell", 1, new id [unknown]@6)
[1569202.334] wl_registry@2.global(15, "weston_desktop_shell", 1)
[1569202.475] wl_registry@2.global(16, "weston_screenshooter", 1)
[1569202.583] wl_callback@3.done(0)
Cog-INFO: 11:10:23.909: CHT: EGL_KHR_platform_wayland
Cog-INFO: 11:10:23.909: eglGetPlatformDisplay
[1569204.867]  -> wl_display@1.get_registry(new id wl_registry@3)
[1569205.163]  -> wl_display@1.sync(new id wl_callback@7)
[1569205.805] wl_display@1.delete_id(7)
[1569205.888] wl_registry@3.global(1, "wl_compositor", 4)
[1569206.082] wl_registry@3.global(2, "wl_subcompositor", 1)
[1569206.211] wl_registry@3.global(3, "wp_viewporter", 1)
[1569206.293] wl_registry@3.global(4, "wp_presentation", 1)
[1569206.627] wl_registry@3.global(5, "zwp_relative_pointer_manager_v1", 1)
[1569206.878] wl_registry@3.global(6, "zwp_pointer_constraints_v1", 1)
[1569206.956] wl_registry@3.global(7, "zwp_input_timestamps_manager_v1", 1)
[1569207.040] wl_registry@3.global(8, "wl_data_device_manager", 3)
[1569207.188] wl_registry@3.global(9, "wl_shm", 1)
[1569207.266] wl_registry@3.global(10, "wl_output", 3)
[1569207.404] wl_registry@3.global(11, "zwp_input_panel_v1", 1)
[1569207.523] wl_registry@3.global(12, "zwp_text_input_manager_v1", 1)
[1569207.643] wl_registry@3.global(13, "zxdg_shell_v6", 1)
[1569207.878] wl_registry@3.global(14, "wl_shell", 1)
[1569207.976] wl_registry@3.global(15, "weston_desktop_shell", 1)
[1569208.096] wl_registry@3.global(16, "weston_screenshooter", 1)
[1569208.452] wl_callback@7.done(0)
[1569209.096]  -> wl_display@1.get_registry(new id wl_registry@7)
[1569209.218]  -> wl_display@1.sync(new id wl_callback@8)
[1569209.771] wl_display@1.delete_id(8)
[1569209.859] wl_registry@7.global(1, "wl_compositor", 4)
[1569209.982] wl_registry@7.global(2, "wl_subcompositor", 1)
[1569210.092] wl_registry@7.global(3, "wp_viewporter", 1)
[1569210.167] wl_registry@7.global(4, "wp_presentation", 1)
[1569210.282] wl_registry@7.global(5, "zwp_relative_pointer_manager_v1", 1)
[1569210.539] wl_registry@7.global(6, "zwp_pointer_constraints_v1", 1)
[1569210.635] wl_registry@7.global(7, "zwp_input_timestamps_manager_v1", 1)
[1569210.740] wl_registry@7.global(8, "wl_data_device_manager", 3)
[1569210.991] wl_registry@7.global(9, "wl_shm", 1)
[1569211.257]  -> wl_registry@7.bind(9, "wl_shm", 1, new id [unknown]@9)
[1569211.588] wl_registry@7.global(10, "wl_output", 3)
[1569211.797] wl_registry@7.global(11, "zwp_input_panel_v1", 1)
[1569211.901] wl_registry@7.global(12, "zwp_text_input_manager_v1", 1)
[1569212.016] wl_registry@7.global(13, "zxdg_shell_v6", 1)
[1569212.231] wl_registry@7.global(14, "wl_shell", 1)
[1569212.379] wl_registry@7.global(15, "weston_desktop_shell", 1)
[1569212.551] wl_registry@7.global(16, "weston_screenshooter", 1)
[1569212.673] wl_callback@8.done(0)
[1569212.858]  -> wl_display@1.sync(new id wl_callback@8)
[1569213.153] wl_display@1.delete_id(8)
[1569213.223] wl_shm@9.format(0)
[1569213.401] wl_shm@9.format(1)
[1569213.468] wl_shm@9.format(909199186)
[1569213.526] wl_callback@8.done(0)
libEGL debug: DRI2: dlopen(/usr/lib/dri/swrast_dri.so)
libEGL debug: found extension `DRI_Core'
libEGL info: found extension DRI_Core version 2
libEGL debug: found extension `DRI_SWRast'
libEGL info: found extension DRI_SWRast version 4
libEGL debug: found extension `DRI_CopySubBuffer'
libEGL debug: found extension `DRI_ConfigOptions'
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /root/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /root/.drirc: No such file or directory.
libEGL debug: found extension `DRI_TexBuffer'
libEGL info: found extension DRI_TexBuffer version 2
libEGL debug: found extension `DRI_RENDERER_QUERY'
libEGL debug: found extension `DRI_CONFIG_QUERY'
libEGL debug: found extension `DRI2_Fence'
libEGL debug: found extension `DRI_NoError'
libEGL debug: found extension `DRI_IMAGE'
libEGL debug: found extension `DRI_FlushControl'
libEGL debug: found extension `DRI_TexBuffer'
libEGL debug: found extension `DRI_RENDERER_QUERY'
libEGL info: found extension DRI_RENDERER_QUERY version 1
libEGL debug: found extension `DRI_CONFIG_QUERY'
libEGL info: found extension DRI_CONFIG_QUERY version 1
libEGL debug: found extension `DRI2_Fence'
libEGL info: found extension DRI2_Fence version 2
libEGL debug: found extension `DRI_NoError'
libEGL info: found extension DRI_NoError version 1
libEGL debug: found extension `DRI_IMAGE'
libEGL info: found extension DRI_IMAGE version 6
libEGL debug: found extension `DRI_FlushControl'
libEGL info: found extension DRI_FlushControl version 1
libEGL debug: did not find optional extension DRI_Robustness version 1
libEGL debug: did not find optional extension DRI2_Interop version 1
libEGL debug: did not find optional extension DRI2_Blob version 1
libEGL debug: did not find optional extension DRI_MutableRenderBufferDriver version 1
libEGL debug: No DRI config supports native format XRGB2101010
libEGL debug: No DRI config supports native format ARGB2101010
libEGL debug: No DRI config supports native format XBGR2101010
libEGL debug: No DRI config supports native format ABGR2101010
Cog-INFO: 11:10:23.949: EGL version 1.4 initialized.
[1569468.424]  -> wl_compositor@4.create_surface(new id wl_surface@8)
[1569468.626]  -> zxdg_shell_v6@5.get_xdg_surface(new id zxdg_surface_v6@10, wl_surface@8)
[1569468.732]  -> zxdg_surface_v6@10.get_toplevel(new id zxdg_toplevel_v6@11)
[1569469.100]  -> zxdg_toplevel_v6@11.set_title("Cog")
[1569469.497]  -> zxdg_toplevel_v6@11.set_app_id("com.igalia.Cog")
libEGL debug: mincore failed: Cannot allocate memory
[1569470.057]  -> wl_surface@8.commit()
[1569471.469]  -> wl_shm@9.create_pool(new id wl_shm_pool@12, fd 13, 3145728)
[1569471.681]  -> wl_shm_pool@12.create_buffer(new id wl_buffer@13, 0, 1024, 768, 4096, 1)
[1569472.092]  -> wl_shm_pool@12.destroy()
Segmentation fault

This is with a patch to use eglGetPlatformDisplay, which doesn't seem to change anything. The interesting part to me is this apparent allocation failure:

libEGL debug: mincore failed: Cannot allocate memory
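
A note on reading that message: according to the Linux mincore(2) man page, ENOMEM ("Cannot allocate memory") means the queried address range contains unmapped memory. It does not mean the system ran out of memory. Whether this is related to the crash is not established here. The Linux-specific sketch below uses a hypothetical helper, page_is_mapped, to show how mincore can act as a probe for whether an address is mapped.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper (Linux-specific): returns true when the page that
// contains `addr` is currently mapped in this process. For an unmapped
// page, mincore() fails and sets errno to ENOMEM, which strerror() renders
// as "Cannot allocate memory". Any other mincore() failure also yields false.
bool page_is_mapped(const void *addr)
{
    const long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return false;

    // mincore() requires a page-aligned start address, so round down.
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
    void *page = reinterpret_cast<void *>(reinterpret_cast<std::uintptr_t>(addr) & mask);

    unsigned char residency = 0;  // one status byte per page queried
    return mincore(page, static_cast<std::size_t>(page_size), &residency) == 0;
}
```

Probing a live stack variable returns true, while probing a page that has just been released with munmap returns false.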

Version information I'm using here (taken from the default in buildroot-wpe):

Cog: 063df115456a24e
WPEBackend-fdo: 1.0.0

charlie-ht commented 5 years ago

The first issue for me was that I was running 64-bit ARM. I was hitting deadlocks with the DRM backends that are more or less known to the community. It's not clear what the fix is, but the Pi developers are basically saying that this was not a fully supported configuration as of 2018. Who knows...

The second issue was that the stable Buildroot was shipping a Mesa with a few critical vc4 driver bugs. I'm now on the latest Mesa release and can at least run a compositor and cog. I have not done further testing yet. You'll want Weston built with DRM support, and you'll also want to make sure the vc4 overlay is enabled in the firmware and that your Mesa version does not have these bugs.

In summary, this overlay is out of date. I'd recommend using none of the packages bundled here and relying on a current version of Buildroot instead (Mesa 18.3.3, for example, is a terrible revision for vc4). I'll work on creating a defconfig that gives a decent out-of-the-box experience, because as of today this overlay is broken.

piotrwest commented 3 years ago

I'm seeing the same problem: a segmentation fault when running cog -P fdo on weston --backend=fbdev-backend.so. The same environment and libraries do work when running cog -P fdo on weston --backend=drm-backend.so. I'm using Alpine Linux 3.14.2 (32-bit, armv7) on an RPi 4.

Any ideas why this might be happening? For investigation purposes, I'm attaching logs which can be diffed:

kmeinhar commented 2 years ago

I am facing the same issue, @piotrwest. Sadly, I am not able to fall back to drm-backend.so on my platform. Were you able to resolve this issue?

Cog_SegFaultLog.txt GDB Log with Stacktrace

aperezdc commented 2 years ago

@piotrwest @HowardACornwell Could you please re-check with Cog 0.12.4? I think there is a good chance it will work, since it includes fixes for software rendering, so it should at least help when running with Weston's fbdev backend.