segfault when starting srm-basic

dranull commented 6 months ago

drm_info.log srm_display_info.log srm_basic.log coredumpctl.log

I hope everything you need is in the logs, but ask if you need more information. I'm using Arch, btw.

ehopperdietzel commented 6 months ago

Hello, thank you for sharing this. It appears that the issue occurs in srmBufferGetTextureID(). Have you tested if srm-all-connectors and srm-multi-session are functioning correctly?

There are a few options you could try:

Disable GBM with SRM_FORCE_GL_ALLOCATION=1
Force the use of the legacy DRM API with SRM_FORCE_LEGACY_API=1
Select a different buffer allocator GPU with SRM_ALLOCATOR_DEVICE=/dev/dri/card[N]

Additionally, regarding the srm_basic.log file, it seems incomplete. Could you please provide the complete log for further analysis?

dranull commented 6 months ago

I've made the following tests:

#card0 nvidia
#card1 intel iGPU

# crash
SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm1.log 2>&1
SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm2.log 2>&1
SRM_FORCE_LEGACY_API=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm3.log 2>&1
SRM_FORCE_LEGACY_API=1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm4.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm9.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm10.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_FORCE_LEGACY_API=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm11.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_FORCE_LEGACY_API=1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm12.log 2>&1

# stays in console, nothing happens (ctrl+c exits)
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm5.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm6.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_FORCE_LEGACY_API=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm7.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_FORCE_LEGACY_API=1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm8.log 2>&1

# "Failed to create GBM surface"
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-all-connectors > srm-ac-nvidia.log 2>&1
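
For anyone repeating these tests, the combinations above can be generated instead of typed by hand. The following is only a sketch: the function name `srm_test_matrix` and the `srm-matrix-N.log` file names are made up for illustration, and it assumes the two cards are `/dev/dri/card0` and `/dev/dri/card1` as in this report. It prints the command lines rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Print one command line per combination of the three SRM variables
# discussed in this thread. Nothing is executed here (dry run).
# The function name and the log file names are hypothetical.
srm_test_matrix() {
    prog=${1:-srm-basic}
    n=0
    for dev in "" /dev/dri/card0 /dev/dri/card1; do
        for legacy in "" 1; do
            for gl in "" 1; do
                n=$((n + 1))
                env_vars="SRM_DEBUG=4 SRM_EGL_DEBUG=4"
                if [ -n "$gl" ]; then
                    env_vars="SRM_FORCE_GL_ALLOCATION=1 $env_vars"
                fi
                if [ -n "$legacy" ]; then
                    env_vars="SRM_FORCE_LEGACY_API=1 $env_vars"
                fi
                if [ -n "$dev" ]; then
                    env_vars="SRM_ALLOCATOR_DEVICE=$dev $env_vars"
                fi
                echo "$env_vars $prog > srm-matrix-$n.log 2>&1"
            done
        done
    done
}

srm_test_matrix srm-basic
```

Piping the output into a shell (`srm_test_matrix srm-basic | sh`) would run the twelve cases one after another; a case that hangs still has to be interrupted with Ctrl+C.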

The # crash category produces the same output as srm_basic.log. For some reason the last few lines were not written to the log file, but this is where it crashes:

...
  Format Y216   [LINEAR, X_TILED, Y_TILED, INVALID]
SRM debug: [/dev/dri/card0] Connector (90) gamma size = 1024.
SRM debug: Connector 90 device /dev/dri/card0 renderer mode = DUMB.
Segmentation fault (core dumped)

The srm[5-8].log files are the logs from the cases where nothing happens with nvidia (card0), captured after I stopped the process with Ctrl+C. In those cases I run the command, the cursor jumps to a new line, and it just waits for something.

srm-multi-session crashes with intel, freezes with nvidia (I had to kill it in another terminal)
srm-all-connectors works with intel, errors with nvidia (no freeze though)
srm-ac-nvidia.log

ehopperdietzel commented 6 months ago

Alright, it seems to be an issue with the Nvidia proprietary driver, as it's failing to create a GBM surface (required when it's the allocator device) and is likely crashing when creating DUMB buffers (when the Intel card is the allocator).

Currently, SRM doesn't offer a way to blacklist a GPU. As a workaround, you could try disabling the Nvidia driver and using only the Intel one, which should resolve the issue. Alternatively, you could replace the Nvidia driver with nouveau.

I also have an Intel + Nvidia setup, and the Nvidia card works fine with both proprietary and nouveau drivers. Hence, I suspect it might be a driver configuration issue or bug. Perhaps it requires some backend for GBM or similar.

You might want to try running kmscube or the examples provided here. If those don't work, then it's definitely a driver bug; otherwise, the issue might lie with SRM.

dranull commented 6 months ago

I couldn't run kmscube: it complained about framebuffer creation on nvidia and about legacy drm on intel.

In my motherboard settings I can set a primary GPU or disable one of them. Keeping only the intel iGPU (or using it as primary) works. Using nouveau also works (kmscube too). So yes, it must be the nvidia driver.

I might add that I also tried v0.5.4-1. It still doesn't work with the nvidia driver, but it doesn't end in a segmentation fault; only v0.5.5-1 does.
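
When switching between the proprietary driver and nouveau like this, it can help to confirm which kernel driver each card is actually bound to. A minimal sketch, assuming the usual sysfs layout where `/sys/class/drm/cardN/device/driver` is a symlink to the bound driver; the function name `drm_card_drivers` is made up for illustration:

```shell
#!/bin/sh
# List each DRM card with the kernel driver it is bound to,
# e.g. "card0 nvidia" or "card1 i915".
# The optional argument overrides the sysfs directory (useful for testing).
drm_card_drivers() {
    root=${1:-/sys/class/drm}
    for card in "$root"/card[0-9]*; do
        # Skip connector entries such as card0-HDMI-A-1.
        case ${card##*/} in
            *-*) continue ;;
        esac
        # The driver entry is a symlink; skip cards that have none.
        if [ ! -L "$card/device/driver" ]; then
            continue
        fi
        echo "${card##*/} $(basename "$(readlink "$card/device/driver")")"
    done
}

drm_card_drivers
```

The card numbers printed here are the same ones used in `SRM_ALLOCATOR_DEVICE=/dev/dri/card[N]`.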

ehopperdietzel commented 6 months ago

Thank you for testing. Previous versions took a different approach: each GPU rendered to its own connectors and shared buffers through DMA where possible. In practice this rarely worked, even on drivers that advertise support. In the new version only one GPU (the allocator) handles rendering. The result is copied into dumb buffers on the other GPUs for direct scanout or, if dumb buffers are not supported, copied through the CPU, turned into a texture, and rendered by the other GPU. I suspect the crash occurs during dumb buffer creation, but I will investigate further.
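
The per-GPU choice described above can be summarized as a small decision function. This is a toy sketch of the logic as stated in this comment, not SRM's code: the function name `present_path` and the printed labels are made up, and the real library may use additional criteria.

```shell
#!/bin/sh
# Toy model of the scheme described above; names are illustrative only.
#   $1 = 1 if this GPU is the allocator (the one that renders)
#   $2 = 1 if this GPU supports dumb buffers
present_path() {
    is_allocator=$1
    has_dumb=$2
    if [ "$is_allocator" = 1 ]; then
        # The allocator renders and presents its own output.
        echo "render directly"
    elif [ "$has_dumb" = 1 ]; then
        # Other GPUs receive a copy in a dumb buffer and scan it out.
        echo "copy to dumb buffer and scan out"
    else
        # Fallback: CPU copy, then texture creation and rendering locally.
        echo "CPU copy, create texture, render on this GPU"
    fi
}

present_path 0 1
```

The "renderer mode = DUMB" line in the log earlier in this thread would correspond to the second branch.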

ehopperdietzel commented 6 months ago

Hi, I just released v0.5.6 which I think fixes the issue.

dranull commented 6 months ago

Hey, now it works if both GPUs are enabled, even if I plug my monitor into the nvidia card.

I've read the mentioned issue, and I can also confirm the system freeze with (only) the nvidia card.

To answer your other question from there, for me nvidia with proprietary driver works on other Wayland compositors reasonably well (if I ignore the occasional flickering, but that will be addressed in the near future both by compositors and by nvidia).

CuarzoSoftware / SRM

segfault when starting srm-basic #11