Closed · dranull closed this 6 months ago
Hello, thank you for sharing this. It appears that the issue occurs in srmBufferGetTextureID(). Have you tested whether srm-all-connectors and srm-multi-session are functioning correctly?
There are a few options you could try, for example setting SRM_FORCE_GL_ALLOCATION=1 or SRM_FORCE_LEGACY_API=1, or forcing a different allocator device with SRM_ALLOCATOR_DEVICE.
Additionally, the srm_basic.log file seems incomplete. Could you please provide the complete log for further analysis?
I've run the following tests:
# card0: nvidia
# card1: intel iGPU
# crash
SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm1.log 2>&1
SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm2.log 2>&1
SRM_FORCE_LEGACY_API=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm3.log 2>&1
SRM_FORCE_LEGACY_API=1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm4.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm9.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm10.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_FORCE_LEGACY_API=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm11.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card1 SRM_FORCE_LEGACY_API=1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm12.log 2>&1
# stays in console, nothing happens (ctrl+c exits)
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm5.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm6.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_FORCE_LEGACY_API=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm7.log 2>&1
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_FORCE_LEGACY_API=1 SRM_FORCE_GL_ALLOCATION=1 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm8.log 2>&1
# "Failed to create GBM surface"
SRM_ALLOCATOR_DEVICE=/dev/dri/card0 SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-all-connectors > srm-ac-nvidia.log 2>&1
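For reference, the twelve environment-variable combinations above can be generated with a small loop. This is just a sketch that prints the commands rather than running them (pipe it to sh to execute); srm-basic and the /dev/dri paths are taken from the report, and the log numbering here is sequential rather than identical to the report's.

```shell
#!/bin/sh
# Print every tested combination of allocator device, legacy API, and
# GL allocation (12 commands in total).
i=1
for dev in "" "SRM_ALLOCATOR_DEVICE=/dev/dri/card0" "SRM_ALLOCATOR_DEVICE=/dev/dri/card1"; do
  for legacy in "" "SRM_FORCE_LEGACY_API=1"; do
    for gl in "" "SRM_FORCE_GL_ALLOCATION=1"; do
      echo "env $dev $legacy $gl SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-basic > srm$i.log 2>&1"
      i=$((i + 1))
    done
  done
done
```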
The "# crash" category behaves the same as srm_basic.log. For some reason the last few lines weren't written to the log file, but basically that's it: it crashes at that point.
...
Format Y216 [LINEAR, X_TILED, Y_TILED, INVALID]
SRM debug: [/dev/dri/card0] Connector (90) gamma size = 1024.
SRM debug: Connector 90 device /dev/dri/card0 renderer mode = DUMB.
Segmentation fault (core dumped)
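On the missing tail of the log: when stdout is redirected to a file, stdio typically switches to block buffering, so lines printed just before a segfault can be lost in the buffer. A possible workaround (assuming glibc stdio buffering is the culprit) is to force line buffering with stdbuf from GNU coreutils:

```shell
# Hypothetical re-run of the crashing case with line-buffered output, so the
# final SRM debug lines reach the log before the segfault:
#
#   SRM_DEBUG=4 SRM_EGL_DEBUG=4 stdbuf -oL -eL srm-basic > srm1.log 2>&1
#
# Minimal demonstration that stdbuf -oL flushes each completed line:
printf 'SRM debug: last line before the crash\n' | stdbuf -oL cat > /tmp/srm_tail.log
cat /tmp/srm_tail.log
```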
After I stop the process with Ctrl+C, this is the log (i.e. srm[5-8].log) for the case where nothing happens with nvidia (card0). Meaning, I run the command, the cursor jumps to a new line, and it just waits for something.
srm-multi-session crashes with intel and freezes with nvidia (I had to kill it from another terminal). srm-all-connectors works with intel and errors out with nvidia (no freeze, though); see srm-ac-nvidia.log.
Alright, it seems to be an issue with the Nvidia proprietary driver, as it's failing to create a GBM surface (required when it's the allocator device) and is likely crashing when creating DUMB buffers (when the Intel card is the allocator).
Currently, SRM doesn't offer a way to blacklist a GPU. As a workaround, you could try disabling the Nvidia driver and using only the Intel one, which should resolve the issue. Alternatively, you could replace the Nvidia driver with nouveau.
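If you try disabling the proprietary driver, one common approach is a modprobe blacklist, e.g. in /etc/modprobe.d/blacklist-nvidia.conf (a sketch; the exact module set can vary with driver version):

```
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm
```

After adding it, regenerate the initramfs and reboot so the modules are not loaded early.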
I also have an Intel + Nvidia setup, and the Nvidia card works fine with both proprietary and nouveau drivers. Hence, I suspect it might be a driver configuration issue or bug. Perhaps it requires some backend for GBM or similar.
You might want to try running kmscube or the examples provided here. If those don't work, then it's definitely a driver bug; otherwise, the issue might lie with SRM.
Couldn't run kmscube: it complained about framebuffer creation for nvidia, and about legacy DRM for intel.
In my motherboard's firmware I can set a primary GPU, or disable one of them. Keeping only the intel iGPU (or setting it as primary) works. Using nouveau also works (kmscube too). So yes, it must be the nvidia driver.
I might add that I also tried v0.5.4-1 (which still doesn't work with the nvidia driver), but it doesn't stop with a segmentation fault; only v0.5.5-1 does.
Thank you for conducting the tests. In previous versions, a different approach was attempted, where each GPU would render into its own connectors and share buffers through DMA when possible. However, that approach rarely worked despite driver support. In the new version, only one GPU (the allocator) handles rendering, and the result is copied into dumb buffers of the other GPUs for direct scanout; if dumb buffers are not supported, a CPU copy is performed instead, followed by texture creation and rendering on the other GPU. I suspect the crash occurs during dumb buffer creation, but I will investigate further.
Hi, I just released v0.5.6 which I think fixes the issue.
Hey, now it works with both GPUs enabled, even if I plug my monitor into the nvidia card.
I've read the mentioned issue, and I can also confirm the system freeze with (only) the nvidia card.
To answer your other question from there: for me, nvidia with the proprietary driver works reasonably well on other Wayland compositors (if I ignore the occasional flickering, which should be addressed in the near future both by compositors and by nvidia).
drm_info.log, srm_display_info.log, srm_basic.log, coredumpctl.log
I hope everything you need is in the logs, but ask if you need more information. I'm using Arch, btw.
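In case it helps, the backtrace in coredumpctl.log can be regenerated at any time; a sketch, assuming the default Arch systemd-coredump setup:

```shell
# List recorded crashes of srm-basic and inspect the most recent one.
# "coredumpctl info" prints metadata plus the stack trace captured at dump
# time; "coredumpctl gdb" opens a full debugger session on the core.
#
#   coredumpctl list srm-basic
#   coredumpctl info srm-basic
#   coredumpctl gdb srm-basic      # then, inside gdb: bt full
#
# Quick availability check:
command -v coredumpctl >/dev/null 2>&1 && echo "coredumpctl available" || echo "coredumpctl not found"
```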