Closed sfjohnson closed 9 months ago
Hello, thanks for letting me know. Could you please send me the output generated by drm_info, srm-display-info, and srm-all-connectors?
```
$ drm_info > drm_info_log.txt 2>&1
$ srm-display-info > srm_display_info_log.txt 2>&1
$ SRM_DEBUG=4 SRM_EGL_DEBUG=4 srm-all-connectors > srm_all_connectors_log.txt 2>&1
```
I believe the FPS drop and glitches could be attributed to the exclusive support for the atomic DRM API (I had a similar issue with an Nvidia card and proprietary drivers some time ago), and I suspect there might be a call to the legacy API that could be causing this. I will check this out.
For more context it's an Apple Studio Display connected to a PC by a DP to USB-C reverse cable. I'm also happy to test anything else on my hardware if you need.
Thank you, I've already downloaded them. I'm going to see what might be happening.
I ran several tests on my Nvidia card yesterday, and I'm experiencing the same issue. Interestingly, it didn't occur before, possibly due to a kernel or nvidia-drm version change. When using only dumb buffers, everything is fine, but with OpenGL/EGL, it starts at 60 fps and drops to 15 fps after a few seconds, and I don't quite understand why. I also tested kmscube, and the same issue occurs, so I suspect it's a driver issue. Could you try kmscube and see if it works well for you? I have another suspicion that when using OpenGL, it might be using a software-based rendering backend, swrast.so. Perhaps there's another backend that supports acceleration that can be installed. I'll continue investigating.
I tried kmscube and it doesn't run; I'm getting "Invalid argument" on `drmModeAddFB2()`. I also tried removing flags from the `gbm_bo_create()` and `gbm_surface_create()` calls. Definitely some bad stuff going on with the latest NVIDIA driver.
I added a shader to `srm-basic` and it's putting a significant load on the CPU, so next I might profile it to see if it's jumping into the software renderer as you say.

I ran `perf` and I don't see swrast.so, but it looks like something is calling `sched_yield()` too much:
```
8.86% srm-basic [kernel.vmlinux] [k] pick_next_task_fair
6.17% srm-basic [vdso] [.] __vdso_clock_gettime
4.41% srm-basic [kernel.vmlinux] [k] __schedule
4.35% srm-basic [kernel.vmlinux] [k] __update_curr
3.85% srm-basic [kernel.vmlinux] [k] do_sched_yield
3.60% srm-basic [kernel.vmlinux] [k] psi_account_irqtime
3.50% srm-basic [kernel.vmlinux] [k] srso_alias_return_thunk
3.20% srm-basic [kernel.vmlinux] [k] entry_SYSCALL_64
2.77% srm-basic [kernel.vmlinux] [k] srso_alias_safe_ret
2.65% srm-basic [kernel.vmlinux] [k] __pick_eevdf
2.54% srm-basic [kernel.vmlinux] [k] pick_next_entity.isra.0
2.33% srm-basic [kernel.vmlinux] [k] __cgroup_account_cputime
2.32% srm-basic [kernel.vmlinux] [k] raw_spin_rq_lock_nested
2.31% srm-basic [kernel.vmlinux] [k] preempt_count_add
2.14% srm-basic [kernel.vmlinux] [k] syscall_exit_to_user_mode
2.06% srm-basic [kernel.vmlinux] [k] do_syscall_64
1.87% srm-basic [kernel.vmlinux] [k] rcu_note_context_switch
1.81% srm-basic [kernel.vmlinux] [k] schedule
1.47% srm-basic [kernel.vmlinux] [k] update_rq_clock
1.44% srm-basic [kernel.vmlinux] [k] _raw_spin_lock
1.42% srm-basic [kernel.vmlinux] [k] sched_clock_cpu
1.38% srm-basic [kernel.vmlinux] [k] native_sched_clock
1.32% srm-basic libc.so.6 [.] __sched_yield
1.16% srm-basic [kernel.vmlinux] [k] preempt_count_sub
1.15% srm-basic [kernel.vmlinux] [k] sched_clock
1.10% srm-basic [kernel.vmlinux] [k] _raw_spin_unlock
1.07% srm-basic libnvidia-eglcore.so.545.29.06 [.] 0x0000000000afe6d7
```
Interesting. When you run `srm-all-connectors`, do you see the pixelated texture in the background and the white square cursor plane moving? If allocation through GBM fails, it should fall back to OpenGL.
I believe I'm giving up on this for now. I've tried everything to understand what's happening, but with no luck. I also noticed that even with dumb buffers, the FPS drops after a few seconds if I write too many times to the mapped buffer. If I write only a few times, the FPS never drops. The curious thing is that the writing time of the dumb buffers increases, but so does the time when vblank events are emitted. Hence, I suspect that any interaction with the driver is likely slowed down by some internal bug. For now, my only recommendation is to use nouveau, which apparently works quite well. In any case, if I manage to solve this issue, I'll keep you informed here.
`srm-all-connectors` did show the background and cursor as you describe.
Thanks for your investigation. My software uses both OpenGL and CUDA so I believe I will still need the proprietary driver. Hopefully we will get a new driver version from NVIDIA soon and we can re-test.
Hello,
Thank you for this library. I've been having some issues on NVIDIA. I'm using the latest proprietary driver version on Arch, 545.29.06.
First, I believe there is currently a bug in the driver where it won't accept any flags for `gbm_surface_create`, so changing the flags to 0 fixes an ENOSYS error. I believe it's the same issue as here. Once that is sorted, the issue is quite strange: the shader starts off rendering at 60 FPS, and then after a few seconds the image starts to corrupt, the CPU usage rises, and the framerate drops. The corruption looks like the pixels are drawn in the wrong order, or a bit like heavy video compression.
I've set the `srm-basic` example to render for a few seconds and then clean up gracefully, but the issue persists across launches of `srm-basic`. It seems to be related to EGL/GBM, as I ran this code where the CPU writes directly to the framebuffer and it works fine. It appears that `GBM_BO_USE_SCANOUT` is always set on `gbm_surface_create`, so it's possible my issue is caused by `GBM_BO_USE_RENDERING` no longer being set. I'm not having much luck finding documentation on the GBM API, so if you have any information that would be appreciated.