Open pravinxor opened 6 months ago
Hi there. Are you certain about this bit:
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver. [x] I confirm that this does not happen with the proprietary driver package.
I know it's easier to just tick that box than to report the bug to linux-bugs@nvidia.com or on the forums, but what you are effectively saying is that the bug is in the kernel modules (plausible) and that it is in the delta between Open and Proprietary. That delta in 555.xx is very very tiny, so I find it extremely unlikely. Please double-check, otherwise kernel engineers who monitor this tracker (which is for kernel module issues only) could waste time looking in the wrong place.
PS, it seems like in your testing you installed the old kernel module, but still kept the new userspace. This can cause all sorts of issues, so best get that fixed:
May 21 23:39:19 zephyrus kernel: NVRM: API mismatch: the client has the version 555.42.02, but
NVRM: this kernel module has the version 550.78. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
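For anyone checking their own logs for the same problem, here is a minimal sketch that pulls the two version numbers out of an NVRM "API mismatch" message like the one quoted above. The function name `find_api_mismatch` and the regular expression are illustrative assumptions based only on the wording of this one log excerpt; other driver versions may phrase the message differently.

```python
import re

# Matches the message after the "NVRM:" prefixes have been stripped and the
# continuation lines joined. The wording is taken from the excerpt above.
MISMATCH_RE = re.compile(
    r"API mismatch: the client has the version (?P<client>[\d.]+), but\s+"
    r"this kernel module has the version (?P<module>[\d.]+)\."
)


def find_api_mismatch(log_text):
    """Return (client_version, module_version), or None if no mismatch is reported."""
    parts = []
    for line in log_text.splitlines():
        # Keep only the text that follows "NVRM:" on each line.
        _, sep, rest = line.partition("NVRM:")
        if sep:
            parts.append(rest.strip())
    match = MISMATCH_RE.search(" ".join(parts))
    if match is None:
        return None
    return match.group("client"), match.group("module")


SAMPLE = """\
May 21 23:39:19 zephyrus kernel: NVRM: API mismatch: the client has the version 555.42.02, but
NVRM: this kernel module has the version 550.78. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
"""

if __name__ == "__main__":
    print(find_api_mismatch(SAMPLE))
```

If the function returns a pair of differing versions, the userspace libraries and the loaded kernel module come from different driver releases and should be brought back in sync before retesting.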
Thanks for getting back, and sorry about the mismatch between the userspace and kernel drivers. I've sorted that out so that they're both on the same version, but the error still occurs. As for whether this is specific to the open kernel modules, I can confirm that the proprietary driver does work correctly. I've attached two sets of log files (open and proprietary kernel modules). Each set includes an nvidia bug report log as well as a report from Chromium. I'm happy to provide other information or perform debugging as well, if you believe it could help.
about-gpu-open.txt about-gpu-proprietary.txt nvidia-bug-report-open.log.gz nvidia-bug-report-proprietary.log.gz
Thanks for double-checking. That is very surprising to me. I don't see anything in the logs suggesting any meaningful difference (except maybe some external monitor unplugging; were both tests run with the same monitors attached?).
We'll try to repro this internally. It's very concerning that there's a functional difference here. Thanks!
Between the two tests I most recently posted, the display configuration was exactly the same. However, between those two tests and the first test I posted, one of the attached displays was different. I don't believe this is a significant factor, though, since the issue occurs regardless of the display configuration.
I just wanted to update this thread with a small change that has happened between then and now. The log messages from EGL appear a little different.
about-gpu-2024-06-26T18-24-54-455Z.txt nvidia-bug-report.log.gz
Adding some information that may be helpful, I reproduced with the proprietary driver
nvidia-dkms 560.35.03-18
Arch Linux
Linux Hanssen-Linux 6.11.5-arch1-1-g14 #1 SMP PREEMPT_DYNAMIC Sun, 27 Oct 2024 17:01:27 +0000 x86_64 GNU/Linux
GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU (UUID: GPU-5737ef92-c92d-bbb1-c337-b01b9b5e7640)
[12784:12784:1028/033113.573410:ERROR:angle_platform_impl.cc(44)] ImageEGL.cpp:112 (operator()): eglCreateImage failed with 0x00003003
ERR: ImageEGL.cpp:112 (operator()): eglCreateImage failed with 0x00003003
[12784:12784:1028/033113.573596:ERROR:scoped_egl_image.cc(23)] Failed to create EGLImage: EGL_SUCCESS
[12784:12784:1028/033113.573804:ERROR:native_pixmap_egl_binding.cc(118)] Unable to initialize binding from pixmap
[12784:12784:1028/033113.573941:ERROR:ozone_image_backing.cc(309)] OzoneImageBacking::ProduceSkiaGanesh failed to create GL representation
[12784:12784:1028/033113.574008:ERROR:shared_image_manager.cc(255)] SharedImageManager::ProduceSkia: Trying to produce a Skia representation from an incompatible backing: OzoneImageBacking
[12784:12784:1028/033113.574139:ERROR:gpu_service_impl.cc(1161)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
[12735:12782:1028/033113.584699:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.584732:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12735:12782:1028/033113.585115:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.585124:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12735:12782:1028/033113.586152:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.586160:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12735:12782:1028/033113.586377:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.586381:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12735:12735:1028/033113.590044:ERROR:gpu_process_host.cc(982)] GPU process exited unexpectedly: exit_code=8704
[12735:12782:1028/033113.594426:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.594443:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12735:12782:1028/033113.594460:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.594462:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12735:12782:1028/033113.594809:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.594818:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12735:12782:1028/033113.594833:ERROR:shared_image_interface_proxy.cc(134)] Buffer handle is null. Not creating a mailbox from it.
[12735:12782:1028/033113.594835:ERROR:one_copy_raster_buffer_provider.cc(348)] Creation of StagingBuffer's SharedImage failed.
[12889:10:1028/033113.636464:ERROR:command_buffer_proxy_impl.cc(131)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
[12895:10:1028/033113.719569:ERROR:command_buffer_proxy_impl.cc(131)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
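As a reading aid for the log above: the code 0x00003003 reported by `eglCreateImage` corresponds to `EGL_BAD_ALLOC` in the Khronos EGL headers. The sketch below decodes such codes from a pasted Chromium log. The helper name `decode_egl_failures` and the regular expression are assumptions made for illustration and match only the message format seen in this log.

```python
import re

# Error codes as defined in the Khronos EGL header (EGL/egl.h).
EGL_ERRORS = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}

# Message format copied from the ANGLE lines in the log above.
FAILURE_RE = re.compile(r"eglCreateImage failed with (0x[0-9a-fA-F]+)")


def decode_egl_failures(log_text):
    """Return the symbolic name of every eglCreateImage error code in the log."""
    names = []
    for match in FAILURE_RE.finditer(log_text):
        code = int(match.group(1), 16)
        names.append(EGL_ERRORS.get(code, "unknown (0x%04X)" % code))
    return names


if __name__ == "__main__":
    line = "ERR: ImageEGL.cpp:112 (operator()): eglCreateImage failed with 0x00003003"
    print(decode_egl_failures(line))
```

`EGL_BAD_ALLOC` only says that the implementation could not allocate resources for the requested image; it does not by itself identify which component is at fault.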
I reproduced with the proprietary driver
Same on the open driver.
nvidia-open-dkms 560.35.03-18
For anyone suffering from this problem and reaching here, I bypassed this problem by adding the --disable-gpu-compositing flag to Chrome as a temporary solution.
See "Hardware acceleration in electron apps on nvidia doesn't work" for more information.
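A minimal sketch of how the workaround could be scripted, assuming the browser binary is called `chromium` (that name and the helper `chromium_command` are hypothetical; substitute whatever your distribution installs). Only the two flags already mentioned in this thread are used.

```python
import shlex


def chromium_command(binary="chromium", disable_gpu_compositing=True):
    """Build the argument list for a native Wayland launch.

    `binary` is an assumed executable name, not something confirmed in
    this thread. Both flags are the ones discussed in this issue.
    """
    args = [binary, "--ozone-platform=wayland"]
    if disable_gpu_compositing:
        # Temporary workaround from the comment above.
        args.append("--disable-gpu-compositing")
    return args


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run() to launch it.
    print(shlex.join(chromium_command()))
```

Calling `chromium_command(disable_gpu_compositing=False)` gives the command line from the original bug report, without the workaround flag.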
NVIDIA Open GPU Kernel Modules Version
555.42.02
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Arch Linux
Kernel Release
Linux 6.9.1-hardened1-1-hardened #1 SMP PREEMPT_DYNAMIC Mon, 20 May 2024 12:54:08 +0000 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU (UUID: GPU-57e1b957-4845-a325-50fb-12cb069295cd)
Describe the bug
When starting Chromium (or any Chromium-based program) using the --ozone-platform=wayland flag, the GPU process for Chromium cannot start, so hardware acceleration is completely unavailable, even if the browser is not tasked with performing the hardware acceleration on the Nvidia GPU.
Relevant parts of the Chromium event log:
To Reproduce
Start Chromium with the --ozone-platform=wayland flag, so that Chromium is running as a native Wayland app (and not via XWayland).
Note: hardware acceleration is active and performs correctly when Chromium is running via XWayland.
Bug Incidence
Always
nvidia-bug-report.log.gz
nvidia-bug-report.log.gz
More Info
No response