Open mobarre opened 8 years ago
By pure chance, I happened to try that out involuntarily just yesterday.
I am starting VMware Workstation using primusrun so I can have 3D acceleration in Windows 10. Yesterday I forgot to shut down my virtual machine before going home. When I woke up my laptop at home, VMware Workstation was still running and I could log into Windows just fine. Nothing froze.
My setup:
I have seen posts in the gentoo forums with problems when using the newest nvidia drivers.
Please note that the releases you use are very old, while the driver and kernel are bleeding edge. That does clash a bit, I guess...
OK, so if I understand correctly, you suggest I try the latest git versions of bumblebee and primus to match the kernel and drivers?
I'll give it a shot; it does make sense. That said, I did have the same issue with older Arch kernels (4.3) and most certainly with nvidia driver 358.
There had been some development regarding module unloading and the nvidia-uvm module, which did not exist when the last release came out.
Further, I think you have to patch in the awareness of the UVM module with the patch I attached. Unfortunately I can't seem to be able to find out where I got it. :-( (but I am sure it was from one of the issues opened here regarding nvidia-uvm)
OK, no success with the latest git of bumblebeed. Primus is already at the latest git version.
The patch, although unrelated in my honest opinion, does help with module unloading on a clean application shutdown, but it doesn't change the behavior on suspend. The application should be suspended, not shut down, so the nvidia module should not be unloaded. You expect to find the module loaded and operational on resume.
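To check whether the modules really are still loaded after resume, a minimal sketch along these lines could help. It assumes the usual `/proc/modules` layout (name, size, reference count, ...) and the module names mentioned in this thread; the function name is made up for illustration.

```python
from typing import Dict, Optional

# Module names discussed in this thread (nvidia-uvm appears as nvidia_uvm).
NVIDIA_MODULES = ("nvidia", "nvidia_uvm", "nvidia_modeset")


def loaded_nvidia_modules(proc_modules_text: str) -> Dict[str, Optional[int]]:
    """Map each loaded nvidia module to its reference count.

    Expects text in /proc/modules format: "name size refcount deps state addr".
    The count is None if the kernel does not report one for that module.
    """
    found: Dict[str, Optional[int]] = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] in NVIDIA_MODULES:
            refcount = fields[2]
            found[fields[0]] = int(refcount) if refcount.isdigit() else None
    return found
```

Feeding it the contents of `/proc/modules` once before suspend and once after resume shows whether a module disappeared or whether its reference count changed in between.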
oh, and setting the bridge to virtualgl changes the error message in the kernel logs:
[ 1205.292587] PM: resume of devices complete after 777.383 msecs
[ 1205.293070] PM: Finishing wakeup.
[ 1205.293072] Restarting tasks ...
[ 1205.298268] NVRM: GPU at PCI:0000:03:00: GPU-c3350c76-8707-abd9-a985-52814992bd10
[ 1205.298277] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception on GPC 0: SAVE_RESTORE_ADDR_OOB
[ 1205.298332] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ESR 0x500900=0x80000001
[ 1205.298398] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ChID 0010, Class 0000b097, Offset 00001b0c, Data 1000f010
[ 1205.300415] done.
I feel like I'm getting somewhere. I'll try to downgrade the nvidia kernel to a version that actually had some success with some users, although I'm not sure anyone really does suspend with an optimus laptop running a 3D application...
New attempt made with a near full system downgrade to:
linux 4.2.1-1-ARCH
xorg server 1.17.2-4
nvidia driver 355.11-1
bumblebee 20150118-2
primus 20150118-2
Behaves exactly the same. Is there any way to get more debug logging for the suspend process on the bumblebee and nvidia driver side?
Note that the PR in no way fixes the issue, but it seems to make module unloading work for those whose nvidia driver loads nvidia_modeset.
Same problem here:
[28191.854094] NVRM: GPU at PCI:0000:01:00: GPU-f0067e93-55ea-0863-49f5-0485472bf256
[28191.854103] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Shader Program Header 1 Error
[28191.854108] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Shader Program Header 2 Error
[28191.854112] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Shader Program Header 3 Error
[28191.854116] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Shader Program Header 9 Error
[28191.854119] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: Shader Program Header 18 Error
[28191.854125] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x405840=0xa204020e
[28191.854149] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0010, Class 0000b097, Offset 00002390, Data 00000000
[28193.526034] r8169 0000:09:00.0 enp9s0: link up
[28193.526048] IPv6: ADDRCONF(NETDEV_CHANGE): enp9s0: link becomes ready
OK, I've been investigating more. Since we have someone else, let's see what we have in common. I never posted the full dmesg, which might have been silly. -> http://pastebin.com/HUM0C38h Several issues are visible in there, although I'm quite convinced that the main issue is with the Xid errors. But still, some highlights:
@szebrowski could you have a look at your kernel logs and tell me what you get that looks like those highlights? What laptop are you seeing this on? Can you give a quick version list (driver, xorg, bumblebee, kernel at least)?
Also, if you could +1 and add info on the nvidia devtalk post here: https://devtalk.nvidia.com/default/topic/918576/linux/application-driver-freeze-on-resume-from-suspend-with-optimus/ it could help. It's not like the nvidia devs seem to be giving a shit at the moment.
I’m adding this bug report to my review queue for 4.0. Will test it on my system to see if I can reproduce, and else will dig into your logs to see what we have here.
Little update: I've tried removing bumblebee and using the nvidia driver + modesetting driver. The suspend issue isn't there anymore, although I do get screen corruption on my X background. Also, gdm is a no-go (X gives me a black screen), and a secondary monitor blackens the screen too (at least with gnome-shell) when plugged in.
Bottom line is, bumblebee might be doing something that prevents normal resume. I'm still waiting for some feedback from the nvidia forums (wouldn't hold my breath though...). Apparently when you code a proprietary driver, you make it your duty to leave people completely in the dark :)
I'm also cross-posting this on the bumblebee bugtracker, because there seem to be a lot of similar yet different issues posted here. Original post: https://devtalk.nvidia.com/default/topic/918576/linux/application-driver-freeze-on-resume-from-suspend-with-optimus/
When I start an application with optirun and/or primus and suspend my laptop, the application is frozen on resume. From what I gather of dmesg, the driver doesn't seem to be able to wake up the card or restore its state properly.
Steps to reproduce: That's the easy part... run glxspheres (32bit or 64bit) through optirun, suspend, resume and voilà ! glxspheres should be frozen. The rest of the system works fine. Sometimes restarting the opengl app will work, sometimes a full system restart is needed.
This happens with any OpenGL application that I try to run with optirun. It's been happening since at least September (might be older).
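The reproduction steps can be scripted roughly as follows. This is a sketch under assumptions: the binary name `glxspheres64` is a guess at the 64-bit build mentioned above, `dmesg` may need root on some systems, and the helper names are made up for illustration. The suspend itself is left manual so the script does not assume any particular suspend command.

```python
import subprocess
import time
from typing import List


def new_xid_lines(before: str, after: str) -> List[str]:
    """Return NVRM Xid lines present in `after` but not in `before`."""
    seen = set(before.splitlines())
    return [
        line
        for line in after.splitlines()
        if "NVRM: Xid" in line and line not in seen
    ]


def read_dmesg() -> str:
    # May require root if kernel.dmesg_restrict is enabled.
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def reproduce(app: str = "glxspheres64", warmup_s: float = 5.0) -> List[str]:
    """Start an OpenGL app through optirun, wait for a manual suspend/resume
    cycle, then report any Xid errors logged in the meantime."""
    before = read_dmesg()
    proc = subprocess.Popen(["optirun", app])  # app name is an assumption
    try:
        time.sleep(warmup_s)  # give the app time to start rendering
        input("Suspend and resume the laptop, then press Enter... ")
        return new_xid_lines(before, read_dmesg())
    finally:
        proc.terminate()  # a frozen app may not react to this
```

If `reproduce()` returns a non-empty list after the resume, the Xid errors were logged during that suspend/resume cycle rather than earlier in the session.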
So far, the most useful logs I can produce are an extract of my dmesg with the whole suspend/resume process. File is attached. You should notice the mess here:
[ 987.402372] NVRM: GPU at PCI:0000:03:00: GPU-c3350c76-8707-abd9-a985-52814992bd10
[ 987.402380] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 1 Error
[ 987.402442] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 2 Error
[ 987.402493] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 3 Error
[ 987.402545] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 9 Error
[ 987.402596] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 18 Error
[ 987.402648] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ESR 0x405840=0xa204020e
[ 987.402727] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ChID 0010, Class 0000b097, Offset 00001644, Data 00000001
I did check that under Windows 10 the crash does not happen, so it is not a hardware issue.
Additional info:
Hardware:
Is anyone else seeing this? What information would help pin this down?