bayasdev / envycontrol

Easy GPU switching for Nvidia Optimus laptops under Linux
MIT License

fix: can't find gpu if was integrated in v3.4.0 #183

Closed flyxyz123 closed 3 months ago

flyxyz123 commented 3 months ago

Issue description and how to reproduce:

I was in integrated mode in version 3.4.0. Now I'm on version 3.5.0, and envycontrol -q wrongly reports that I'm hybrid. When I run sudo envycontrol -s integrated, it outputs the error "ERROR: Could not find Nvidia GPU\nTry switching to hybrid mode first!"; switching to any other mode outputs the same error. The expected behavior is that envycontrol -q reports integrated, and that sudo envycontrol -s integrated and the other mode switches work.

Fix and reasons:

The problem is that line 590, with CachedConfig(args).adapter():, reaches the check on line 621, if self.is_hybrid():, and that check says I'm hybrid when I'm not. It says so because in get_current_mode(), line 691, if os.path.exists(BLACKLIST_PATH) and os.path.exists(UDEV_INTEGRATED_PATH):, only checks the new UDEV_INTEGRATED_PATH and not the old /lib/udev/rules.d/50-remove-nvidia.rules. Because the hybrid check succeeds, line 622 calls create_cache_file(), which calls get_nvidia_gpu_pci_bus(), and get_nvidia_gpu_pci_bus() errors out and exits on lines 396 and 397 because it can't find the Nvidia GPU. It only runs this lookup, and fails, because it wrongly thinks I'm hybrid. If it correctly recognized that I'm not hybrid, it would just use the cache, and the Nvidia GPU would be found.

Thus, my fix is to also check the old UDEV_INTEGRATED_PATH, /lib/udev/rules.d/50-remove-nvidia.rules, in get_current_mode(). If the old path exists, it correctly reports that I'm integrated.
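The check described above could be sketched roughly as follows. This is a minimal illustration, not the actual patch: `detect_mode` and `LEGACY_UDEV_INTEGRATED_PATH` are hypothetical names, the value of the new UDEV_INTEGRATED_PATH is an assumption, BLACKLIST_PATH is assumed to be the /etc/modprobe.d/blacklist-nvidia.conf file mentioned later in this thread, and detection of nvidia mode is left out.

```python
import os

# Assumed to be BLACKLIST_PATH: the blacklist file named later in this thread.
BLACKLIST_PATH = "/etc/modprobe.d/blacklist-nvidia.conf"
# Assumption: location of the udev rule in newer versions; verify against your copy.
UDEV_INTEGRATED_PATH = "/etc/udev/rules.d/50-remove-nvidia.rules"
# Old location of the udev rule, as given in this thread (hypothetical constant name).
LEGACY_UDEV_INTEGRATED_PATH = "/lib/udev/rules.d/50-remove-nvidia.rules"


def detect_mode(exists=os.path.exists):
    """Return 'integrated' if the blacklist and either udev rule exist, else 'hybrid'.

    Simplified sketch: nvidia mode detection is omitted. The `exists`
    parameter only makes the function testable without touching the filesystem.
    """
    has_udev_rule = exists(UDEV_INTEGRATED_PATH) or exists(LEGACY_UDEV_INTEGRATED_PATH)
    if exists(BLACKLIST_PATH) and has_udev_rule:
        return "integrated"
    return "hybrid"
```

With only the new path checked, a system that still has the old rule file falls through to hybrid; accepting either location is what makes the leftover 3.4.0 state detectable.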

flyxyz123 commented 3 months ago

Misc

After my patch, I ran sudo envycontrol -s integrated and it works now. I did not test the other mode switches, and I did not test upgrading from other modes set in v3.4.0, so there may be more undiscovered bugs.

To further clarify the logic, one can apply the diff below to an unfixed envycontrol to test. Make sure you were in integrated mode in version 3.4.0; this can maybe be simulated via sudo touch /lib/udev/rules.d/50-remove-nvidia.rules, though I have not tested that:

Then run `sudo ./envycontrol -s integrated`; it will output:

    $ sudo ./envycontrol.py -s integrated
    e
    a
    c
    ERROR: Could not find Nvidia GPU
    Try switching to hybrid mode first!


This shows the logic I explained in the git commit message.

john-Ly commented 3 months ago

Same issue. I set integrated mode on 3.4.0 and upgraded to 3.5.0 on Arch Linux, which produces "ERROR: Could not find Nvidia GPU\nTry switching to hybrid mode first!"

What I found: integrated mode set by 3.4.0 adds a blacklist file in /etc/modprobe.d (to block the Nvidia card). After upgrading to 3.5.0, envycontrol can't find the Nvidia card properly because of the existing blacklist. So I rolled back to 3.4.0, set hybrid mode, upgraded, and finally switched to integrated mode on 3.5.0. Reboot where needed, of course. (I also tried hybrid mode, which works well.)

So the point is: maybe the envycontrol program logic is OK.

flyxyz123 commented 3 months ago

> Same issue. I set integrated mode on 3.4.0 and upgraded to 3.5.0 on Arch Linux, which produces "ERROR: Could not find Nvidia GPU\nTry switching to hybrid mode first!"
>
> What I found: integrated mode set by 3.4.0 adds a blacklist file in /etc/modprobe.d (to block the Nvidia card). After upgrading to 3.5.0, envycontrol can't find the Nvidia card properly because of the existing blacklist. So I rolled back to 3.4.0, set hybrid mode, upgraded, and finally switched to integrated mode on 3.5.0. Reboot where needed, of course. (I also tried hybrid mode, which works well.)
>
> So the point is: maybe the envycontrol program logic is OK.

That may be another way to fix it.

My fix makes get_current_mode() correctly report that I'm integrated by also considering the old UDEV_INTEGRATED_PATH, /lib/udev/rules.d/50-remove-nvidia.rules. envycontrol then reads the cache file to find my GPU, instead of wrongly going through create_cache_file(), which runs get_nvidia_gpu_pci_bus() and errors out because it can't find the GPU.

Your fix is to remove /etc/modprobe.d/blacklist-nvidia.conf, so that get_nvidia_gpu_pci_bus() can find the GPU without using the cache file, and switching modes works.

But your fix requires a mode switch to resolve envycontrol -q wrongly reporting hybrid. With my patch, no mode switch is needed for envycontrol -q to report the correct mode, and switching modes is fixed as well. My patch fixes two issues: first, envycontrol wrongly thinks I'm hybrid when I'm not; second, switching modes does not work. The first issue is the root of both, so once "envycontrol wrongly thinks I'm hybrid" is fixed, "switching modes does not work" is fixed too.
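The dependency between the two issues can be illustrated with a toy model. This is only a sketch of the control flow as described in this thread, not envycontrol's code: `find_gpu_bus`, `probe_with_gpu_hidden`, and the bus string are hypothetical, and the real get_nvidia_gpu_pci_bus() exits instead of raising an exception.

```python
def find_gpu_bus(detected_mode, cached_bus, probe):
    """Toy model: probe for the GPU only when the detected mode is hybrid,
    otherwise fall back to the cached bus ID."""
    if detected_mode == "hybrid":
        # Stands in for create_cache_file() calling get_nvidia_gpu_pci_bus()
        return probe()
    return cached_bus


def probe_with_gpu_hidden():
    # Stands in for a lookup on a system where leftover integrated-mode
    # files hide the Nvidia GPU, so the probe cannot succeed.
    raise RuntimeError("Could not find Nvidia GPU")


# Correct detection: the cache is used and the probe is never called.
print(find_gpu_bus("integrated", "PCI:1:0:0", probe_with_gpu_hidden))

# Wrong detection: the probe runs while the GPU is hidden and fails.
try:
    find_gpu_bus("hybrid", "PCI:1:0:0", probe_with_gpu_hidden)
except RuntimeError as err:
    print(f"ERROR: {err}")
```

In this model the switching failure never occurs once the mode is detected correctly, which is why fixing detection alone covers both symptoms.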