bayasdev / envycontrol

Easy GPU switching for Nvidia Optimus laptops under Linux
https://bayas.dev/envycontrol
MIT License

Integrated mode breaks latest version of VS Code (on Fedora) #160

Open billbsu opened 3 months ago

billbsu commented 3 months ago

Describe the bug: In integrated mode, VS Code (version 1.87.2) shows only a blank page. Hybrid mode is fine, and an older version of VS Code works. This might just be something on VS Code's end, but similar bugs seem to be caused by graphical issues.

To Reproduce Steps to reproduce the behavior:

  1. Install Latest VS Code version (1.87.2)
  2. Run sudo envycontrol -s integrated
  3. reboot
  4. Boot VS Code using code . from the command line

Expected behavior: VS Code boots normally

System Information:

billbsu commented 3 months ago

quick update: the latest version of vscode to actually work is vscode 1.81

bayasdev commented 3 months ago

@billbsu open vscode in a terminal and paste the logs here

klmcwhirter commented 3 months ago

@billbsu, I was the main contributor for release 3.4.0. First let me thank you for reporting your issue. And please do post your logs so we can get to the bottom of it.

But, the only thing that changed in 3.4.0 was to cache the nvidia pci bus id so that a transition from integrated directly to nvidia mode is now possible. Please see the Files changed tab of PR #155 .

Note that #156 was probably included as well, but that PR had no code changes.

I use VS Code 1.87.2 on Fedora 39 for all my dev as well. And it works fine on my ACER Nitro 5 with RTX 3050 Ti Mobile in integrated mode.

I just now tested both with rpmfusion Nvidia drivers installed and without. Everything works fine for me in integrated mode.

FYI, on the HD with the drivers installed, I ran dnf update and saw no kernel updates, but saw this come through: akmod-nvidia-3:550.54.14.2.fc39.x86_64

Please mention which deployment mechanism you use:
[ ] installed locally (via the vscode repo, installed with dnf, per the MS instructions); this is how I have it installed
[ ] installed as a flatpak
[ ] vscode.dev

I looked up your laptop and it looks like it has an RTX 3060 Ti Mobile (per the PC Mag review). Your lspci output was captured in integrated mode, so it was missing the nvidia information.

May I please ask you to also weigh in on #157 ? This uses a completely different approach for switching to integrated mode. It simply turns the power off to the nvidia gpu so it does not appear on the bus. It would be really good for us to hear if this solves your issue.

Thanks again.

billbsu commented 3 months ago

Thank you for looking into it!

lspci output:

00:00.0 Host bridge: Intel Corporation Raptor Lake-P 6p+8e cores Host Bridge/DRAM Controller
00:01.0 PCI bridge: Intel Corporation Device a70d
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
00:04.0 Signal processing controller: Intel Corporation Raptor Lake Dynamic Platform and Thermal Framework Processor Participant
00:06.0 PCI bridge: Intel Corporation Raptor Lake PCIe 4.0 Graphics Port
00:06.2 PCI bridge: Intel Corporation Device a73d
00:07.0 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port #0
00:07.2 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port #2
00:08.0 System peripheral: Intel Corporation GNA Scoring Accelerator module
00:0a.0 Signal processing controller: Intel Corporation Raptor Lake Crashlog and Telemetry (rev 01)
00:0d.0 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 USB Controller
00:0d.2 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #0
00:0d.3 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #1
00:12.0 Serial controller: Intel Corporation Alder Lake-P Integrated Sensor Hub (rev 01)
00:14.0 USB controller: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller (rev 01)
00:14.2 RAM memory: Intel Corporation Alder Lake PCH Shared SRAM (rev 01)
00:14.3 Network controller: Intel Corporation Raptor Lake PCH CNVi WiFi (rev 01)
00:15.0 Serial bus controller: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 (rev 01)
00:15.1 Serial bus controller: Intel Corporation Alder Lake PCH Serial IO I2C Controller #1 (rev 01)
00:16.0 Communication controller: Intel Corporation Alder Lake PCH HECI Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation Device 51bd (rev 01)
00:1e.0 Communication controller: Intel Corporation Alder Lake PCH UART #0 (rev 01)
00:1f.0 ISA bridge: Intel Corporation Raptor Lake LPC/eSPI Controller (rev 01)
00:1f.3 Multimedia audio controller: Intel Corporation Raptor Lake-P/U/H cAVS (rev 01)
00:1f.4 SMBus: Intel Corporation Alder Lake PCH-P SMBus Host Controller (rev 01)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
01:00.0 VGA compatible controller: NVIDIA Corporation AD107M [GeForce RTX 4060 Max-Q / Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22be (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
58:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 5228 (rev 01)

I installed vscode 1.87 according to MS instructions. Right now, I have 1.81 installed using rpm.

I just tested it again. VS Code 1.87.2 does eventually load after a few minutes, but after a while it froze up and crashed.

I'm sorry, I'm a bit inexperienced with these kinds of things, how do I dump the startup log files after running code?

I will try the #157 fix in a bit.

Thank you!

bayasdev commented 3 months ago

@billbsu do you also experience this issue on other Electron apps or Chromium based browsers?

billbsu commented 3 months ago

I think discord is electron based and it works fine. And older versions of VS Code work fine. Chrome also launches fine but I don't use it so it may be bugged. Hope this helps!

bayasdev commented 3 months ago

Can you check if there are logs on this path ~/.config/Code/logs?
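For anyone unsure how to collect those files, here is a minimal Python sketch that lists the log files of the most recent VS Code session. It assumes each session gets its own timestamp-named folder (like 20240324T131631) under the logs directory; the helper names `latest_log_session` and `log_files` are made up for this illustration and are not part of VS Code or envycontrol.

```python
from pathlib import Path
from typing import List, Optional


def latest_log_session(logs_root: Path) -> Optional[Path]:
    """Return the newest session directory under logs_root, or None."""
    if not logs_root.is_dir():
        return None
    # Assumption: session folders are timestamp-named (e.g. 20240324T131631),
    # so the lexicographically largest name is also the most recent one.
    sessions = sorted(p for p in logs_root.iterdir() if p.is_dir())
    return sessions[-1] if sessions else None


def log_files(session: Path) -> List[Path]:
    """List every *.log file inside a session directory, recursively."""
    return sorted(session.rglob("*.log"))
```

Calling `latest_log_session(Path.home() / ".config" / "Code" / "logs")` and passing the result to `log_files` would give the files to zip up and attach.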

billbsu commented 3 months ago

20240324T131631.zip Here are the logs when it doesn't work.

Also, modinfo -F version nvidia shows my driver version. I don't think it did this before in integrated mode. Does that mean it's not working?

bayasdev commented 3 months ago

I can't find anything relevant in the logs aside from this entry

2024-03-24 13:17:11.052 [error] [uncaught exception in sharedProcess]: Event not found: onDidChange: CodeExpectedError: Event not found: onDidChange

@billbsu could you download the Insiders version of VSCode and check if it exhibits the same issue?

Edit: how did you install VSCode, via Flatpak or RPM?

billbsu commented 3 months ago

VS Code was installed via rpm. I just installed the Insiders version, and it behaves the same way. It stalls out 3 times (with the "program not responding" dialog) and then finally loads. It is probably going to freeze up and crash, though. (Will edit if it does.)

billbsu commented 3 months ago

It did indeed freeze (I think when I tried opening a folder?), and here is the zip file: 20240325T184945-Code-Insiders.zip

klmcwhirter commented 3 months ago

@billbsu To answer one of your earlier questions ... when switching to integrated mode the nvidia devices do not appear on the pci bus. That is an effect of the linux kernel reducing power consumption from the nvidia chip.

Hence, these lines do not appear in the lspci output when in integrated mode:

01:00.0 VGA compatible controller: NVIDIA Corporation AD107M [GeForce RTX 4060 Max-Q / Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22be (rev a1)

And that was the purpose of the change in 3.4.0 - cache the 01:00.0 value so a switch directly to nvidia mode can be made from integrated mode. Without it, you need to switch to hybrid mode first (so the pci bus id is available again) and then to nvidia mode.
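The idea can be sketched roughly as follows in Python. This is not the actual code from PR #155: the function names, the regular expression, and the cache file handling are assumptions made purely for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Matches an lspci line for an NVIDIA display controller and captures its bus ID.
NVIDIA_GPU_RE = re.compile(
    r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) "
    r"(?:VGA compatible controller|3D controller): NVIDIA",
    re.IGNORECASE | re.MULTILINE,
)


def find_nvidia_bus_id(lspci_output: str) -> Optional[str]:
    """Return the bus ID of the NVIDIA GPU if it is visible on the bus."""
    match = NVIDIA_GPU_RE.search(lspci_output)
    return match.group(1) if match else None


def resolve_bus_id(lspci_output: str, cache_file: Path) -> Optional[str]:
    """Prefer the live bus ID (and cache it); fall back to the cached value."""
    bus_id = find_nvidia_bus_id(lspci_output)
    if bus_id is not None:
        # GPU is visible (hybrid or nvidia mode): remember its ID for later.
        cache_file.write_text(bus_id)
        return bus_id
    if cache_file.exists():
        # GPU is hidden (integrated mode): reuse the ID seen earlier.
        return cache_file.read_text().strip()
    return None
```

With a cache populated while in hybrid mode, `resolve_bus_id` still returns 01:00.0 when the lspci output no longer contains the NVIDIA lines, which is what makes the direct integrated-to-nvidia switch possible.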

I do not see anything obvious in the logs either - except a few more lines of detail in main.log for the insiders edition:

2024-03-25 18:50:01.903 [error] CodeWindow: detected unresponsive
2024-03-25 18:50:15.682 [info] update#setState checking for updates
2024-03-25 18:50:15.823 [error] CodeWindow: detected unresponsive
2024-03-25 18:50:15.968 [info] update#setState idle
2024-03-25 18:50:18.346 [error] CodeWindow: detected unresponsive
2024-03-25 18:52:58.698 [error] Blocked vscode-webview request vscode-webview://096h6iv0a91h70a1r26khnbeo3cotit6mll0eh7lc1i5o0qbn511/index.html?id=990e0929-b293-442d-96a4-3ae982e77b42&origin=990e0929-b293-442d-96a4-3ae982e77b42&swVersion=4&extensionId=&platform=electron&vscode-resource-base-authority=vscode-resource.vscode-cdn.net&parentOrigin=vscode-file%3A%2F%2Fvscode-app
2024-03-25 18:53:38.948 [info] Extension host with pid 296038 exited with code: 0, signal: unknown.

Look at the last line. It suggests that at least one of the extensions is not happy.

I think I would start disabling extensions until I found the culprit. Then, once identified, determine whether it is still compatible.
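When comparing several sessions, it can help to count the error entries rather than read main.log by eye. The Python sketch below assumes the `timestamp [level] message` layout seen in the excerpt above; `parse_lines` and `error_summary` are hypothetical helper names, not part of any VS Code tooling.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Layout assumed from the excerpt: "YYYY-MM-DD HH:MM:SS.mmm [level] message"
LOG_LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (.*)$"
)


def parse_lines(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Split each well-formed line into (timestamp, level, message)."""
    entries = []
    for line in lines:
        match = LOG_LINE_RE.match(line.strip())
        if match:  # silently skip lines that do not follow the layout
            entries.append(match.groups())
    return entries


def error_summary(lines: Iterable[str]) -> Counter:
    """Count how often each distinct [error] message occurs."""
    return Counter(
        message for _, level, message in parse_lines(lines) if level == "error"
    )
```

Running `error_summary` over the lines of main.log gives a quick view of which errors repeat, for example how many times the window was reported unresponsive.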

Please make sure you followed the MS instructions to install the repo and then installed vscode via dnf. This way all post-install scripts are guaranteed to run. Note that you may need to uninstall the current version first to get it to install correctly.

OH! Note that #157 should not be considered a fix. It is just a proposed alternate approach to entering integrated mode that may work better in certain situations. But please do try it when you get a chance.

But, I really think the answer to your issue lies in VS Code somewhere and not envycontrol.

Hope that helps.

Let us know what you find.

klmcwhirter commented 2 months ago

@billbsu - any progress?

FYI - I needed to install the nvidia drivers from the Nvidia CUDA repo to help someone with pytorch and so I tested VS Code 1.88.0 (current stable) while I was at it. Absolutely no issues with those drivers either. Note I tested all 3 modes.

Miaua commented 2 months ago

The VS Code blank page is similar to a black screen issue after login on the latest CachyOS (Plasma 6, Wayland) when EnvyControl is in integrated mode with Mesa amdgpu. In the latest daily build of Kubuntu 24.04, EnvyControl does not work; in 23.10 it works.

bayasdev commented 2 months ago

I will have to investigate this, do you know if this also happens on Intel or is it only AMD?

Miaua commented 2 months ago

The black screen issue with mouse cursor after login is also on Intel, but maybe not related to EnvyControl. Perhaps some Mesa, kernel or initramfs bug. For example, EnvyControl does not work automatically on CachyOS; I have to run:

1) sudo udevadm control --reload

2) sudo udevadm trigger

3) sudo mkinitcpio -P linux-cachyos

Because it's also not working on the latest Kubuntu, probably something has changed.

bayasdev commented 2 months ago

Let me spin up a Kubuntu 24.04 to try

bayasdev commented 2 months ago

@Miaua I couldn't reproduce the issue on my system running latest Kubuntu 24.04 and Nvidia 535 drivers on integrated mode.

Miaua commented 2 months ago

I was using the default Nouveau driver, and when I switched to integrated mode the power draw was 25W; usually it's 7-8W. I will test the 535 driver. I have been using Nouveau in Kubuntu 23.10 and Fedora 39 because the RTX 4060 cannot be powered down when the proprietary 535 or 550 driver is installed. However, if you delete the file /usr/share/glvnd/egl_vendor.d/10_nvidia.json, power draw drops a lot, by a factor of 2-4.

bayasdev commented 2 months ago

I don't see why integrated wouldn't work with nouveau since we're already blacklisting it

Miaua commented 2 months ago

Integrated mode has always worked rock solid with Nouveau. The weird thing is, I installed the Kubuntu 24.04 daily build with the proprietary Nvidia driver, but the driver manager showed that Nouveau is installed. Better to wait for the official 24.04 release. The Fedora 40 KDE daily build worked perfectly with EnvyControl.

klmcwhirter commented 2 months ago

I did notice that when I installed the 550.54.15 cuda driver on Fedora recently, the installer did not blacklist nouveau via /etc/modprobe.d, but rather added rd.driver.blacklist=nouveau to the GRUB_CMDLINE_LINUX entry in /etc/default/grub.

I was not able to chase down why. But there were lots of mentions of "making sure things happen early enough in the boot process" in the material I read while doing research.

FYI

@bayasdev I'll look into a way to potentially blacklist nouveau this way. I don't really like it, but ...
Of course, we'll wait for someone to show it helps before putting it in place (perhaps via a new command line option?).
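As a starting point for that discussion, here is a rough Python sketch of how the kernel parameter could be appended to GRUB_CMDLINE_LINUX without duplicating it. This is only an illustration under assumptions: `add_kernel_param` is a hypothetical helper, it works on the text of /etc/default/grub rather than the file itself, and it expects the entry to be a single double-quoted line.

```python
import re

PARAM = "rd.driver.blacklist=nouveau"

# Assumption: the entry is one double-quoted line, as in a stock /etc/default/grub.
CMDLINE_RE = re.compile(
    r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")[ \t]*$', re.MULTILINE
)


def add_kernel_param(grub_text: str, param: str = PARAM) -> str:
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX once."""

    def _append(match):
        args = match.group(2).split()
        if param not in args:  # idempotent: never add the parameter twice
            args.append(param)
        return match.group(1) + " ".join(args) + match.group(3)

    return CMDLINE_RE.sub(_append, grub_text, count=1)
```

The function leaves every other line untouched and returns the text unchanged when the parameter is already present. The GRUB configuration would still have to be regenerated afterwards for the change to take effect.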

I'll reach out to you separately to collect your thoughts and discuss design details.