Witko / nvidia-xrun

Utility to run separate X with discrete nvidia graphics with full performance
GNU General Public License v2.0

Failing to find a directory #148

Open mrbenjadmin opened 4 years ago

mrbenjadmin commented 4 years ago

I'm on Debian 10 and I've followed the installation instructions to the best of my ability. However, I keep getting the same output regardless of whether the command is run as root or with a program specified:

Removing Nvidia bus from the kernel
tee: '/sys/bus/pci/devices/0000:01:00.0/remove': No such file or directory
1
Enabling powersave for the PCIe controller
auto

I'm not entirely sure what this means, but I would greatly appreciate any assistance.

P.S. I haven't posted on GitHub before so my apologies if I've messed up somewhere, or if this is the wrong place to put this.

michelesr commented 4 years ago

That probably means the bus ids aren't correct, you'll have to change them in /etc/default/nvidia-xrun. Follow the readme instructions, the last sentence in that paragraph.

Also read this thread.

mrbenjadmin commented 4 years ago

I ran lshw as root again and compared it to the config file at /etc/default/nvidia-xrun, and they seem to be the same values, 01:00.0 for the graphics card, and 00:01.0 for the PCIe controller.

It's rather odd that I've gotten the same issue twice, considering I completely reinstalled the OS since my first post. Maybe a package that I'm missing?

Thanks for responding so quickly by the way, I really appreciate it.

michelesr commented 4 years ago

The /sys/bus/pci/devices/ entries are created by the kernel itself, and you shouldn't require additional packages, so I don't know exactly what's happening in your system.

Can you see the bus entries in that directory? Nvidia-xrun will first attempt to use remove (e.g. /sys/bus/pci/devices/0000:01:00.0/remove) to de-register the card from the system, so that programs like GNOME shell or Xorg won't be able to load the nvidia module, which prevents the controller from being put in power-saving mode. Then it will set power/control on the PCI controller to auto (this is what powertop does, e.g. with --auto-tune or when you toggle the power saving manually in the TUI), and that should effectively turn off the controller and therefore the card.
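As a rough sketch of those two steps, the function below only prints the commands rather than running them. The name print_power_off_commands and its arguments are made up for illustration; the sysfs paths follow the pattern quoted above, and the real script may do this differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-xrun): print, without executing,
# the two sysfs writes described above.
#   $1 = bus ID of the PCIe controller, e.g. 0000:00:01.0
#   $2 = bus ID of the graphics card,   e.g. 0000:01:00.0
print_power_off_commands() {
    controller_id=$1
    device_id=$2
    # Step 1: de-register the card so nothing can load the nvidia module.
    printf 'echo 1 | sudo tee /sys/bus/pci/devices/%s/remove\n' "$device_id"
    # Step 2: hand power management of the controller to the kernel.
    printf 'echo auto | sudo tee /sys/bus/pci/devices/%s/power/control\n' "$controller_id"
}

# Example: print_power_off_commands 0000:00:01.0 0000:01:00.0
```

Printing first makes it easy to compare the paths against what actually exists under /sys/bus/pci/devices/ before anything is written.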

mrbenjadmin commented 4 years ago

Alright, I checked the /sys/bus/pci/devices/ directory and I can see quite a few folders including a 0000:00:01.0 folder for the PCIe bus, but I can't find a 0000:01:00.0 folder for the actual GPU.
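For anyone repeating this check, here is a minimal read-only sketch that reports which bus IDs have an entry in the devices directory. The function name check_bus_ids is hypothetical, and the directory is passed as an argument so the check can also be tried against a test folder.

```shell
#!/bin/sh
# Hypothetical helper: report whether each PCI bus ID has an entry under a
# sysfs-style devices directory. Nothing is written anywhere.
#   $1    = devices directory (normally /sys/bus/pci/devices)
#   $2... = bus IDs to look for
check_bus_ids() {
    devices_dir=$1
    shift
    result=0
    for id in "$@"; do
        if [ -e "$devices_dir/$id" ]; then
            echo "$id: present"
        else
            echo "$id: missing"
            result=1
        fi
    done
    # Non-zero exit status if at least one ID was missing.
    return "$result"
}

# Example: check_bus_ids /sys/bus/pci/devices 0000:00:01.0 0000:01:00.0
```

On a system in the state described here, the example call would be expected to list the controller as present and the card as missing.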

michelesr commented 4 years ago

Does lspci | grep -i nvidia show the card? If not, a previous run of nvidia-xrun might have already removed the card from the system, and with it the entry in /sys/bus/pci/devices. The systemd service of nvidia-xrun does the same at boot if enabled.

If that's the case, assuming the bus ids are set properly in the config file, nvidia-xrun should restore the card at the next run by triggering a PCI rescan in the kernel. You can trigger the rescan manually using this command:

sudo tee /sys/bus/pci/rescan <<<1

Then you should be able to see the card again.
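The entry may not reappear instantly after the rescan (the config file has a BUS_RESCAN_WAIT_SEC setting for this reason), so a small polling loop can help. This is only a sketch: wait_for_device is a hypothetical name, and the timeout value is an assumption to adjust as needed.

```shell
#!/bin/sh
# Hypothetical helper: wait until a PCI device entry shows up after a rescan.
#   $1 = devices directory (normally /sys/bus/pci/devices)
#   $2 = bus ID of the card, e.g. 0000:01:00.0
#   $3 = maximum number of seconds to wait
wait_for_device() {
    devices_dir=$1
    device_id=$2
    remaining=$3
    while [ ! -e "$devices_dir/$device_id" ]; do
        if [ "$remaining" -le 0 ]; then
            echo "$device_id still missing"
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    echo "$device_id is back"
}

# Example, after running the rescan command above:
#   wait_for_device /sys/bus/pci/devices 0000:01:00.0 5
```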

I appreciate this might seem confusing so I'll try to break it down for you:

This is the default behavior of nvidia-xrun, and it can be tweaked using the config file. For example, you might choose not to remove the card if you're confident enough that the nvidia module won't be loaded by mistake, but that's not recommended if you have GNOME shell or Xorg using the modesetting driver (not sure how Wayland compositors handle this TBH, but since NVIDIA is not supporting Wayland maybe they won't try to load the module on Wayland sessions).

mrbenjadmin commented 4 years ago

Alrighty, I did a PCI rescan and my graphics card is now visible in lspci. What should I do next?

michelesr commented 4 years ago

Just double check that ids are correctly set in the config file, then try to run a command with nvidia-xrun and check that it's working properly. Post the output here so that I can double check.

mrbenjadmin commented 4 years ago

Alright, I ran it as root, trying to start lutris, and this was the output:

Removing Nvidia bus from the kernel
1
Enabling powersave for the PCIe controller
auto

The program didn't appear to start during this.

michelesr commented 4 years ago

Are you running that command from a Linux virtual terminal (tty) or in a terminal emulator within a desktop environment? In order for this to work, especially if you're using the modeset option in the module (which is the default for nvidia-xrun), you have to log out from your current graphical session and run nvidia-xrun from a Linux virtual terminal (e.g. CTRL+ALT+F2). The common use case is to run nvidia-xrun without arguments to start the X server, which then runs the X init script located at $XDG_CONFIG_HOME/X11/nvidia-xinitrc. That script has to contain a line such as exec gnome-session, or whatever you need to start the desktop environment.
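A quick way to tell the two apart is the tty command: on a Linux virtual terminal it prints something like /dev/tty2, while inside a terminal emulator it prints a pseudo-terminal such as /dev/pts/0. A minimal sketch (is_linux_vt is a made-up helper name, not something nvidia-xrun provides):

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given device path looks like a
# Linux virtual terminal (/dev/tty1, /dev/tty2, ...). Pseudo-terminals
# (/dev/pts/N) and serial lines (/dev/ttyS0) do not match.
is_linux_vt() {
    case $1 in
        /dev/tty[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   is_linux_vt "$(tty)" && echo "virtual terminal" || echo "not a virtual terminal"
```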

If you're trying to use it as you would use optirun then it won't work, AFAIK.

mrbenjadmin commented 4 years ago

I somehow didn't catch that I had to run it in a tty so thank you for pointing that out lol

I've now tried running openbox-session through nvidia-xrun as root on a free tty but I still seem to be getting that exact same output without a trace of openbox starting up.

michelesr commented 4 years ago

Can you please run it from a graphical terminal emulator with the -d flag, which performs a dry run, and post the whole output here?

nvidia-xrun -d

This should print all the commands that nvidia-xrun will execute instead of actually executing them.
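Conceptually, a dry-run mode just routes every command through a wrapper that either prints or runs it. The sketch below illustrates the idea; the names DRY_RUN and execute are illustrative and the actual script may be organized differently.

```shell
#!/bin/sh
# Sketch of a dry-run wrapper. DRY_RUN and execute are illustrative names.
DRY_RUN=${DRY_RUN:-0}

execute() {
    if [ "$DRY_RUN" -eq 1 ]; then
        # Only show what would be run.
        echo ">>Dry run. Command: $*"
    else
        # Run the command with its arguments unchanged.
        "$@"
    fi
}

# Example:
#   DRY_RUN=1
#   execute sleep 1     # prints the command instead of sleeping
```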

mrbenjadmin commented 4 years ago

Upon running nvidia-xrun -d:

Removing Nvidia bus from the kernel
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:01:00.0/remove <<<1
Enabling powersave for the PCIe controller
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:00:01.0/power/control <<<auto

michelesr commented 4 years ago

That doesn't look right to me: it looks like nvidia-xrun is only trying to detach the graphics card rather than actually executing a command, as if TURN_OFF_GPU_ONLY were set to 1 (see https://github.com/Witko/nvidia-xrun/blob/master/nvidia-xrun#L78).

Is that environment variable set by any chance? Try this:

env TURN_OFF_GPU_ONLY=0 nvidia-xrun -d

And post the output here please.

michelesr commented 4 years ago

Also please post the output of:

cat /etc/default/nvidia-xrun

mrbenjadmin commented 4 years ago

TURN_OFF_GPU_ONLY is currently set to 1 because when I tried to run nvidia-xrun as root earlier, it told me that it must be set to 1 in order to run the command with sudo.

When I tried running env TURN_OFF_GPU_ONLY=0 nvidia-xrun -d the output it gave me was the exact same as the output given in my last post.

I also ran cat /etc/default/nvidia-xrun and this was the output:

# When enabled, nvidia-xrun will turn the card on before attempting to load the
# modules and running the command, and turn it off after the commands exits and
# the modules gets unloaded. If order for this to work, CONTROLLER_BUS_ID and
# DEVICE_BUS_ID must be set correctly. IDs can be found by by inspecting the
# output of lshw.
ENABLE_PM=1

# When PM is enabled, remove the card from the system after the command exists
# and modules unload: the card will be readded in the next nvidia-xrun
# execution before loading the nvidia module again. This is recommended as Xorg
# and some other programs tend to load the nvidia module if they detect a
# nvidia card in the system, and when the module is loaded the card can't save
# power.
REMOVE_DEVICE=1

# Bus ID of the PCI express controller
CONTROLLER_BUS_ID=0000:00:01.0

# Bus ID of the graphic card
DEVICE_BUS_ID=0000:01:00.0

# Seconds to wait before turning on the card after PCI devices rescan
BUS_RESCAN_WAIT_SEC=1

# Ordered list of modules to load before running the command
MODULES_LOAD=(nvidia nvidia_uvm nvidia_modeset "nvidia_drm modeset=1")

# Ordered list of modules to unload after the command exits
MODULES_UNLOAD=(nvidia_drm nvidia_modeset nvidia_uvm nvidia)

TURN_OFF_GPU_ONLY=1

michelesr commented 4 years ago

Please remove TURN_OFF_GPU_ONLY=1 from your config file. That option exists only for the systemd service, which uses it to disable the nvidia card at boot. Nvidia-xrun has to be run as a normal user, as it will use sudo to elevate to superuser privileges.

Remove that from the config, then run nvidia-xrun -d again and check the output.

mrbenjadmin commented 4 years ago

Alrighty, a much longer output from nvidia-xrun -d this time:

Couldn't get a file descriptor referring to the console
Turning the PCIe controller on to allow card rescan
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:00:01.0/power/control <<<on
Waiting 1 second
>>Dry run. Command: sleep 1
Rescanning PCI devices
>>Dry run. Command: sudo tee /sys/bus/pci/rescan <<<1
Waiting 1 second for rescan
>>Dry run. Command: sleep 1
Turning the card on
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:01:00.0/power/control <<<on
Loading module nvidia
>>Dry run. Command: sudo modprobe nvidia
Loading module nvidia_uvm
>>Dry run. Command: sudo modprobe nvidia_uvm
Loading module nvidia_modeset
>>Dry run. Command: sudo modprobe nvidia_modeset
Loading module nvidia_drm modeset=1
>>Dry run. Command: sudo modprobe nvidia_drm modeset=1
>>Dry run. Command: xinit /etc/X11/xinit/nvidia-xinitrc "" -- :1 vt -nolisten tcp -br -config nvidia-xorg.conf -configdir nvidia-xorg.conf.d
Unloading module nvidia_drm
>>Dry run. Command: sudo modprobe -r nvidia_drm
Unloading module nvidia_modeset
>>Dry run. Command: sudo modprobe -r nvidia_modeset
Unloading module nvidia_uvm
>>Dry run. Command: sudo modprobe -r nvidia_uvm
Unloading module nvidia
>>Dry run. Command: sudo modprobe -r nvidia
Removing Nvidia bus from the kernel
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:01:00.0/remove <<<1
Enabling powersave for the PCIe controller
>>Dry run. Command: sudo tee /sys/bus/pci/devices/0000:00:01.0/power/control <<<auto

Though now when I run sudo nvidia-xrun, it spits out the following: This script must not be run as root unless TURN_OFF_GPU_ONLY=1 is set

michelesr commented 4 years ago

You don't have to run it with sudo, as sudo will be used internally.
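The error you saw comes from a guard along these lines. This is only a sketch of the logic implied by the message, with a made-up function name and arguments, not a copy of the script:

```shell
#!/bin/sh
# Sketch of a root check; check_not_root and its arguments are illustrative.
#   $1 = numeric user ID (e.g. the output of `id -u`)
#   $2 = value of TURN_OFF_GPU_ONLY (treated as 0 when empty)
check_not_root() {
    uid=$1
    gpu_only=${2:-0}
    if [ "$uid" -eq 0 ] && [ "$gpu_only" != "1" ]; then
        echo "This script must not be run as root unless TURN_OFF_GPU_ONLY=1 is set" >&2
        return 1
    fi
    return 0
}

# Example: check_not_root "$(id -u)" "${TURN_OFF_GPU_ONLY:-0}" || exit 1
```

In other words, root is only accepted for the "turn the GPU off" path used at boot; a normal run has to start as a regular user.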

The output looks sane now. Logout from your graphical session, open a virtual tty and run:

nvidia-xrun 

And it should work this time. If it doesn't, check that the nvidia-xinitrc file is properly configured to run your desktop environment, e.g. :

exec openbox-session

mrbenjadmin commented 4 years ago

It appears that there might be an issue with the elevation as there are quite a few mentions of operations not being permitted, here is the output:

Couldn't get a file descriptor referring to the console
Turning the PCIe controller on to allow card rescan
on
Waiting 1 second
Rescanning PCI devices
1
Waiting 1 second for rescan
Turning the card on
on
Loading module nvidia
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
Loading module nvidia_uvm
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
modprobe: FATAL: Module nvidia-current-uvm not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_uvm
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
Loading module nvidia_modeset
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_modeset
modprobe: ERROR: could not insert 'nvidia_modeset': Operation not permitted
Loading module nvidia_drm modeset=1
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_modeset
modprobe: ERROR: could not insert 'nvidia_modeset': Operation not permitted
modprobe: FATAL: Module nvidia-current-drm not found in directory /lib/modules/4.19.0-9-amd64
modprobe: ERROR: ../libkmod/libkmod-module.c:979 command_do() Error running install command for nvidia_drm
modprobe: ERROR: could not insert 'nvidia_drm': Operation not permitted
/usr/bin/nvidia-xrun: line 17: xinit: command not found
Unloading module nvidia_drm
Unloading module nvidia_modeset
Unloading module nvidia_uvm
Unloading module nvidia
Removing Nvidia bus from the kernel
1
Enabling powersave for the PCIe controller
auto

michelesr commented 4 years ago

Couldn't get a file descriptor referring to the console

Are you running this from a linux virtual terminal?

Module nvidia-current not found in directory /lib/modules/4.19.0-9-amd64

Did you install the nvidia drivers? I'm not sure about this specific issue but you might want to look at the existing issues on this project about Debian and NVIDIA drivers

xinit: command not found

You need to install this, try sudo apt install -y xinit
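To catch missing tools like this before switching to a tty, a small pre-flight check can help. The sketch below is not part of nvidia-xrun; missing_commands is a hypothetical helper, and the command list in the example is just the set of tools that appear in the output above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print every command from the argument list
# that cannot be found in PATH, and return non-zero if any is missing.
missing_commands() {
    result=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            result=1
        fi
    done
    return "$result"
}

# Example: missing_commands xinit modprobe sudo tee
```

This only covers executables; whether the nvidia kernel modules can be found by modprobe is a separate question that this check does not answer.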

mrbenjadmin commented 4 years ago

Are you running this from a linux virtual terminal?

I had run that command in a tty before this, but to be able to copy the output I ran it in a virtual terminal.

Did you install the nvidia drivers? I'm not sure about this specific issue but you might want to look at the existing issues on this project about Debian and NVIDIA drivers

Yes, I installed the newest available nvidia drivers from the Debian backports according to this article from the Debian website: https://wiki.debian.org/NvidiaGraphicsDrivers#Version440.82.28via_buster-backports.29

You need to install this, try sudo apt install -y xinit

Alrighty, done.

michelesr commented 4 years ago

Not sure how to help with the drivers in debian, maybe check https://github.com/Witko/nvidia-xrun/issues/44

mrbenjadmin commented 4 years ago

I'm going to try installing nvidia's proprietary drivers from their website instead of the ones from the debian backports, and hopefully that will work.

Thanks a ton for helping me so far, you're a life-saver lol

mrbenjadmin commented 4 years ago

Alright, I tried installing Nvidia's drivers from their website and it gave me a ton of warnings against installing things that weren't meant to be used with Debian, and didn't allow me to install them.

So basically where I'm at now is assuming this probably won't work for me and I'm likely going to need to switch back to Windows until either a fix comes for xrun, or an alternative program pops up for Debian.

DrWaleedAYousef commented 3 years ago

I have the same problem on my Arch Linux system. I tried all of the above and nothing works. It always gives this message:

tee: '/sys/bus/pci/devices/0000:01:00.0/remove': No such file or directory

I spent more than a day trying this out.