FedoraTipper / Nvidia-Fan-Curve-Linux

Nvidia Speed Fan Control - Fan Curve
GNU General Public License v3.0
38 stars 7 forks

Is it possible to run it on a headless node? #2

Closed: 1a1a11a closed this issue 6 years ago

1a1a11a commented 7 years ago

It seems to require a display. Is it possible to run it without a monitor?

FedoraTipper commented 7 years ago

This method requires Nvidia's proprietary driver to be installed and running. Just install the driver and run this script on startup. I haven't really tried it headless, but in theory it should work.

As a side note, do you have Coolbits enabled?
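For context, manual fan control through nvidia-settings generally requires the Coolbits option in the X configuration with bit 2 (value 4) set. The sketch below checks for that. It assumes the option lives in /etc/X11/xorg.conf, and the function name `coolbits_allows_fan_control` is hypothetical, not part of this project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given X config enables manual fan control.
# Coolbits is a bitmask; bit 2 (value 4) unlocks fan speed control.
coolbits_allows_fan_control() {
  local conf=${1:-/etc/X11/xorg.conf} value
  # Strip comments, then pull the number out of a line like:  Option "Coolbits" "12"
  value=$(sed 's/#.*//' "$conf" 2>/dev/null \
    | grep -io 'option[[:space:]]*"coolbits"[[:space:]]*"[0-9]*"' \
    | head -n 1 | grep -o '[0-9]*"$' | tr -d '"')
  # 10# forces base 10 so a value with a leading zero is not read as octal
  [ -n "$value" ] && (( 10#$value & 4 ))
}
```

If the check fails, `sudo nvidia-xconfig --cool-bits=4` is one common way to add the option; X has to be restarted before it takes effect.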

1a1a11a commented 7 years ago

Thank you for the quick reply! It doesn't work on a headless node, but I found this one, which works: https://github.com/FedoraTipper/Nvidia-Fan-Curve---Linux

fireheadman commented 6 years ago

I'm late to this conversation; I just stumbled across this script, which is an excellent starting point to build more functionality into.

Yes, it can run headless.
Modify the calls like this:

# Get GPU temperature
gputemp=$(DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings \
        -q GPUCoreTemp | awk -F ":" 'NR==2{print $3}' | sed 's/[^0-9]*//g')
# Set both fans to the new speed; discard normal output but keep errors visible
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings \
        -a "[fan-0]/GPUTargetFanSpeed=${newfanspeed}" \
        -a "[fan-1]/GPUTargetFanSpeed=${newfanspeed}" >/dev/null

This works for me...
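Since every nvidia-settings call needs the same two variables, one option is to wrap them in a small function. This is a sketch assuming the LightDM paths from the comment above; the function names `nvs` and `gpu_temp` and the `NV_DISPLAY`/`NV_XAUTH` override variables are hypothetical, not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run nvidia-settings against a given X display.
# Defaults match LightDM's root-owned authority file for display :0;
# override with NV_DISPLAY / NV_XAUTH for other displays or display managers.
nvs() {
  local display=${NV_DISPLAY:-:0}
  local xauth=${NV_XAUTH:-/var/run/lightdm/root/${display}}
  DISPLAY="$display" XAUTHORITY="$xauth" nvidia-settings "$@"
}

# Read one GPU's core temperature as a bare number (-t = terse output),
# which avoids parsing the verbose query text with awk/sed.
gpu_temp() {
  nvs -q "[gpu:${1:-0}]/GPUCoreTemp" -t
}
```

For example, `NV_DISPLAY=:1 nvs -a "[fan-0]/GPUTargetFanSpeed=70"` would target an X server running on display :1 instead.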

fireheadman commented 6 years ago

If the developer is still active on this, it would be nice to have the script cycle through each GPU and set fan speeds individually.
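Cycling through each GPU could look roughly like the sketch below. It assumes one fan per GPU with matching indices (fan N belongs to GPU N), which does not hold for every card, and it uses nvidia-smi only to enumerate GPUs and read their temperatures. The curve thresholds and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical linear fan curve: 30 % at or below 40 C, 100 % at or above 80 C.
fan_speed_for_temp() {
  local t=$1
  if (( t <= 40 )); then
    echo 30
  elif (( t >= 80 )); then
    echo 100
  else
    echo $(( 30 + (t - 40) * 70 / 40 ))
  fi
}

# Set each GPU's fan from that GPU's own temperature.
# Assumes fan index == GPU index; adjust the mapping for multi-fan cards.
set_fans_per_gpu() {
  local gpu temp speed
  for gpu in $(nvidia-smi --query-gpu=index --format=csv,noheader); do
    temp=$(nvidia-smi -i "$gpu" --query-gpu=temperature.gpu --format=csv,noheader)
    speed=$(fan_speed_for_temp "$temp")
    DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings \
      -a "[gpu:${gpu}]/GPUFanControlState=1" \
      -a "[fan:${gpu}]/GPUTargetFanSpeed=${speed}" >/dev/null
  done
}
```

Calling `set_fans_per_gpu` inside the script's existing loop would replace the single shared fan speed.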

FedoraTipper commented 6 years ago

Hey,

Thanks for the info; I'll look into this when I have time this week. As for cycling, that can be done. Thanks for the suggestion. Also, if you want to create a PR, you are more than welcome to.

1a1a11a commented 6 years ago

I am still having trouble; it gives me:

Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.

Neo2SHYAlien commented 6 years ago

@1a1a11a https://github.com/FedoraTipper/Nvidia-Fan-Curve-Linux/pull/3 enables headless mode.

1a1a11a commented 6 years ago

ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:1]) as specified in assignment '[fan-1]/GPUTargetFanSpeed=70' (Unknown Error).

fireheadman commented 6 years ago

I'm liking the new addition to the script! If you want a completely headless mode, you can edit the last line to look like this:

done &

This will run the script in the background. Another item of interest: you could turn this into a systemd startup service (e.g. fancurve.service) in /lib/systemd/system/fancurve.service, then reload the systemd daemon and enable the service. What I am attempting instead is a series of scripts run by my XMR.service, with similar services set up for other currencies like ETH, ZEC, etc. You would just disable one and enable the other.

It would be something like this (rough draft, this doesn't work yet):

    [Unit]
    Description=xmr
    After=network.target

    [Service]
    ExecStartPre=/home/fireheadman/scripts/set_overclock.sh
    ExecStartPre=/home/fireheadman/scripts/set_fancurve.sh
    ExecStart=/home/fireheadman/miners/xmr-stak/xmr-stak.sh
    User=root

    [Install]
    WantedBy=multi-user.target
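One thing worth knowing about the draft: systemd runs each ExecStartPre command to completion before starting ExecStart, so a fan-curve script that loops forever would keep the miner from ever starting. A dedicated fancurve.service, as suggested above, avoids that by running the loop as its own ExecStart in the foreground (so the trailing `&` would be dropped in that case). The sketch below writes such a unit; the function name, the script path, and the `After=lightdm.service` / `WantedBy=graphical.target` choices are assumptions for a LightDM setup, not something this project ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a systemd unit that runs the fan curve script.
# $1 = destination unit file, $2 = absolute path to an executable FanCurveScript.sh
write_fancurve_unit() {
  local dest=$1 script=$2
  # Unquoted EOF so ${script} is expanded into the unit text
  cat > "$dest" <<EOF
[Unit]
Description=Nvidia fan curve
After=lightdm.service

[Service]
ExecStart=${script}
Restart=on-failure
User=root

[Install]
WantedBy=graphical.target
EOF
}
```

As root, something like `write_fancurve_unit /etc/systemd/system/fancurve.service /path/to/FanCurveScript.sh`, followed by `systemctl daemon-reload` and `systemctl enable --now fancurve.service`, would install and start it. The mining unit could then drop the fan-curve ExecStartPre line.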

fireheadman commented 6 years ago

@1a1a11a Are you running the script as root or with sudo? If not, you need to run it with elevated permissions for nvidia-settings to work correctly.

for example:
WITHOUT ROOT/SUDO

fireheadman@clauneck:~/scripts$ DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q GPUCurrentFanSpeedRPM | grep fan | awk '{ print "RPMs ",$3, $4 }'
No protocol specified
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.

...AND WITH SUDO (ROOT)

fireheadman@clauneck:~/scripts$ sudo !!
sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q GPUCurrentFanSpeedRPM | grep fan | awk '{ print "RPMs ",$3, $4 }'
RPMs  (clauneck:0[fan:0]): 2876.
RPMs  (clauneck:0[fan:1]): 998.
RPMs  (clauneck:0[fan:2]): 2880.
RPMs  (clauneck:0[fan:3]): 996.
fireheadman@clauneck:~/scripts$
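Since the query above only succeeds as root (presumably because LightDM's authority file under /var/run/lightdm/root/ is readable only by root), a script could refuse to start otherwise. A minimal guard, with a hypothetical function name:

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail early unless running as root (e.g. via sudo),
# since the X authority file used above is only readable by root.
require_root() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "Please run as root, e.g.: sudo $0" >&2
    return 1
  fi
}
```

Placing `require_root || exit 1` near the top of FanCurveScript.sh would turn the confusing "control display is undefined" error into a clear message.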

Neo2SHYAlien commented 6 years ago

@1a1a11a How many GPUs do you have? What is your driver version? @fireheadman I still have to test

DISPLAY=:0 XAUTHORITY=/var/run/lightdm/${USER}/:0

On my server root is the only user, and I totally forgot about the others...

FedoraTipper commented 6 years ago

Thanks @Neo2SHYAlien for the work 👍, and @fireheadman for the research.

Regarding what @fireheadman said about system startup, I'm planning to work on an installation script to streamline the install process. The proposed solution would be to build the script for multiple startup daemons, since not all systems have systemd.

@1a1a11a It looks like the script is looping through your motherboard fan headers. We might need to readdress fan iteration in the script.

Misclicked "close issue". My bad.

1a1a11a commented 6 years ago

Thank you for your help, @fireheadman @Neo2SHYAlien @FedoraTipper! I am running Ubuntu 16.04 with sudo, and I have two GPUs.

sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q GPUCurrentFanSpeedRPM | grep fan | awk '{ print "RPMs ",$3, $4 }' gives the correct result,

but running FanCurveScript.sh directly gives the error shown in the previous posts. Am I using it the wrong way?

fireheadman commented 6 years ago

@1a1a11a I believe you need sudo to call the script... so "sudo ./FanCurveScript.sh"

@Neo2SHYAlien If you are running purely as the root user (not recommended, for many reasons), then there is no need for ${USER}; just use "root".

Neo2SHYAlien commented 6 years ago

@fireheadman For a non-root user I have to check the XAUTHORITY path; I didn't test it. Unfortunately my server is down until Sunday, when I'll be able to check it. @1a1a11a Can you please test the script with sudo? If it doesn't work, please start the script with sudo bash -x ./FanCurveScript.sh and send us its output.

1a1a11a commented 6 years ago

@fireheadman @Neo2SHYAlien Yeah, I used sudo and got this error:

ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:0]) as specified in assignment '[fan-0]/GPUTargetFanSpeed=70' (Unknown Error).

ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:1]) as specified in assignment '[fan-1]/GPUTargetFanSpeed=70' (Unknown Error).

fireheadman commented 6 years ago

@1a1a11a I'm a little unsure how all the output you posted is being produced. Are you running each line of the script by hand? When posting, please use the "< >" code button to preserve formatting.

Something you may want to try is prefixing your nvidia-settings command(s) with DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0, so it would be:

DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings \
        -a "[fan-1]/GPUTargetFanSpeed=70" 

Also note that I used double quotes rather than single quotes.
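For a literal value like 70 the quoting style makes no difference; it matters once the speed comes from a variable, as with `${newfanspeed}` in the script. Single quotes pass the text through unexpanded, while double quotes substitute the value. Quoting either way also keeps the shell from treating `[fan-1]` as a glob pattern. A small illustration (the value 70 is arbitrary):

```bash
#!/usr/bin/env bash
# Build the same assignment string with both quoting styles.
newfanspeed=70
single='[fan-1]/GPUTargetFanSpeed=${newfanspeed}'   # no expansion: literal ${newfanspeed}
double="[fan-1]/GPUTargetFanSpeed=${newfanspeed}"   # expands to ...=70
echo "$single"
echo "$double"
```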

1a1a11a commented 6 years ago

Hi @fireheadman, I tried

sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -a "[fan-1]/GPUTargetFanSpeed=70"

and got the following error:

ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:1]) as specified in assignment '[fan-1]/GPUTargetFanSpeed=70' (Unknown Error).

fireheadman commented 6 years ago

Bummer... I'm not really sure what to do at this point other than give you the canned advice. You might need to reinstall your OS (I've had to do that about 7 times to get this far, but I was the culprit in all my learning mistakes), or seek out Nvidia-focused forums for further assistance. This command is not exclusive to this project (the developer did not create it), so it is at the project developer's discretion to assist you further, unless someone else sees this issue and has a solution.

I wish you luck in resolving this.

FedoraTipper commented 6 years ago

@1a1a11a Which display manager are you using?

XAUTHORITY=/var/run/lightdm/root/:0

This will only work if LightDM is your display manager.

Also when you

echo $DISPLAY

Does the value correspond to DISPLAY={number}?

Could you perhaps try running without relying on a DM? Make sure the X server isn't already running, or else this won't work.

sudo xinit nvidia-settings -a '[fan-1]/GPUTargetFanSpeed=75' -- :0 -once
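To answer the display-manager question, Debian and Ubuntu record the configured one in /etc/X11/default-display-manager (the same file LightDM's ExecStartPre check reads). A sketch of reading it, with a hypothetical function name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the configured display manager's name,
# e.g. "lightdm" for a file containing /usr/sbin/lightdm.
display_manager_name() {
  local file=${1:-/etc/X11/default-display-manager}
  [ -r "$file" ] || return 1
  basename "$(head -n 1 "$file")"
}
```

If this prints anything other than `lightdm`, the /var/run/lightdm/root/:0 XAUTHORITY path used throughout this thread will not apply.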

1a1a11a commented 6 years ago

@fireheadman and @FedoraTipper, thank you for your detailed help!

echo $DISPLAY gives an empty string.

I noticed that X was running, so I killed it. When I ran the same command, sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -a "[fan-1]/GPUTargetFanSpeed=70", I got a different error this time:

Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.

fireheadman commented 6 years ago

@1a1a11a

Let's try this. There are many options for X server configurations. At the least, if you are running headless, I would assume you are running Ubuntu Server with only LightDM running. If not, you must have some other configuration; can you describe it?

Post the output from these commands; it should look like mine below:

/usr/sbin/lightdm -v
sudo systemctl is-enabled lightdm
sudo systemctl status lightdm
fireheadman@clauneck:~$ /usr/sbin/lightdm -v
lightdm 1.18.3
fireheadman@clauneck:~$ sudo systemctl is-enabled lightdm
enabled
fireheadman@clauneck:~$ sudo systemctl status lightdm
● lightdm.service - Light Display Manager
   Loaded: loaded (/lib/systemd/system/lightdm.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2018-04-08 13:10:21 MDT; 1 day 5h ago
     Docs: man:lightdm(1)
  Process: 1119 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=ex
 Main PID: 1138 (lightdm)
    Tasks: 5
   Memory: 51.0M
      CPU: 41min 56.200s
   CGroup: /system.slice/lightdm.service
           ├─1138 /usr/sbin/lightdm
           ├─1165 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
           └─1330 lightdm --session-child 12 19

1a1a11a commented 6 years ago

@fireheadman

jason@asrock:~$ /usr/sbin/lightdm -v
lightdm 1.18.3
jason@asrock:~$ sudo systemctl is-enabled lightdm
enabled
jason@asrock:~$ sudo systemctl status lightdm
● lightdm.service - Light Display Manager
   Loaded: loaded (/lib/systemd/system/lightdm.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/display-manager.service.d
           └─xdiagnose.conf
   Active: inactive (dead) (Result: exit-code) since Mon 2018-04-09 15:20:50 EDT; 6h ago
     Docs: man:lightdm(1)
  Process: 12356 ExecStart=/usr/sbin/lightdm (code=exited, status=1/FAILURE)
  Process: 12353 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
 Main PID: 12356 (code=exited, status=1/FAILURE)

Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Unit entered failed state.
Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Triggering OnFailure= dependencies.
Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Failed with result 'exit-code'.
Apr 09 15:20:50 asrock systemd[1]: lightdm.service: Service hold-off time over, scheduling restart.
Apr 09 15:20:50 asrock systemd[1]: Stopped Light Display Manager.
Apr 09 15:20:50 asrock systemd[1]: lightdm.service: Start request repeated too quickly.
Apr 09 15:20:50 asrock systemd[1]: Failed to start Light Display Manager.

jason@asrock:~$ sudo lightdm
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
update-alternatives: error: no alternatives for x86_64-linux-gnu_gfxcore_conf

I have tried reinstalling lightdm, but it didn't help. Do you think I should reboot?

fireheadman commented 6 years ago

@1a1a11a You just posted your problem: your display manager is not active. You will need to resolve that, and then the commands will work. It's not really the developer's responsibility to resolve this, as his code works fine. I can say I have experienced this before, and I chose to rebuild my machine rather than spend countless hours troubleshooting.

Active: inactive (dead)
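Since an inactive display manager produces the same vague "control display is undefined" error, a script could check for it up front. A minimal pre-flight sketch; the function name, return codes, and messages are hypothetical, and the defaults assume LightDM on display :0:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check before calling nvidia-settings.
# $1 = display manager unit (default lightdm), $2 = X authority file.
# Returns 1 if the display manager is down, 2 if the auth file is unreadable.
x_preflight() {
  local dm=${1:-lightdm} auth=${2:-/var/run/lightdm/root/:0}
  if ! systemctl is-active --quiet "$dm"; then
    echo "x_preflight: $dm is not active, so there is no X server to talk to" >&2
    return 1
  fi
  if [ ! -r "$auth" ]; then
    echo "x_preflight: cannot read $auth (wrong display number, or not root?)" >&2
    return 2
  fi
}
```

A line such as `x_preflight || exit 1` before the fan loop would surface the real cause instead of the generic nvidia-settings error.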

1a1a11a commented 6 years ago

@fireheadman Thank you!

● lightdm.service - Light Display Manager
   Loaded: loaded (/lib/systemd/system/lightdm.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/display-manager.service.d
           └─xdiagnose.conf
   Active: active (running) since Mon 2018-04-09 22:02:48 EDT; 10s ago
     Docs: man:lightdm(1)
  Process: 3872 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
 Main PID: 3876 (lightdm)
   CGroup: /system.slice/lightdm.service
           ├─3876 /usr/sbin/lightdm
           └─3883 /usr/lib/xorg/Xorg -core :1 -seat seat0 -auth /var/run/lightdm/root/:1 -nolisten tcp vt7 -novtswitch

Apr 09 22:02:48 asrock systemd[1]: Starting Light Display Manager...
Apr 09 22:02:48 asrock systemd[1]: Started Light Display Manager.

The lightdm problem is solved, but the script gives:

Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.

fireheadman commented 6 years ago

I'm unsure why your lightdm is using :1 instead of :0, so try your full command with :1 instead.

YOU └─3883 /usr/lib/xorg/Xorg -core :1 -seat seat0 -auth /var/run/lightdm/root/:1 -nolisten tcp vt7 -novtswitch

ME ├─1165 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch

Try this: sudo DISPLAY=:1 XAUTHORITY=/var/run/lightdm/root/:1 nvidia-settings -a "[fan-1]/GPUTargetFanSpeed=70"
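Rather than guessing between :0 and :1, the display number and authority file can be read from the running Xorg command line, which contains both (`:N` and `-auth <file>`), as the two lines compared above show. A sketch with a hypothetical function name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given an Xorg command line, print "<display> <authfile>".
# The display is the first argument of the form ":N"; the auth file is the
# argument that follows "-auth".
xorg_display_and_auth() {
  local display="" auth="" prev="" arg
  local -a args
  read -ra args <<< "$1"          # split on whitespace without globbing
  for arg in "${args[@]}"; do
    case $arg in
      :[0-9]*) [ -z "$display" ] && display=$arg ;;
    esac
    [ "$prev" = "-auth" ] && auth=$arg
    prev=$arg
  done
  [ -n "$display" ] && [ -n "$auth" ] && echo "$display $auth"
}
```

Inside a script already running as root, `read -r d a <<< "$(xorg_display_and_auth "$(ps -C Xorg -o args=)")"` followed by `DISPLAY="$d" XAUTHORITY="$a" nvidia-settings ...` would follow whichever display LightDM actually started. Only the first X server is used if several are running.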

1a1a11a commented 6 years ago

Weird, it gets stuck at the command without giving an error or having any effect. Thank you for all your help! I really appreciate it!