Closed 1a1a11a closed 6 years ago
This method requires NVIDIA's proprietary driver to be installed and running. Just install the driver and run this script on startup. I haven't really tried it headless, but in theory it should work.
As a side note, do you have coolbits enabled?
Thank you for the quick reply! It doesn't work on a headless node, but I found this one, which works: https://github.com/FedoraTipper/Nvidia-Fan-Curve---Linux
I'm late to this conversation, just stumbled across this script... which is excellent to start off and build more functionality into it...
Yes, it can run headless...
Modify the calls like this:
# Get GPU temperature
gputemp=$(DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings \
-q GPUCoreTemp | awk -F ":" 'NR==2{print $3}' | sed 's/[^0-9]*//g')
# Set the fan speed on both fans; normal output is discarded, errors still print
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings \
-a "[fan-0]/GPUTargetFanSpeed=${newfanspeed}" \
-a "[fan-1]/GPUTargetFanSpeed=${newfanspeed}" >/dev/null
This works for me...
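A side note on silencing nvidia-settings: the order of the redirections decides what gets hidden. The sketch below uses a hypothetical `noisy` function as a stand-in for nvidia-settings, so it can be tried without a GPU.

```sh
# Stand-in for a command that writes to both streams (hypothetical helper).
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stderr is pointed at the current stdout first, then stdout is discarded:
# normal output vanishes, error messages still come through.
errors_only=$(noisy 2>&1 >/dev/null)

# stdout is discarded first, then stderr follows it: nothing comes through.
nothing=$(noisy >/dev/null 2>&1)

printf 'first form kept:  [%s]\n' "$errors_only"
printf 'second form kept: [%s]\n' "$nothing"
```

While debugging it is probably better to keep error messages visible, since the failures reported later in this thread only show up on stderr.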
If the developer is still active on this, it would be nice to have the script cycle through each GPU and set the fan speeds individually.
Hey,
Thanks for the info, I'll look into this when I have time this week. As for cycling, that can be done. Thanks for the suggestion. Also if you want to create a PR, you are more than welcome.
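For anyone who wants to experiment before this lands, the cycling could look roughly like the sketch below. The temperature thresholds are made up for illustration and are not the curve used by FanCurveScript.sh, and the function names are hypothetical. It also assumes that fan N belongs to GPU N, which may not hold on cards with more than one fan.

```sh
# Map a GPU temperature (degrees C) to a fan speed (percent).
# Thresholds are illustrative assumptions only.
fan_speed_for_temp() {
    local temp=$1
    if   [ "$temp" -ge 80 ]; then echo 100
    elif [ "$temp" -ge 70 ]; then echo 85
    elif [ "$temp" -ge 60 ]; then echo 70
    elif [ "$temp" -ge 50 ]; then echo 50
    else                          echo 30
    fi
}

# Read one temperature per line on stdin (one line per GPU) and print
# one nvidia-settings assignment per GPU, assuming fan N maps to GPU N.
fan_assignments() {
    local i=0 temp
    while read -r temp; do
        printf '[fan-%d]/GPUTargetFanSpeed=%d\n' "$i" "$(fan_speed_for_temp "$temp")"
        i=$((i + 1))
    done
}

# Possible use on a real machine (not run here):
#   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader |
#       fan_assignments |
#       while read -r a; do nvidia-settings -a "$a"; done
```

Keeping the temperature-to-speed mapping separate from the nvidia-settings call makes the curve easy to check without touching the hardware.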
I am still having trouble, it gives me
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.
@1a1a11a https://github.com/FedoraTipper/Nvidia-Fan-Curve-Linux/pull/3 enables headless mode.
ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:1]) as specified in assignment '[fan-1]/GPUTargetFanSpeed=70' (Unknown Error).
I'm liking the new addition to the script!!! If you want a complete headless mode, you can edit the last line to look like this:
done &
This will run the script in the background. Another item of interest: you could set this up in systemd as a startup service (e.g. fancurve.service) in /lib/systemd/system/fancurve.service (then reload the systemd daemon and enable it). Or, as I am attempting, there could be a series of scripts for my XMR.service. I would have others set up for other currencies like ETH, ZEC, etc., and would just disable one and enable the other.
Would be something like this: (rough draft, this doesn't work as of yet)
[Unit]
Description=xmr
After=network.target
[Service]
ExecStartPre=/home/fireheadman/scripts/set_overclock.sh
ExecStartPre=/home/fireheadman/scripts/set_fancurve.sh
ExecStart=/home/fireheadman/miners/xmr-stak/xmr-stak.sh
User=root
[Install]
WantedBy=multi-user.target
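One likely reason the draft above does not work: systemd expects ExecStartPre commands to finish, and it kills anything they leave running in the background, so a fan script that loops forever does not fit there. A possible alternative is to give the fan script its own unit. The sketch below only prints such a unit. The function name, the script path passed to it, the display number and the XAUTHORITY path are all assumptions to adapt to the local setup.

```sh
# Print a unit file that runs the fan script as its own long-lived service.
# $1: absolute path to the fan script (hypothetical, supplied by the caller).
print_fancurve_unit() {
    local script_path=$1
    cat <<EOF
[Unit]
Description=NVIDIA fan curve
After=display-manager.service

[Service]
Type=simple
User=root
Environment=DISPLAY=:0
Environment=XAUTHORITY=/var/run/lightdm/root/:0
ExecStart=$script_path
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Possible use (not run here):
#   print_fancurve_unit /path/to/FanCurveScript.sh |
#       sudo tee /lib/systemd/system/fancurve.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable fancurve.service
```

With this layout the script should stay in the foreground (no trailing `&`), so systemd can track and restart it.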
@1a1a11a Are you running the script as root, or using sudo?
for example:
WITHOUT ROOT/SUDO
fireheadman@clauneck:~/scripts$ DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q GPUCurrentFanSpeedRPM | grep fan | awk '{ print "RPMs ",$3, $4 }'
No protocol specified
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.
...AND WITH SUDO (ROOT)
fireheadman@clauneck:~/scripts$ sudo !!
sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q GPUCurrentFanSpeedRPM | grep fan | awk '{ print "RPMs ",$3, $4 }'
RPMs (clauneck:0[fan:0]): 2876.
RPMs (clauneck:0[fan:1]): 998.
RPMs (clauneck:0[fan:2]): 2880.
RPMs (clauneck:0[fan:3]): 996.
fireheadman@clauneck:~/scripts$
@1a1a11a How many GPUs do you have? What is your driver version? @fireheadman I have to test
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/${USER}/:0
On my server root is the only user, and I totally forgot about the others...
Thanks @Neo2SHYAlien for the work, and @fireheadman for the research.
As for what @fireheadman said about system startup, I'm planning to work on an installation script to streamline the install process. A proposed solution would be to build the script for multiple startup daemons, since not all systems will have systemd.
@1a1a11a It looks like the script is looping through your motherboard fan headers. We might need to readdress fan iteration in the script.
Misclicked close ticket. My bad.
Thank you for your help! @fireheadman @Neo2SHYAlien @FedoraTipper, I am running Ubuntu 16.04 with sudo and I have two GPUs on.
sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q GPUCurrentFanSpeedRPM | grep fan | awk '{ print "RPMs ",$3, $4 }'
gives correct result.
but running FanCurveScript.sh directly
gives the error shown in the previous posts. Am I using it wrong?
@1a1a11a I believe you need sudo to call the script... so "sudo ./FanCurveScript.sh"
@Neo2SHYAlien If you are running purely as the root user (not recommended for many reasons), then there is no need for ${USER}; just write "root" there.
@fireheadman For a non-root user I have to check the XAUTHORITY path. I haven't tested it. Unfortunately my server is down until Sunday, when I'll be able to check. @1a1a11a Can you please test the script with sudo? If it doesn't work, please start the script with sudo bash -x ./FanCurveScript.sh and post the output.
@fireheadman @Neo2SHYAlien
Yeah, I used sudo
and get the error.
headless=true
verbose=false
'[' true = true ']'
export DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0
DISPLAY=:0
XAUTHORITY=/var/run/lightdm/root/:0
nvidia-settings -a GPUFanControlState=1
true
i=0
++ nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
for gputemp in '$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)'
'[' false = true ']'
case "${gputemp}" in
newfanspeed=70
nvidia-settings -a '[fan-0]/GPUTargetFanSpeed=70'
ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:0]) as specified in assignment '[fan-0]/GPUTargetFanSpeed=70' (Unknown Error).
ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:1]) as specified in assignment '[fan-1]/GPUTargetFanSpeed=70' (Unknown Error).
@1a1a11a I'm a little unsure how all the output you posted is being used.... are you running each line of the script? When posting, please use the "< >" for CODE tags to preserve formatting.
Something you may want to try is prepending DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0
to your nvidia-settings command(s). So it would be:
DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings \
-a "[fan-1]/GPUTargetFanSpeed=70"
Also note, I used double quotes vs single quotes.
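On the quoting: with a literal value such as 70, both quote styles pass the same string to nvidia-settings, so quoting is probably not the cause of the error in the trace. The difference only shows up once a variable such as `newfanspeed` is involved, as in this small sketch:

```sh
newfanspeed=70

# Single quotes keep the text literal: the variable is not expanded.
single='[fan-1]/GPUTargetFanSpeed=${newfanspeed}'

# Double quotes expand the variable while still protecting the square
# brackets from filename globbing.
double="[fan-1]/GPUTargetFanSpeed=${newfanspeed}"

echo "$single"
echo "$double"
```

Only the double-quoted form ends in `=70`; the single-quoted one would hand the unexpanded `${newfanspeed}` text to nvidia-settings.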
Hi @fireheadman, I tried
sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -a "[fan-1]/GPUTargetFanSpeed=70"
and get the following error
ERROR: Error assigning value 70 to attribute 'GPUTargetFanSpeed' (asrock:0[fan:1]) as specified in assignment '[fan-1]/GPUTargetFanSpeed=70' (Unknown Error).
Bummer... I'm not really sure what to do at this point other than give you the canned advice. You might need to reinstall your OS (I've had to do that about 7 times to get this far, but I was the culprit in all my learning mistakes), or seek out an NVIDIA forum for further assistance. This command is not exclusive to this project (the developer did not create it), so further help with it would be at the discretion of the project developer, unless someone else views this issue and has a solution.
I wish you luck in resolving this.
@1a1a11a Which display manager are you using?
XAUTHORITY=/var/run/lightdm/root/:0
Will only work if lightdm is your set display manager.
Also, when you run
echo $DISPLAY
does the value have the form :{number} (for example :0)?
Could you perhaps try running without relying on a display manager? Make sure no X server is already running, or else this won't work.
sudo xinit nvidia-settings -a '[fan-1]/GPUTargetFanSpeed=75' -- :0 -once
@fireheadman and @FedoraTipper thank you for your detailed help!
echo $DISPLAY
gives empty string
I noticed that X was running, so I killed it. When I ran the same command, sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -a "[fan-1]/GPUTargetFanSpeed=70",
I got a different error this time.
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.
@1a1a11a
Let's try this... There are many options for X server configurations. At the least, if you are running headless, then I would assume you are running Ubuntu Server and you only have lightdm running. If not, then you must have some other configuration you can explain/describe?
Post the output from these commands; it should look like the example below.
/usr/sbin/lightdm -v
sudo systemctl is-enabled lightdm
sudo systemctl status lightdm
fireheadman@clauneck:~$ /usr/sbin/lightdm -v
lightdm 1.18.3
fireheadman@clauneck:~$ sudo systemctl is-enabled lightdm
enabled
fireheadman@clauneck:~$ sudo systemctl status lightdm
● lightdm.service - Light Display Manager
Loaded: loaded (/lib/systemd/system/lightdm.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2018-04-08 13:10:21 MDT; 1 day 5h ago
Docs: man:lightdm(1)
Process: 1119 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=ex
Main PID: 1138 (lightdm)
Tasks: 5
Memory: 51.0M
CPU: 41min 56.200s
CGroup: /system.slice/lightdm.service
├─1138 /usr/sbin/lightdm
├─1165 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
└─1330 lightdm --session-child 12 19
@fireheadman
jason@asrock:~$ /usr/sbin/lightdm -v
lightdm 1.18.3
jason@asrock:~$ sudo systemctl is-enabled lightdm
enabled
jason@asrock:~$ sudo systemctl status lightdm
● lightdm.service - Light Display Manager
Loaded: loaded (/lib/systemd/system/lightdm.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/display-manager.service.d
└─xdiagnose.conf
Active: inactive (dead) (Result: exit-code) since Mon 2018-04-09 15:20:50 EDT; 6h ago
Docs: man:lightdm(1)
Process: 12356 ExecStart=/usr/sbin/lightdm (code=exited, status=1/FAILURE)
Process: 12353 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
Main PID: 12356 (code=exited, status=1/FAILURE)
Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Main process exited, code=exited, status=1/FAILURE
Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Unit entered failed state.
Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Triggering OnFailure= dependencies.
Apr 09 15:20:49 asrock systemd[1]: lightdm.service: Failed with result 'exit-code'.
Apr 09 15:20:50 asrock systemd[1]: lightdm.service: Service hold-off time over, scheduling restart.
Apr 09 15:20:50 asrock systemd[1]: Stopped Light Display Manager.
Apr 09 15:20:50 asrock systemd[1]: lightdm.service: Start request repeated too quickly.
Apr 09 15:20:50 asrock systemd[1]: Failed to start Light Display Manager.
jason@asrock:~$ sudo lightdm
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
/etc/modprobe.d is not a file
update-alternatives: error: no alternatives for x86_64-linux-gnu_gfxcore_conf
I have tried to reinstall lightdm, but it didn't work, do you think I should reboot?
@1a1a11a You just posted your problem: your display manager is not active. You will need to resolve this issue, and then the commands will function. It's not really the responsibility of the developer to resolve this, as his code works fine. I can say I have experienced this before, and I chose to rebuild my machine rather than spend countless hours troubleshooting.
Active: inactive (dead)
@fireheadman Thank you!
● lightdm.service - Light Display Manager
Loaded: loaded (/lib/systemd/system/lightdm.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/display-manager.service.d
└─xdiagnose.conf
Active: active (running) since Mon 2018-04-09 22:02:48 EDT; 10s ago
Docs: man:lightdm(1)
Process: 3872 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
Main PID: 3876 (lightdm)
CGroup: /system.slice/lightdm.service
├─3876 /usr/sbin/lightdm
└─3883 /usr/lib/xorg/Xorg -core :1 -seat seat0 -auth /var/run/lightdm/root/:1 -nolisten tcp vt7 -novtswitch
Apr 09 22:02:48 asrock systemd[1]: Starting Light Display Manager...
Apr 09 22:02:48 asrock systemd[1]: Started Light Display Manager.
the lightdm problem is solved, but the script gives
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
ERROR: The control display is undefined; please run `nvidia-settings --help` for usage information.
I'm unsure why your lightdm is using :1 instead of :0, so try your full command with :1 instead.
YOU
└─3883 /usr/lib/xorg/Xorg -core :1 -seat seat0 -auth /var/run/lightdm/root/:1 -nolisten tcp vt7 -novtswitch
ME
├─1165 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
Try this
sudo DISPLAY=:1 XAUTHORITY=/var/run/lightdm/root/:1 nvidia-settings -a "[fan-1]/GPUTargetFanSpeed=70"
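Rather than hard-coding :0 or :1, the display number and auth file could be read from the running Xorg command line, as in the two process listings above. This is a rough sketch: the function name is hypothetical, and it assumes the command line has the same shape as the ones shown in this thread (a bare `:N` argument and an `-auth FILE` pair).

```sh
# Parse an Xorg command line (passed as one string in $1) and print
# "DISPLAY AUTHFILE". Returns 1 if either value cannot be found.
parse_xorg_cmdline() {
    local display="" auth="" prev="" word
    set -f                          # no filename globbing while splitting
    for word in $1; do
        if [ "$prev" = "-auth" ]; then
            auth=$word              # the argument right after -auth
        else
            case $word in
                :[0-9]*)            # first bare ":N" is the display
                    if [ -z "$display" ]; then display=$word; fi
                    ;;
            esac
        fi
        prev=$word
    done
    set +f
    if [ -z "$display" ] || [ -z "$auth" ]; then
        return 1
    fi
    printf '%s %s\n' "$display" "$auth"
}

parse_xorg_cmdline '/usr/lib/xorg/Xorg -core :1 -seat seat0 -auth /var/run/lightdm/root/:1 -nolisten tcp vt7 -novtswitch'
```

On a live system the input could come from something like `pgrep -a Xorg`; the leading PID is ignored because it does not start with a colon. The two values can then be used for DISPLAY and XAUTHORITY in the nvidia-settings call.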
Weird, it hangs at the command without giving any error or effect. Thank you for all your help! I really appreciate it!
It seems to require a display. Is it possible to run without a monitor?