desbma / hddfancontrol

Regulate fan speed according to hard drive temperature
GNU General Public License v3.0

string <-> integer conversion error #37

Closed: akrea closed this issue 2 years ago

akrea commented 2 years ago

Hi

First, thank you for your great work! I would like to use hddfancontrol for my test-bench-like server setup, which has 2 drive cages with 4 drives each and one fan per cage. I managed to set everything up, and the system keeps my drives cool, but not quite in the intended way:

OS: Ubuntu 20.04
Installation method: pip3

Problem: I start hddfancontrol with

sudo hddfancontrol -d /dev/sda /dev/sdc /dev/sde -p /sys/class/hwmon/hwmon1/pwm4 --pwm-start-value 200 --pwm-stop-value 45 --min-fan-speed-prct 10 -i 60 --spin-down-time 7200 -b -l /var/log/hddfancontrol.log

The log looks like this:

2022-05-01 19:42:05,040 INFO [Main] Process real time scheduler set to 2, priority 49
2022-05-01 19:42:05,043 INFO [sda Samsung SSD 883 DCT 480GB] Drive does not support native drivetemp temp query
2022-05-01 19:42:05,076 WARNING [sda Samsung SSD 883 DCT 480GB] Drive does not support HGST temp query
2022-05-01 19:42:05,076 INFO [sda Samsung SSD 883 DCT 480GB] Will probe temperature with method HDDTEMP_INVOCATION
2022-05-01 19:42:05,081 INFO [sdc ST10000NE0008-2JM101] Drive does not support native drivetemp temp query
2022-05-01 19:42:05,120 WARNING [sdc ST10000NE0008-2JM101] Drive does not support HGST temp query
2022-05-01 19:42:05,120 INFO [sdc ST10000NE0008-2JM101] Will probe temperature with method HDDTEMP_INVOCATION
2022-05-01 19:42:05,125 INFO [sde ST10000NE0008-2JM101] Drive does not support native drivetemp temp query
2022-05-01 19:42:05,156 WARNING [sde ST10000NE0008-2JM101] Drive does not support HGST temp query
2022-05-01 19:42:05,156 INFO [sde ST10000NE0008-2JM101] Will probe temperature with method HDDTEMP_INVOCATION
2022-05-01 19:42:05,182 INFO [Main] Maximum device temperature: 34 °C
2022-05-01 19:42:05,182 INFO [Fan #1] Setting fan speed to 20%
2022-05-01 19:42:05,192 WARNING [Fan #1] /sys/class/hwmon/hwmon1/pwm4_enable was 0, setting it to 1
2022-05-01 19:42:25,219 INFO [Main] Maximum device temperature: 34 °C
2022-05-01 19:42:45,231 ERROR [Main] ValueError: invalid literal for int() with base 10: ''
2022-05-01 19:42:45,231 INFO [Fan #1] Setting fan speed to 100%
2022-05-01 19:42:45,231 INFO [DriveSpinDownThread-sdc ST10000NE0008-2JM101] Exiting
2022-05-01 19:42:45,232 INFO [DriveSpinDownThread-sda Samsung SSD 883 DCT 480GB] Exiting
2022-05-01 19:42:45,232 INFO [DriveSpinDownThread-sde ST10000NE0008-2JM101] Exiting

You can see the error message, after which the fans are set to 100% (which my system monitoring confirms).

According to this thread, there seems to be a conversion error somewhere in the code.
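The logged message is exactly what Python's built-in int() raises when it receives an empty string, which suggests that a temperature query returned no digits at all. A minimal sketch, assuming the temperature is parsed from raw command output; `parse_temperature` is a hypothetical name, not hddfancontrol's actual code:

```python
def parse_temperature(raw_output: str) -> int:
    """Hypothetical parser: convert raw output such as '34\\n' to an int."""
    # int() raises ValueError if nothing is left after stripping whitespace
    return int(raw_output.strip())


if __name__ == "__main__":
    print(parse_temperature("34\n"))  # 34
    try:
        parse_temperature("")  # an empty reading
    except ValueError as exc:
        print(f"ValueError: {exc}")  # ValueError: invalid literal for int() with base 10: ''
```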

Would you be able to fix this one or am I doing something wrong?

Thank you!

desbma commented 2 years ago

The third temperature query for one of your drives fails with hddtemp; this may be due to a power-saving feature.

You can try running with --smartctl, and/or paste the log with -v debug to get more information about what is going on.

akrea commented 2 years ago

Indeed, this may be the case.

noname@skippy:~$ sudo hddtemp /dev/sd[abcdefg]
/dev/sda: Samsung SSD 883 DCT 480GB: 39°C
/dev/sdb: Samsung SSD 883 DCT 480GB: 42°C
/dev/sdc: ST10000NE0008-2JM101: drive is sleeping
/dev/sdd: ST10000NE0008-2JM101: drive is sleeping
/dev/sde: ST10000NE0008-2JM101: drive is sleeping
/dev/sdf: ST10000NE0008-2JM101: drive is sleeping
/dev/sdg: ST20000NM007D-3DJ103: 36°C
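The output above mixes two kinds of status: a numeric temperature and "drive is sleeping". A sketch of how a caller might tell them apart, assuming the `DEVICE: MODEL: STATUS` layout shown above; `parse_hddtemp_line` is a hypothetical helper for illustration, not how hddfancontrol actually parses hddtemp output:

```python
from typing import Optional, Tuple

SLEEPING = "drive is sleeping"  # status text shown by hddtemp for a sleeping drive


def parse_hddtemp_line(line: str) -> Tuple[str, Optional[int]]:
    """Parse one line of hddtemp output, e.g. '/dev/sdg: MODEL: 36°C'.

    Hypothetical helper. Returns (device, temperature in °C),
    with None as the temperature when the drive is sleeping.
    """
    device, _, rest = line.partition(": ")
    # The status follows the last ': ' (the model name sits in between)
    status = rest.rpartition(": ")[2].strip()
    if status == SLEEPING:
        return device, None
    if status.endswith("°C"):
        status = status[: -len("°C")]
    # Anything that is not a plain number (including '') raises ValueError here
    return device, int(status)


if __name__ == "__main__":
    sample = (
        "/dev/sdc: ST10000NE0008-2JM101: drive is sleeping\n"
        "/dev/sdg: ST20000NM007D-3DJ103: 36°C"
    )
    for line in sample.splitlines():
        print(parse_hddtemp_line(line))  # ('/dev/sdc', None), then ('/dev/sdg', 36)
```

A sleeping drive is reported explicitly as None rather than being confused with a failed read, so only genuinely unparseable output ends in a ValueError.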

The drives that go to sleep are the storage HDDs. To avoid the errors, I therefore assigned the SSDs (one in each cage) to the fans. They never go to sleep because they are the OS disks in a RAID 1. Moreover, they are the hottest drives most of the time, and their temperature differs from the others by at most 5 °C.

Suggestion: How about returning a WARNING instead of an error if the conversion is not possible.

Question about inner workings: am I correct that -d /dev/sda /dev/sdc -p /sys/class/hwmon/hwmon1/pwm4 .... will assign sda and sdc to pwm4, and that the "hotter" drive determines the fan speed?

desbma commented 2 years ago

The "drive is sleeping" output is correctly handled in hddfancontrol; it is not the cause of the error you are having.

If you restart the daemon with -v debug, the log would allow narrowing down the device that causes the error.

Suggestion: How about returning a WARNING instead of an error if the conversion is not possible.

In a system where cooling is controlled by the output of a sensor, ignoring a sensor error is possibly the worst thing you can do. I think it is much better to raise an explicit error and set the fan to 100% for safety.
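The behaviour visible in the log (an ERROR line immediately followed by "Setting fan speed to 100%") fits a simple fail-safe pattern. A minimal sketch of the idea; `compute_fan_speed`, `temp_readers` and `speed_for_temp` are hypothetical names, and this is not hddfancontrol's actual code:

```python
import logging
from typing import Callable, Iterable

FAILSAFE_SPEED_PRCT = 100  # full speed whenever a temperature cannot be read


def compute_fan_speed(
    temp_readers: Iterable[Callable[[], int]],
    speed_for_temp: Callable[[int], int],
) -> int:
    """Return a fan speed in percent, or full speed if any sensor query fails.

    Hypothetical helper: each reader returns one drive temperature in °C,
    and speed_for_temp maps the hottest temperature to a speed.
    """
    try:
        hottest = max(read() for read in temp_readers)
    except Exception:
        # Never ignore a failed sensor: log it as an error and cool at full speed
        logging.exception("Temperature query failed, falling back to full fan speed")
        return FAILSAFE_SPEED_PRCT
    return speed_for_temp(hottest)


if __name__ == "__main__":
    ok = compute_fan_speed([lambda: 34, lambda: 31], lambda t: t)
    failed = compute_fan_speed([lambda: 34, lambda: int("")], lambda t: t)
    print(ok, failed)  # 34 100
```

The broad `except Exception` is deliberate here: any failure to obtain a temperature, whatever its cause, should err on the side of more cooling rather than silently keeping the previous speed.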

Question about inner workings: am I correct that -d /dev/sda /dev/sdc -p /sys/class/hwmon/hwmon1/pwm4 .... will assign sda and sdc to pwm4, and that the "hotter" drive determines the fan speed?

Yes
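The answer can be illustrated with a small sketch: the maximum temperature across all drives assigned to one PWM output is mapped to a fan speed. The linear mapping and the 30 °C / 50 °C thresholds are assumptions chosen for illustration, not necessarily what hddfancontrol uses; `fan_speed_prct` is a hypothetical name, and `MIN_SPEED_PRCT` only mirrors the --min-fan-speed-prct 10 value from the command above:

```python
from typing import Mapping

# Hypothetical thresholds, chosen for illustration only
MIN_TEMP = 30  # °C at or below which the fan runs at its minimum speed
MAX_TEMP = 50  # °C at or above which the fan runs at 100%
MIN_SPEED_PRCT = 10  # mirrors --min-fan-speed-prct 10 in the command above


def fan_speed_prct(drive_temps: Mapping[str, int]) -> int:
    """Map the hottest drive's temperature linearly onto a fan speed in percent."""
    hottest = max(drive_temps.values())  # the hotter drive determines the speed
    if hottest <= MIN_TEMP:
        return MIN_SPEED_PRCT
    if hottest >= MAX_TEMP:
        return 100
    prct = 100 * (hottest - MIN_TEMP) // (MAX_TEMP - MIN_TEMP)
    return max(MIN_SPEED_PRCT, prct)  # never go below the configured minimum


if __name__ == "__main__":
    print(fan_speed_prct({"sda": 34, "sdc": 31}))  # 20
```

With these assumed thresholds, the cooler drive (sdc at 31 °C) has no influence on the result; only sda at 34 °C matters.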