genesismining / sgminer-gm

A multi-algo GPU miner
GNU General Public License v3.0

auto-fan fails to control fan, sets GPU0 to 0% #38

Open BruceBuckland opened 7 years ago

BruceBuckland commented 7 years ago

Errors appear in the debug log, and auto-fan fails to manage fan speeds on amdgpu-pro 16.50 and 16.40. The fan on GPU 0 goes to 0% and the rest of the fans fail to change regardless of gpu-temp.

Debug log contains:
[00:41:01] Failed to open /sys/bus/pci/devices/0000:01:00.0/hwmon/hwmon0/fan1_input
[00:41:01] Failed to open /sys/bus/pci/devices/0000:02:00.0/hwmon/hwmon1/fan1_input
[00:41:01] Failed to open /sys/bus/pci/devices/0000:03:00.0/hwmon/hwmon2/fan1_input
[00:41:01] Failed to open /sys/bus/pci/devices/0000:04:00.0/hwmon/hwmon3/fan1_input
[00:41:01] Failed to open /sys/bus/pci/devices/0000:05:00.0/hwmon/hwmon4/fan1_input
[00:41:01] Failed to open /sys/bus/pci/devices/0000:08:00.0/hwmon/hwmon5/fan1_input
[00:41:01] Failed to open /sys/bus/pci/devices/0000:0a:00.0/hwmon/hwmon6/fan1_input

On my system (Ubuntu 16.04) the sysfs node should be /sys/bus/pci/devices/0000:0a:00.0/hwmon/hwmon6/pwm1

I don't know what the drivers expose for other GPUs when they are working, but that is the correct name for RX470 and RX480 GPUs.

Does auto-fan require ADL?

Note: I tried changing fan1_input to pwm1 in the code and recompiling. This fixed the log errors and changed the display so that it reports RPM, but the "RPM" it reports is just the number stored in pwm1 (a value between 0 and 255 representing fan speed), so that is not right. I looked at the code, and the calculations in fan_speed_to_percent and fan_percent_to_speed appear to be correct, but I think that in the past the min/max values and fan1_input may have been RPMs on another driver set. On amdgpu-pro on Linux they are 0-255.
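For illustration, here is a minimal sketch of the conversion described above, assuming pwm1 holds a raw value in the 0-255 range. The function names `pwm_to_percent` and `percent_to_pwm` are hypothetical and are not sgminer-gm's `fan_speed_to_percent` and `fan_percent_to_speed`; the default bounds are assumptions taken from the 0-255 range reported above.

```python
def pwm_to_percent(raw, pwm_min=0, pwm_max=255):
    """Map a raw pwm1 value onto 0-100 percent (bounds are assumed, not read from sysfs)."""
    if pwm_max <= pwm_min:
        raise ValueError("pwm_max must be greater than pwm_min")
    raw = max(pwm_min, min(pwm_max, raw))  # clamp out-of-range readings
    return round(100 * (raw - pwm_min) / (pwm_max - pwm_min))


def percent_to_pwm(percent, pwm_min=0, pwm_max=255):
    """Map a 0-100 percent target back onto the raw pwm1 range."""
    if pwm_max <= pwm_min:
        raise ValueError("pwm_max must be greater than pwm_min")
    percent = max(0, min(100, percent))  # clamp out-of-range targets
    return pwm_min + round(percent * (pwm_max - pwm_min) / 100)
```

If the display treats the raw value as RPM, a reading of 255 would be shown as "255 RPM" rather than 100%, which would match the symptom described.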

In any case, with that change the fan speeds change, but with 2 problems:
1) The fan on GPU0 still slams to 0% and stays there even as the unit gets hot.
2) The fans on the others do not increase enough to cool down the units they are controlling.

I will point out that in the new pwm1 fan control process the change in value must be at least 9 to actually change the fan speed; otherwise it just stays where it is. It could be that the 5% adjustments built into sgminer-gm are not enough to move the needle, but I have not had a chance to test that yet.
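As a rough way to test that hypothesis, the sketch below computes how many raw units a percentage step covers and widens any adjustment that would fall under a minimum delta. The threshold of 9 is the value observed above, the 0-255 range is assumed, and both function names are hypothetical rather than anything in sgminer-gm.

```python
def raw_step_for_percent(step_percent, pwm_min=0, pwm_max=255):
    """Size of a percentage step in raw pwm1 units (range is an assumption)."""
    return step_percent * (pwm_max - pwm_min) / 100


def next_pwm(current, target, min_delta=9, pwm_min=0, pwm_max=255):
    """Move toward target, widening any nonzero change smaller than min_delta."""
    target = max(pwm_min, min(pwm_max, target))
    delta = target - current
    if delta == 0:
        return current
    if abs(delta) < min_delta:
        # Too small to register: stretch the step to the minimum that does.
        delta = min_delta if delta > 0 else -min_delta
    return max(pwm_min, min(pwm_max, current + delta))
```

Over a full 0-255 range, `raw_step_for_percent(5)` is 12.75 raw units, which is above 9, so a 5% step would only fall short if it were applied to a narrower range. Widening small steps can overshoot the target; ignoring changes until they reach the threshold is the alternative trade-off.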

(Has anyone got auto-fan working? I have external fan control code but I suspect the changes cause problems when simultaneous changes are being made by sgminer - so I am trying to get sgminer auto-fan to work)

OhGodAGirl commented 7 years ago

Can you please tell me what kernel version you're running? It will not work correctly on kernels below 4.9.

cronyx commented 7 years ago

Same problem. Kernel (Ubuntu 16.04):
Linux test 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

DominiLux commented 7 years ago

@OhGodAGirl it is a logic error in the code. Linux now exposes these directories through symlinks, because the underlying paths are not static and are definitely subject to change when new versions of the kernel driver come out. The symlinks are always located at /sys/class/drm/card0, ../card1, ../card2, ../card3, and so on. Each symlink points to the card's root directory, and from there the normal directory structure of /hwmon/...... continues. I noticed it the other day and was going to push an update. It's a change to one line inside a loop in the sysfs C file, which should use /sys/class/drm/card# to work correctly.

Of course it may need to scan the /sys/class/drm directory first, counting the entries it finds with a simple string-matching routine and storing the count in an integer (we can name it int i;), then use a for loop to go through the detected cards in the symlink directory. To save resources you would only want to run the hardware scanning loop once, so I would definitely store the detected folders in a char* array, because memory is more abundant on a mining rig than CPU clock cycles. If you don't have time, let me know and I'll make the changes and push an update you can commit.
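The scan-once approach described above could look roughly like the following sketch. It is written in Python for brevity and is not the proposed C patch. The function name `find_fan_nodes` is hypothetical, and the `device` path component between the card symlink and `hwmon` is an assumption about how the card directory links to its PCI device on typical amdgpu systems.

```python
import os
import re

# Match card0, card1, ... but not connector entries such as card0-DP-1.
CARD_RE = re.compile(r"^card(\d+)$")


def find_fan_nodes(drm_root="/sys/class/drm"):
    """Return {card_index: path_to_pwm1} for every card that exposes a pwm1 node."""
    nodes = {}
    try:
        entries = os.listdir(drm_root)
    except OSError:
        return nodes  # no DRM class directory (e.g. not Linux, or no GPU driver)
    for entry in entries:
        match = CARD_RE.match(entry)
        if not match:
            continue
        # Assumed layout: cardN/device/hwmon/hwmonM/pwm1
        hwmon_root = os.path.join(drm_root, entry, "device", "hwmon")
        if not os.path.isdir(hwmon_root):
            continue
        for hwmon in sorted(os.listdir(hwmon_root)):
            pwm = os.path.join(hwmon_root, hwmon, "pwm1")
            if os.path.exists(pwm):
                nodes[int(match.group(1))] = pwm
                break
    return nodes
```

Calling `find_fan_nodes()` once at startup and keeping the returned dictionary plays the role of the cached char* array: the hwmon index is resolved a single time instead of being hard-coded or rescanned on every fan update.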

@cronyx to temporarily solve your problem until a fix is pushed, I have an old .sh script I wrote that you can use. It simply sets the fans to a static speed, but at least you will have some degree of control over them, and then you can run the miner with autofan=false to avoid any issues with the software constantly trying to set fan speeds that it can't. Just check out my GitHub repo and you'll find it in there, but please ignore the mess of a repository. I have a lot of stuff I've been meaning to upload there but I have not had the time.
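The script itself is not shown in this thread. As a stand-in, here is a minimal sketch of what setting a static fan speed through the hwmon sysfs files can look like, assuming a 0-255 pwm1 range and that the caller passes the card's hwmon directory. The function name `set_static_fan` is hypothetical, and writing to the real sysfs files normally requires root.

```python
import os


def set_static_fan(hwmon_dir, percent):
    """Switch one hwmon fan to manual control and write a fixed duty cycle.

    Returns the raw value written to pwm1 (0-255 range is assumed).
    """
    percent = max(0, min(100, percent))
    raw = round(percent * 255 / 100)
    # In the hwmon sysfs interface, pwm1_enable = 1 selects manual control.
    with open(os.path.join(hwmon_dir, "pwm1_enable"), "w") as f:
        f.write("1")
    with open(os.path.join(hwmon_dir, "pwm1"), "w") as f:
        f.write(str(raw))
    return raw
```

For example, `set_static_fan("/sys/class/drm/card0/device/hwmon/hwmon0", 60)` would request roughly 60% on the first card, where the hwmon index in that path is only an example and differs between systems.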

cronyx commented 7 years ago

@DominiLux, thanks! I'll try it.