andru-kun / wildrig-multi

multi algo miner for AMD, NVIDIA and Intel gpu's

Wildrig multi reports wrong GPU temperature and fan status in HiveOS #111

Open PSLLSP opened 2 years ago

PSLLSP commented 2 years ago

WildRig/0.31.2

I have a HiveOS rig with two NVIDIA GPUs and I tried to mine RTM (ghostrider algo) with the Wildrig miner.

The status of the first GPU is reported correctly in HiveOS, or at least it seems OK: the values are in the expected range (though they may actually be the values for the second GPU card, I am not sure...). The second GPU is reported with a temperature of 0C and a fan speed of err. This is wrong.

I checked with nvidia-cli and that tool reports temperature and fan speed correctly, no problem.

I checked the Wildrig log file and it reports temperature and fan speed for both GPU cards; I do not see a problem at first sight.

I assume the problem is in HiveOS: some script parses the log file incorrectly. The key question is why they have to parse the log file when the Wildrig miner supports an API with JSON data. The problem is that there is no information about GPU temperature or fan speed in the API JSON output... Could this be improved?

I run gminer most of the time on that rig, and gminer reports all important details to API output and HiveOS reports correct values to the user dashboard... I tried to run Wildrig miner and I am disappointed with the status of dashboard in HiveOS for Wildrig miner...

# /hive/miners/wildrig-multi/0.31.2/wildrig-multi --print-devices
GPU #0: NVIDIA GeForce GTX 1660 (busID: 2, arch: sm75, cu: 22, mem: 5944Mb)
GPU #1: NVIDIA GeForce GTX 1660 SUPER   (busID: 3, arch: sm75, cu: 22, mem: 5944Mb)
# ansi2txt < /var/log/miner/wildrig-multi/wildrig-multi.log
...
[20:00:54] Uptime: 0 days 01:01:39
[20:00:54] GPU #0[T:67C F:54%]: 10s: 555 60s: 555 15m: 509 H/s
[20:00:54] GPU #1[T:67C F:57%]: 10s: 561 60s: 562 15m: 515 H/s
[20:00:54] hashrate: 10s: 1117 60s: 1117 15m: 1025 H/s max: 2062 H/s
...

API report is accessible with curl localhost:60060
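
For illustration only, here is a minimal Python sketch of how a monitoring script might pull temperature and fan speed out of the per-GPU log lines shown above. It is not the parser HiveOS actually uses; the function name `parse_gpu_status` is hypothetical, and the regular expression assumes the exact `GPU #N[T:..C F:..%]` line format from the log excerpt.

```python
import re

# Matches per-GPU status lines in the format seen in the log excerpt, e.g.
# "[20:00:54] GPU #1[T:67C F:57%]: 10s: 561 60s: 562 15m: 515 H/s"
# Assumption: temperature and fan speed are always plain integers.
GPU_LINE = re.compile(
    r"GPU #(?P<index>\d+)\[T:(?P<temp>\d+)C F:(?P<fan>\d+)%\]"
)


def parse_gpu_status(log_text):
    """Return {gpu_index: (temp_c, fan_percent)}, keeping the latest line per GPU."""
    status = {}
    for line in log_text.splitlines():
        match = GPU_LINE.search(line)
        if match:
            status[int(match["index"])] = (int(match["temp"]), int(match["fan"]))
    return status


sample = """\
[20:00:54] Uptime: 0 days 01:01:39
[20:00:54] GPU #0[T:67C F:54%]: 10s: 555 60s: 555 15m: 509 H/s
[20:00:54] GPU #1[T:67C F:57%]: 10s: 561 60s: 562 15m: 515 H/s
[20:00:54] hashrate: 10s: 1117 60s: 1117 15m: 1025 H/s max: 2062 H/s
"""

print(parse_gpu_status(sample))
```

A parser like this silently skips any GPU whose line deviates from the expected format, which is one way a dashboard could end up showing 0C for a card while the log itself looks fine.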


UPDATE:

After I reported this issue, I changed the configuration of my rig several times. Then I tried to mine with the Wildrig miner again, and this time HiveOS reported the status of both GPUs correctly! It seems I just hit some random error where one GPU card was not reported correctly. Anyway, please try to improve your API report by adding temperature and fan state to the output JSON; it is easier to parse JSON than to search for values in a log file...

andru-kun commented 2 years ago

HiveOS uses its own monitoring for wildrig, as I remember, because wildrig doesn't report that info to the API (historical reasons :)). Strange that you have trouble with that, it's the first time I've seen such a problem. Later I will extend the API, but this will take time. Anyway, thanks for reporting.

bagyanugraha commented 2 years ago

I have the same problem. @andru-kun, kindly help fix this, but I hope you do not go back to the lower hashrate for Wildrig multi. This beta version gives me a higher hashrate than the previous version; thank you for that!

bagyanugraha commented 2 years ago

@andru-kun and after that error message from HiveOS, I got a message from the miner on the monitor screen. It said "cannot find devices"

andru-kun commented 2 years ago

@bagyanugraha do previous versions work fine for you? Also, have you tried lowering the power limit for the GPUs?

bagyanugraha commented 2 years ago

@andru-kun I tried version 0.31.2, and it was fine. Stable, but I got a lower hashrate than with the 0.31.3 beta version. I also tried a lower PL: 230 for the RTX3080 (I use it in Ethash), down from 245, and 130 for the RTX3070, down from 145, and I keep getting that error message.

andru-kun commented 2 years ago

@bagyanugraha 0.31.3 uses a higher intensity than 0.31.2. That's why you get a higher hashrate, but it also leads to higher GPU load and probably some instability on some rigs. Try the parameter --opencl-launch 22; if it is stable, increase it to 23 and so on, up to 26 (since 27 is the default and you get crashes with it).

bagyanugraha commented 2 years ago

@andru-kun Thank you. My next question: how do I set a different intensity for each GPU in my rig? I have 6 GPUs in my rig. --opencl-launch=IxW,IxW,IxW,IxW,IxW,IxW ?

andru-kun commented 2 years ago

yep, comma separation will work
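
As a rough sketch of the comma-separated form confirmed above, the following Python helper builds one `--opencl-launch` value from a per-GPU list. The helper name `opencl_launch_arg` is hypothetical, and the intensity and worksize numbers are placeholders chosen only to show the `IxW` layout, not recommended settings.

```python
def opencl_launch_arg(settings):
    """Join per-GPU (intensity, worksize) pairs into one --opencl-launch value."""
    pairs = ",".join(f"{intensity}x{worksize}" for intensity, worksize in settings)
    return f"--opencl-launch={pairs}"


# Placeholder values for a six-GPU rig: one (intensity, worksize) pair per GPU.
# These numbers are illustrative assumptions, not tested settings.
rig = [(22, 128), (22, 128), (23, 128), (23, 128), (24, 128), (24, 128)]

print(opencl_launch_arg(rig))
```

The list order is assumed to follow the GPU numbering that `--print-devices` reports.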