0.14.0rc3 - Sometimes 1 GPU has 0 mh/s without errors/warning before reworking normally

Rodriguevb commented 6 years ago

Sometimes a GPU (gpu/6 in the log below) stops mining without any error or warning.

I use default parameters: ethminer -S eth-eu2.nanopool.org:9999 -O <address>.<minername>:<email> --cuda

What is the cause? This occurs on 3 identical computers.

m  13:40:11|ethminer|  Speed 410.25 Mh/s    gpu/0 31.56  gpu/1 31.56  gpu/2 31.65  gpu/3 31.56  gpu/4 31.56  gpu/5 31.56  gpu/6 31.56  gpu/7 31.56  [A13+0:R0+0:F0] Time: 00:07
m  13:40:16|ethminer|  Speed 410.29 Mh/s    gpu/0 31.56  gpu/1 31.56  gpu/2 31.56  gpu/3 31.56  gpu/4 31.56  gpu/5 31.56  gpu/6 31.56  gpu/7 31.56  [A13+0:R0+0:F0] Time: 00:07
m  13:40:21|ethminer|  Speed 398.22 Mh/s    gpu/0 31.56  gpu/1 31.56  gpu/2 31.56  gpu/3 31.56  gpu/4 31.56  gpu/5 31.56  gpu/6 19.58  gpu/7 31.56  [A13+0:R0+0:F0] Time: 00:07
ℹ  13:40:21|stratum |  Received new job #db265b17… from eth-eu2.nanopool.org
ℹ  13:40:23|stratum |  Received new job #3171826b… from eth-eu2.nanopool.org
m  13:40:26|ethminer|  Speed 377.71 Mh/s    gpu/0 31.37  gpu/1 31.37  gpu/2 31.37  gpu/3 31.37  gpu/4 31.27  gpu/5 31.37  gpu/6  1.47  gpu/7 31.37  [A13+0:R0+0:F0] Time: 00:07
m  13:40:31|ethminer|  Speed 376.04 Mh/s    gpu/0 31.27  gpu/1 31.27  gpu/2 31.37  gpu/3 31.37  gpu/4 31.37  gpu/5 31.27  gpu/6  0.00  gpu/7 31.37  [A13+0:R0+0:F0] Time: 00:07
ℹ  13:40:32|cuda-8  |  Nonce 0x45a06f8f0701695a submitted to eth-eu2.nanopool.org
ℹ  13:40:32|stratum |  **Accepted  in 32 ms.
m  13:40:36|ethminer|  Speed 378.64 Mh/s    gpu/0 31.56  gpu/1 31.56  gpu/2 31.56  gpu/3 31.56  gpu/4 31.56  gpu/5 31.56  gpu/6  0.00  gpu/7 31.56  [A14+0:R0+0:F0] Time: 00:07
m  13:40:41|ethminer|  Speed 378.69 Mh/s    gpu/0 31.56  gpu/1 31.56  gpu/2 31.56  gpu/3 31.56  gpu/4 31.56  gpu/5 31.56  gpu/6  0.00  gpu/7 31.56  [A14+0:R0+0:F0] Time: 00:07
m  13:40:46|ethminer|  Speed 378.57 Mh/s    gpu/0 31.56  gpu/1 31.56  gpu/2 31.56  gpu/3 31.56  gpu/4 31.56  gpu/5 31.47  gpu/6  0.00  gpu/7 31.56  [A14+0:R0+0:F0] Time: 00:07
ℹ  13:40:49|stratum |  Received new job #ff767c3d… from eth-eu2.nanopool.org
m  13:40:51|ethminer|  Speed 377.66 Mh/s    gpu/0 31.47  gpu/1 31.47  gpu/2 31.47  gpu/3 31.47  gpu/4 31.47  gpu/5 31.47  gpu/6  0.00  gpu/7 31.56  [A14+0:R0+0:F0] Time: 00:08
ℹ  13:40:52|cuda-2  |  Nonce 0x45a0698efa741822 submitted to eth-eu2.nanopool.org
ℹ  13:40:52|stratum |  **Accepted  in 36 ms.

...

m  13:45:01|ethminer|  Speed 377.63 Mh/s    gpu/0 31.47  gpu/1 31.47  gpu/2 31.47  gpu/3 31.47  gpu/4 31.47  gpu/5 31.47  gpu/6  0.00  gpu/7 31.47  [A27+0:R0+0:F0] Time: 00:12
m  13:45:06|ethminer|  Speed 379.47 Mh/s    gpu/0 31.43  gpu/1 31.43  gpu/2 31.43  gpu/3 31.50  gpu/4 31.50  gpu/5 31.50  gpu/6  1.96  gpu/7 31.43  [A27+0:R0+0:F0] Time: 00:12
m  13:45:11|ethminer|  Speed 391.93 Mh/s    gpu/0 31.62  gpu/1 31.54  gpu/2 31.54  gpu/3 31.54  gpu/4 31.62  gpu/5 31.54  gpu/6 13.26  gpu/7 31.54  [A27+0:R0+0:F0] Time: 00:12
m  13:45:16|ethminer|  Speed 403.08 Mh/s    gpu/0 31.55  gpu/1 31.55  gpu/2 31.55  gpu/3 31.55  gpu/4 31.55  gpu/5 31.55  gpu/6 24.50  gpu/7 31.55  [A27+0:R0+0:F0] Time: 00:12
m  13:45:21|ethminer|  Speed 410.18 Mh/s    gpu/0 31.55  gpu/1 31.55  gpu/2 31.55  gpu/3 31.55  gpu/4 31.55  gpu/5 31.55  gpu/6 31.55  gpu/7 31.55  [A27+0:R0+0:F0] Time: 00:12
ℹ  13:45:25|cuda-1  |  Nonce 0x45a0688f321e63a4 submitted to eth-eu2.nanopool.org
ℹ  13:45:25|stratum |  **Accepted  in 25 ms.
ℹ  13:45:25|stratum |  Received new job #b5f16b17… from eth-eu2.nanopool.org
m  13:45:26|ethminer|  Speed 409.20 Mh/s    gpu/0 31.48  gpu/1 31.48  gpu/2 31.48  gpu/3 31.48  gpu/4 31.48  gpu/5 31.48  gpu/6 31.48  gpu/7 31.48  [A28+0:R0+0:F0] Time: 00:12
m  13:45:31|ethminer|  Speed 409.25 Mh/s    gpu/0 31.47  gpu/1 31.47  gpu/2 31.47  gpu/3 31.47  gpu/4 31.47  gpu/5 31.47  gpu/6 31.47  gpu/7 31.47  [A28+0:R0+0:F0] Time: 00:12

Rodriguevb commented 6 years ago

Almost the same error as #824, but here the GPU is cold and there is a loss in the calculated hashrate.

smurfy commented 6 years ago

> gpu is cold

Does this mean the card is actually not mining at all? Does the wattage also drop, or do you see a dip in the actual hashrate?

Or is this only a display issue?

Also, the bug I found where the hashrate can be zero/stuck only occurs in special circumstances and will NEVER recover. And it was actually a problem with the overall counting, not with just one card.

Since the hash counting happens in the same code loop that displays the switch time after new work arrives, could you please raise your verbosity level (-v 9, I think) to display the switch times? Maybe the card has a hiccup while switching work and somehow recovers.

Rodriguevb commented 6 years ago

It's not only a display issue: when the GPU shows 0.0 hashrate, it does nothing, so it cools down for a while. And it's not a thermal safety shutdown.

cu 17:58:17|cuda-11 | Switch time 13124 ms.

m  17:57:55|ethminer|  Speed 407.99 Mh/s    gpu/0 31.39  gpu/1 31.39  gpu/2 31.39  gpu/3 31.39  gpu/4 31.39  gpu/5 31.39  gpu/6 31.39  gpu/7 31.39  gpu/8 31.39  gpu/9 31.39  gpu/10 31.39  gpu/11 31.30  gpu/12 31.39  [A109+0:R0+0:F0] Time: 00:42
m  17:58:00|ethminer|  Speed 404.41 Mh/s    gpu/0 31.39  gpu/1 31.39  gpu/2 31.39  gpu/3 31.39  gpu/4 31.39  gpu/5 31.39  gpu/6 31.39  gpu/7 31.39  gpu/8 31.39  gpu/9 31.39  gpu/10 31.39  gpu/11 27.72  gpu/12 31.39  [A109+0:R0+0:F0] Time: 00:42
ℹ  17:58:04|stratum |  Received new job #ca61ac23… from eth-eu2.nanopool.org
cu  17:58:04|cuda-0  |  Switch time 2 ms.
cu  17:58:04|cuda-2  |  Switch time 4 ms.
cu  17:58:04|cuda-9  |  Switch time 4 ms.
cu  17:58:04|cuda-4  |  Switch time 9 ms.
cu  17:58:04|cuda-5  |  Switch time 13 ms.
cu  17:58:04|cuda-7  |  Switch time 14 ms.
cu  17:58:04|cuda-6  |  Switch time 15 ms.
cu  17:58:04|cuda-1  |  Switch time 16 ms.
cu  17:58:04|cuda-3  |  Switch time 18 ms.
cu  17:58:04|cuda-12 |  Switch time 20 ms.
cu  17:58:04|cuda-10 |  Switch time 22 ms.
cu  17:58:04|cuda-8  |  Switch time 32 ms.
m  17:58:05|ethminer|  Speed 391.17 Mh/s    gpu/0 31.39  gpu/1 31.39  gpu/2 31.39  gpu/3 31.39  gpu/4 31.39  gpu/5 31.39  gpu/6 31.39  gpu/7 31.39  gpu/8 31.39  gpu/9 31.39  gpu/10 31.30  gpu/11 14.69  gpu/12 31.30  [A109+0:R0+0:F0] Time: 00:42
m  17:58:10|ethminer|  Speed 378.03 Mh/s    gpu/0 31.39  gpu/1 31.39  gpu/2 31.39  gpu/3 31.30  gpu/4 31.39  gpu/5 31.39  gpu/6 31.39  gpu/7 31.39  gpu/8 31.39  gpu/9 31.39  gpu/10 31.30  gpu/11  1.57  gpu/12 31.39  [A109+0:R0+0:F0] Time: 00:42
m  17:58:15|ethminer|  Speed 376.54 Mh/s    gpu/0 31.39  gpu/1 31.39  gpu/2 31.39  gpu/3 31.39  gpu/4 31.39  gpu/5 31.39  gpu/6 31.39  gpu/7 31.39  gpu/8 31.39  gpu/9 31.39  gpu/10 31.30  gpu/11  0.00  gpu/12 31.39  [A109+0:R0+0:F0] Time: 00:42
cu  17:58:17|cuda-11 |  Switch time 13124 ms.
m  17:58:20|ethminer|  Speed 385.15 Mh/s    gpu/0 31.48  gpu/1 31.48  gpu/2 31.48  gpu/3 31.48  gpu/4 31.48  gpu/5 31.48  gpu/6 31.48  gpu/7 31.48  gpu/8 31.48  gpu/9 31.48  gpu/10 31.48  gpu/11  7.43  gpu/12 31.48  [A109+0:R0+0:F0] Time: 00:42
ℹ  17:58:21|stratum |  Received new job #792d2f18… from eth-eu2.nanopool.org
cu  17:58:21|cuda-12 |  Switch time 1 ms.
cu  17:58:21|cuda-11 |  Switch time 2 ms.
cu  17:58:21|cuda-10 |  Switch time 2 ms.
cu  17:58:21|cuda-0  |  Switch time 12 ms.
cu  17:58:21|cuda-8  |  Switch time 13 ms.
cu  17:58:21|cuda-2  |  Switch time 14 ms.
cu  17:58:21|cuda-9  |  Switch time 19 ms.
cu  17:58:21|cuda-4  |  Switch time 21 ms.
cu  17:58:21|cuda-5  |  Switch time 26 ms.
cu  17:58:21|cuda-7  |  Switch time 26 ms.
cu  17:58:21|cuda-1  |  Switch time 28 ms.
cu  17:58:21|cuda-6  |  Switch time 28 ms.
cu  17:58:21|cuda-3  |  Switch time 30 ms.
m  17:58:25|ethminer|  Speed 397.70 Mh/s    gpu/0 31.39  gpu/1 31.39  gpu/2 31.39  gpu/3 31.39  gpu/4 31.39  gpu/5 31.39  gpu/6 31.39  gpu/7 31.39  gpu/8 31.39  gpu/9 31.39  gpu/10 31.39  gpu/11 21.07  gpu/12 31.39  [A109+0:R0+0:F0] Time: 00:43
ℹ  17:58:26|cuda-3  |  Nonce 0x0eeca3fab3b5f3c7 submitted to eth-eu2.nanopool.org
ℹ  17:58:26|stratum |  **Accepted  in 44 ms.
m  17:58:30|ethminer|  Speed 407.89 Mh/s    gpu/0 31.38  gpu/1 31.38  gpu/2 31.38  gpu/3 31.38  gpu/4 31.38  gpu/5 31.38  gpu/6 31.38  gpu/7 31.38  gpu/8 31.38  gpu/9 31.38  gpu/10 31.38  gpu/11 31.38  gpu/12 31.30  [A110+0:R0+0:F0] Time: 00:43
ℹ  17:58:30|stratum |  Received new job #d5ca8746… from eth-eu2.nanopool.org
cu  17:58:30|cuda-0  |  Switch time 2 ms.
cu  17:58:30|cuda-8  |  Switch time 4 ms.
cu  17:58:30|cuda-2  |  Switch time 5 ms.
cu  17:58:30|cuda-9  |  Switch time 11 ms.
cu  17:58:30|cuda-4  |  Switch time 12 ms.
cu  17:58:30|cuda-7  |  Switch time 17 ms.
cu  17:58:30|cuda-5  |  Switch time 19 ms.
cu  17:58:30|cuda-1  |  Switch time 19 ms.
cu  17:58:30|cuda-6  |  Switch time 19 ms.
cu  17:58:30|cuda-3  |  Switch time 20 ms.
cu  17:58:30|cuda-12 |  Switch time 24 ms.
cu  17:58:30|cuda-11 |  Switch time 26 ms.
cu  17:58:30|cuda-10 |  Switch time 27 ms.

smurfy commented 6 years ago

Well, so it's somehow a switch issue. Not sure why. Try clocking the card a bit lower.
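To find such stalls in a longer log, the "Switch time" lines can be scanned for outliers. The following is only a minimal sketch, assuming the log format shown above; the function name `slow_switches` and the 1000 ms threshold are illustrative assumptions, not part of ethminer.

```python
import re

# Matches lines such as: "cu  17:58:17|cuda-11 |  Switch time 13124 ms."
SWITCH_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\|cuda-(\d+)\s*\|\s*Switch time (\d+) ms"
)


def slow_switches(lines, threshold_ms=1000):
    """Return (time, gpu_index, switch_ms) for every switch slower than threshold_ms.

    threshold_ms is an arbitrary cut-off chosen for this sketch.
    """
    hits = []
    for line in lines:
        match = SWITCH_RE.search(line)
        if match and int(match.group(3)) > threshold_ms:
            hits.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return hits


# Three lines copied from the log above.
sample = [
    "cu  17:58:04|cuda-8  |  Switch time 32 ms.",
    "cu  17:58:17|cuda-11 |  Switch time 13124 ms.",
    "cu  17:58:21|cuda-11 |  Switch time 2 ms.",
]

print(slow_switches(sample))
```

On the three sample lines, only the 13124 ms switch of cuda-11 is reported.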

invidtiv commented 6 years ago

Also getting it here.

m 12:12:00|ethminer| Speed 128.95 Mh/s gpu/0 32.24 gpu/1 32.24 gpu/2 32.24 gpu/3 32.24 [A0+0:R0+0:F0] Time: 00:01
m 12:12:05|ethminer| Speed 123.62 Mh/s gpu/0 32.24 gpu/1 32.24 gpu/2 32.24 gpu/3 26.90 [A0+0:R0+0:F0] Time: 00:01
m 12:12:10|ethminer| Speed 111.03 Mh/s gpu/0 32.35 gpu/1 32.35 gpu/2 32.35 gpu/3 13.99 [A0+0:R0+0:F0] Time: 00:01
ℹ 12:12:11|cuda-2 | Nonce 0xfa960efa28dee694 submitted to eu1.ethermine.org
ℹ 12:12:11|stratum | Accepted in 36 ms.
ℹ 12:12:12|cuda-1 | Nonce 0xfa960dfa2a621fab submitted to eu1.ethermine.org
ℹ 12:12:12|stratum | Accepted in 36 ms.
m 12:12:15|ethminer| Speed 100.23 Mh/s gpu/0 32.44 gpu/1 32.44 gpu/2 32.36 gpu/3 2.99 [A2+0:R0+0:F0] Time: 00:01
m 12:12:20|ethminer| Speed 97.25 Mh/s gpu/0 32.44 gpu/1 32.44 gpu/2 32.36 gpu/3 0.00 [A2+0:R0+0:F0] Time: 00:01
ℹ 12:12:23|stratum | Received new job #7ecc4f8c… from eu1.ethermine.org
m 12:12:25|ethminer| Speed 102.82 Mh/s gpu/0 32.35 gpu/1 32.35 gpu/2 32.35 gpu/3 5.77 [A2+0:R0+0:F0] Time: 00:01
m 12:12:30|ethminer| Speed 116.28 Mh/s gpu/0 32.35 gpu/1 32.35 gpu/2 32.26 gpu/3 19.32 [A2+0:R0+0:F0] Time: 00:01
m 12:12:35|ethminer| Speed 129.30 Mh/s gpu/0 32.35 gpu/1 32.35 gpu/2 32.26 gpu/3 32.35 [A2+0:R0+0:F0] Time: 00:01
m 12:12:40|ethminer| Speed 129.68 Mh/s gpu/0 32.44 gpu/1 32.44 gpu/2 32.36 gpu/3 32.44 [A2+0:R0+0:F0] Time: 00:01

Rodriguevb commented 6 years ago

@DeadManWalkingTO I didn't use --cuda-noeval

akatasonov commented 6 years ago

@Rodriguevb check your syslog/Windows logs for Nvidia driver crashes. Your GPU is likely going down because it is overclocked, after an exception in the driver code.

Rodriguevb commented 6 years ago

0 errors in /var/log/kern.log

Rodriguevb commented 6 years ago

Still happening with 0.14.0rc1. It happens on several different computers, with or without overclock. I use Ubuntu Server 17.10; maybe I need to change to the 16.04 LTS version?

H05ted commented 6 years ago

It is not an Ubuntu issue. I have the same on 16.04 LTS with different NVIDIA drivers on different rigs. I have been seeing this trouble since version 0.13.5: sometimes a different GPU stops working, and the only thing that helps is restarting the rig, or sometimes just ethminer, so you need to monitor the MH/s. Sometimes it shows MH/s but the GPU is cold and not working ((

Rodriguevb commented 6 years ago

@H05ted so maybe an NVIDIA driver issue? I use nvidia-390.

H05ted commented 6 years ago

I use different driver versions, from 384 to 390, and have the same issue.

Rodriguevb commented 6 years ago

Which motherboard model do you have?

H05ted commented 6 years ago

TB250-BTC Ver. 6, 7gpu

Rodriguevb commented 6 years ago

Different OS, different versions and different motherboards... maybe the risers aren't working well and need to be replaced? For my part, I have tried every possible configuration and can't find the problem.

Rodriguevb commented 6 years ago

@H05ted what are your overclock values, please? :)

wetblanketcc commented 6 years ago

Seeing the same.

Ubuntu 18.04
ethminer 0.15.0rc2
6x Asus ROG 1070s
Biostar TB250-BTC+ (8 PCIe, 2 slots currently empty)

No visible errors in output, nothing in kernel logs that I can see.

ethminer --report-hashrate --exit --cuda --HWMON 1 --verbosity 5 -P stratum1+ssl://<address>.worker1@us2.ethermine.org:5555 -P stratum1+ssl://<address>.worker1@us1.ethermine.org:5555

The weird thing, is that it does recover, occasionally.

ubuntu@worker1:~/mining/ethminer$ nvidia-smi
Thu Jul  5 22:41:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24.02              Driver Version: 396.24.02                 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    On   | 00000000:01:00.0  On |                  N/A |
| 20%   62C    P2   104W / 105W |   2772MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    On   | 00000000:02:00.0  On |                  N/A |
| 20%   58C    P2   105W / 105W |   2752MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1070    On   | 00000000:04:00.0  On |                  N/A |
| 20%   57C    P2    46W / 105W |   2752MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1070    On   | 00000000:05:00.0  On |                  N/A |
| 20%   59C    P2   105W / 105W |   2752MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 1070    On   | 00000000:06:00.0  On |                  N/A |
| 20%   68C    P2   103W / 105W |   2752MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 1070    On   | 00000000:07:00.0  On |                  N/A |
| 20%   76C    P2   103W / 105W |   2752MiB /  8119MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

wetblanketcc commented 6 years ago

If I lower the OC settings, things become more stable (it still happens, just less frequently) - but I have to wonder (and I'd like input on this from others) if crashing the process would be better than allowing one or more GPUs to become dead weight in the current process.

That would allow monitoring scripts to restart the process and achieve full GPU usage. Ideally, ethminer would detect 0 hashrate GPUs and soft restart them, but I'm not sure if that's possible.
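An external monitoring script could do the detection part by reading the periodic speed lines. This is only a rough sketch, assuming the `gpu/N rate` log format from the logs earlier in this thread; the function name `stalled_gpus` and the "three zero readings in a row" rule are illustrative assumptions, and the restart itself is left to whatever supervisor runs the miner.

```python
import re

# Matches per-GPU readings such as "gpu/6  0.00" in an ethminer speed line.
GPU_RE = re.compile(r"gpu/(\d+)\s+(\d+\.\d+)")


def stalled_gpus(lines, min_consecutive=3):
    """Return sorted GPU indices that read 0.00 for at least
    min_consecutive speed lines in a row (an arbitrary rule for this sketch)."""
    zero_streak = {}
    stalled = set()
    for line in lines:
        if "Speed" not in line:
            continue  # skip stratum/cuda lines
        for idx, rate in GPU_RE.findall(line):
            gpu = int(idx)
            if float(rate) == 0.0:
                zero_streak[gpu] = zero_streak.get(gpu, 0) + 1
                if zero_streak[gpu] >= min_consecutive:
                    stalled.add(gpu)
            else:
                zero_streak[gpu] = 0
    return sorted(stalled)


# Speed lines from the first log in this thread, abbreviated to three GPUs.
sample = [
    "m  13:40:26|ethminer|  Speed 377.71 Mh/s    gpu/5 31.37  gpu/6  1.47  gpu/7 31.37",
    "m  13:40:31|ethminer|  Speed 376.04 Mh/s    gpu/5 31.27  gpu/6  0.00  gpu/7 31.37",
    "i  13:40:32|stratum |  **Accepted  in 32 ms.",
    "m  13:40:36|ethminer|  Speed 378.64 Mh/s    gpu/5 31.56  gpu/6  0.00  gpu/7 31.56",
    "m  13:40:41|ethminer|  Speed 378.69 Mh/s    gpu/5 31.56  gpu/6  0.00  gpu/7 31.56",
]

print(stalled_gpus(sample))
```

Requiring several zero readings in a row avoids reacting to the short ramp-down and ramp-up that the logs show around each stall.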

wetblanketcc commented 6 years ago

As mentioned, I have 6x 1070s, which should reliably deliver 184 MH/s. When they're running well, the rig does achieve this. But with this bug, I see this:

[screenshot: hashrate graph, 2018-07-08 8:13 pm]

Regular up and down of hashrate, all due to one or more of the GPUs dropping out of the race, and then later joining back up. Older versions of ethminer did not do this. I have no troubles mining equihash.

In case it isn't obvious from the graph, I'm not reaching 180 MH/s at the pool, and all those 'reported' dips are when one or more GPUs drop out.

wetblanketcc commented 6 years ago

Switching to claymore in eth-only mode to check stability of another miner compared to ethminer, will report back.

wetblanketcc commented 6 years ago

[screenshot: hashrate graph after switching to Claymore, 2018-07-09 10:23 am]

After the switch to Claymore, as you can see it is more consistent with the same overclocking. Its dips in the reported hashrate seem to correlate with when the miner crashed (but then recovered). There is something that can be fixed/addressed in ethminer.

wetblanketcc commented 6 years ago

I believe that ethminer achieves higher GPU hashrate for the same overclocking compared to claymore, but ethminer's detection of zero-hashrate GPUs and their recovery is the problem.

When overclocking, crashes should be expected. As long as recovery is quick, then that's probably all that can be done.

AndreaLanfranchi commented 6 years ago

If crashes occur when overclocking then you're overclocking too much. Full stop.

Expecting the miner to continuously recover from crashes is a bad expectation: in the long run you're stressing your GPU too much and effectively getting a lower average hashrate.

akatasonov commented 6 years ago

Here's my overclocked 6x1070 setup running ethminer that achieves 187 MH/s constantly without crashes.

wetblanketcc commented 6 years ago

> If crashes occur when overclocking then you're overclocking too much.

@AndreaLanfranchi I would think so as well, except that crashes are a) less frequent with other software and b) more frequent than I had with previous versions of ethminer.

Regarding (b), perhaps recent versions of ethminer have improved the efficiency of the calcs or something to improve hashrate, but stress the GPU more than older versions. Maybe that is an explanation?

> Here's my overclocked 6x1070 setup running ethminer that achieves 187 MH/s constantly without crashes.

@akatasonov, that's the graph with about the same consistent numbers I used to enjoy seeing as well. May I ask what your settings are? When I was getting many crashes, my settings were -200/+1050, 105watts, 25% fan - getting a net 186.x MH/s from 6x 1070s.

I've now switched back to ethminer (from my claymore test) and reduced the clock settings to -200/+1000, 102watts, 45% fan to achieve a similar hashrate (just under 184 MH/s) that I was getting from claymore. I'll see how the stability goes, but I would sure like to get back to my 186/187 level I used to see.
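The trade-off between the two settings can be put into numbers. The following is a back-of-the-envelope sketch only, using the 186 and 184 MH/s figures quoted above and three simplifying assumptions that are not measurements: all six cards contribute equally, only one card is idle at a time, and the lower setting never drops a card. The function names are made up for the example.

```python
def average_hashrate(nominal_mhs, n_gpus, idle_fraction):
    """Rig average when one of n_gpus equal cards sits at 0 MH/s
    for idle_fraction of the time (simplified model, see assumptions above)."""
    return nominal_mhs * (1 - idle_fraction / n_gpus)


def break_even_idle_fraction(fast_mhs, stable_mhs, n_gpus):
    """Idle fraction at which the faster but unstable setting
    averages out to the same hashrate as the stable one."""
    return n_gpus * (1 - stable_mhs / fast_mhs)


threshold = break_even_idle_fraction(186.0, 184.0, 6)
print(f"break-even idle fraction: {threshold:.4f}")
print(f"average at break-even: {average_hashrate(186.0, 6, threshold):.2f} MH/s")
```

Under these assumptions, once a single card is idle for more than roughly 6.5% of the time, the 186 MH/s setting averages less than a steady 184 MH/s.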

AndreaLanfranchi commented 6 years ago

Here is my only 6x GTX 1070 rig, running stable at 189.4 MH/s. I am running the latest dev build on Linux. Settings: 102 W, -200/+1350, fan 100%. Constant temp below 65°.

[screenshot: miner]

AndreaLanfranchi commented 6 years ago

> but I would sure like to get back to my 186/187 level I used to see.

Consider that the DAG size grows with each epoch, so our GPUs get a little bit slower each time.
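For a sense of how fast the DAG grows, here is a small sketch of the dataset size calculation as I understand it from the Ethash specification (constants and the prime-search step are taken from the spec, not from ethminer's source; the helper names are made up). It only shows the memory growth, not the resulting slowdown.

```python
# Constants from the Ethash specification.
DATASET_BYTES_INIT = 2**30     # dataset size at epoch 0, before adjustment
DATASET_BYTES_GROWTH = 2**23   # growth per epoch
MIX_BYTES = 128                # width of one mix


def is_prime(n):
    """Trial division; fast enough for the magnitudes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def dag_size_bytes(epoch):
    """DAG size for an epoch: linear growth, then reduced until
    size / MIX_BYTES is prime."""
    size = DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch - MIX_BYTES
    while not is_prime(size // MIX_BYTES):
        size -= 2 * MIX_BYTES
    return size


for epoch in (0, 100, 200):
    print(epoch, round(dag_size_bytes(epoch) / 2**20), "MiB")
```

The growth is about 8 MiB per epoch, so the change between neighbouring epochs is small but it adds up over months.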

wetblanketcc commented 6 years ago

@AndreaLanfranchi You must have some top of the line cards. Which ones, if I can ask?

I thought my ROG Strix 1070s were supposed to be good, but as soon as I push past 1050, it all falls apart.

invidtiv commented 6 years ago

Since we are sharing, here are the results and settings of my 6x MSI Armor 1070: FAN=100 WATT=110 CLOCK=-110 MEM=1300, version 0.15.0.dev11

AndreaLanfranchi commented 6 years ago

> I thought my ROG Strix 1070s were supposed to be good, but as soon as I push past 1050, it all falls apart.

Same as yours but with Samsung ram. Maybe yours is with Micron ram

akatasonov commented 6 years ago

@invidtiv FAN=100 is super extreme, your fans might fall off

invidtiv commented 6 years ago

@akatasonov replacing a fan is easier than replacing a GPU... The cooler they run, the better they run... Ambient temp affects stability more than any other issue. The above machine has been running for 188 hours non-stop; it was only reset because of a power outage, and before that it ran 286 hours... until another power outage... Sometimes I forget to check the power consumption before turning on the oven in the kitchen...

No GPU above 46ºC.

I do understand the risks for the fans. I have replaced a few and learned the hard way that some fans have a bronze bushing instead of a ball bearing in the fan core. Those tend to fail, but a 12 cm fan placed over the failing fan usually does the trick... until I get a replacement fan...

One major thing I learned over time is that every card has its minimal wattage to run steadily without errors; if I go too low, I get many more stale shares and inconsistent switching times... Normally the temperature of the GPU will dictate how fast you can run it... especially the memory temperature.

@wetblanketcc A crash is always bad: it affects how fast you are paid by the pool, and every time you crash you are at least 50 shares short. Let's get real, that is your income dropping. I have machines that need constant attention and oversight. If I have a machine that is rebooting or crashing, the first step is to check for burnt cables or connectors, and the second step is to step the overclock down by 10%. After 10 days I bump it up 3% and let it run another ten days. You never gain more with burst mining, which is what you are doing. I have also done it in the past, believing that it was better; after a few controlled runs and a spreadsheet, I found out otherwise...

wetblanketcc commented 6 years ago

> Same as yours but with Samsung ram. Maybe yours is with Micron ram

@AndreaLanfranchi That must be the case, I cannot think of another reason. I'd consider maybe risers, but the rig performs well on other algos.

@invidtiv - Impressive charts. I would never say that I'm jealous, but... 🙄 Also, good advice/feedback, I'll act on it. Thanks!

akatasonov commented 6 years ago

@invidtiv though your comment about the GPU temperature is very valid, nowadays it's almost impossible to fry a high-grade GPU, even if you run it without coolers at all. Good stuff on the clocks, however!

rgaufman commented 6 years ago

@wetblanketcc what command do you use to mine equihash? I tried this one but it's throwing an error and dying :(

$ sudo ./ethminer --opencl-device 0 -G -P stratum2+tcp://3L62DB7RWNTET5EenQYsiqLdWJF4qyX6PW.EquihashMiner@equihash.eu.nicehash.com:3357
 m 23:31:04 ethminer ethminer 0.16.0.dev1
 m 23:31:04 ethminer Build: linux/release
 i 23:31:04 ethminer Found suitable OpenCL device [Ellesmere] with 8,583,593,984 bytes of GPU memory
 i 23:31:04 ethminer Found suitable OpenCL device [Ellesmere] with 8,583,593,984 bytes of GPU memory
 i 23:31:04 ethminer Configured pool equihash.eu.nicehash.com:3357
 i 23:31:04 main     Selected pool equihash.eu.nicehash.com:3357
 i 23:31:04 stratum  Trying 172.65.195.171:3357 ...
 i 23:31:04 stratum  Connected to equihash.eu.nicehash.com [172.65.195.171:3357]
 i 23:31:04 stratum  Spinning up miners...
cl 23:31:04 cl-0     No work. Pause for 3 s.
 X 23:31:06 stratum  Unable to find suitable Stratum Mode
cl 23:31:07 cl-0     No work. Pause for 3 s.
 m 23:31:09 ethminer Speed 0.00 Mh/s gpu0 0.00 [A0] Time: 00:00
cl 23:31:10 cl-0     No work. Pause for 3 s.
cl 23:31:13 cl-0     No work. Pause for 3 s.
 m 23:31:14 ethminer Speed 0.00 Mh/s gpu0 0.00 [A0] Time: 00:00
cl 23:31:16 cl-0     No work. Pause for 3 s.
 m 23:31:19 ethminer Speed 0.00 Mh/s gpu0 0.00 [A0] Time: 00:00
cl 23:31:19 cl-0     No work. Pause for 3 s.
cl 23:31:22 cl-0     No work. Pause for 3 s.
 m 23:31:24 ethminer Speed 0.00 Mh/s gpu0 0.00 [A0] Time: 00:00
cl 23:31:25 cl-0     No work. Pause for 3 s.
cl 23:31:28 cl-0     No work. Pause for 3 s.
 m 23:31:29 ethminer Speed 0.00 Mh/s gpu0 0.00 [A0] Time: 00:00
cl 23:31:31 cl-0     No work. Pause for 3 s.
 m 23:31:34 ethminer Speed 0.00 Mh/s gpu0 0.00 [A0] Time: 00:00
cl 23:31:34 cl-0     No work. Pause for 3 s.
 i 23:31:34 stratum  Connection remotely closed by equihash.eu.nicehash.com
 i 23:31:34 stratum  Trying 172.65.195.171:3357 ...
 i 23:31:34 stratum  Connected to equihash.eu.nicehash.com [172.65.195.171:3357]
 i 23:31:34 stratum  Stratum mode detected : STRATUM
 i 23:31:34 stratum  Subscribed to stratum server
 i 23:31:34 stratum  Authorized worker 3L62DB7RWNTET5EenQYsiqLdWJF4qyX6PW.EquihashMiner
 X 23:31:34 stratum  Got unknown method [mining.set_target] from pool. Discarding ...
 i 23:31:34 stratum  Connection remotely closed by equihash.eu.nicehash.com
 i 23:31:34 main     Disconnected from equihash.eu.nicehash.com [172.65.195.171:3357]
 i 23:31:35 main     No more connections to try. Exiting ...
 i 23:31:35 main     Shutting down miners...
 X 23:31:37 cl-0     OpenCL Error: clFinish: CL_INVALID_COMMAND_QUEUE (-36)
 m 23:31:39 ethminer not-connected
 i 23:31:39 ethminer Terminated !

wetblanketcc commented 6 years ago

@rgaufman I've got a bash script that I run that sets up my environment and overclock settings:

#!/usr/bin/env bash
# Let nvidia-settings reach the X server started by lightdm
export XAUTHORITY=/var/run/lightdm/root/:0
export DISPLAY=:0
# Enable persistence mode and cap board power at 106 W
sudo nvidia-smi -pm 1
sudo nvidia-smi -pl 106
# Apply clock offsets and a fixed fan speed; attributes are quoted so the shell does not treat [3] as a glob pattern
sudo nvidia-settings -a 'GPUPowerMizerMode=1' -a 'GPUGraphicsClockOffset[3]=-200' -a 'GPUMemoryTransferRateOffset[3]=1100' -a 'GPUFanControlState=1' -a 'GPUTargetFanSpeed=65'
export GPU_FORCE_64BIT_PTR=0
export GPU_MAX_HEAP_SIZE=100
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100
# Start ethminer on CUDA with a primary and a failover pool
./ethminer --report-hashrate --exit --cuda --HWMON 1 --verbosity 5 -P stratum1+ssl://0x5ebE6Eac1D7A7Cf009cAC102F223eFDE0127Ca30.rig1@us2.ethermine.org:5555 -P stratum1+ssl://0x5ebE6Eac1D7A7Cf009cAC102F223eFDE0127Ca30.rig1@us1.ethermine.org:5555

ethereum-mining / ethminer, issue #859