firoorg / ccminer

mtp ccminer development
GNU General Public License v3.0

ccminer 1.2.2 terminates without error after ~20 minutes of mining on EVGA GTX 1080 Ti SC2 HYBRID #53

Closed nvOC-Stubo closed 4 years ago

nvOC-Stubo commented 5 years ago

I am using ccminer 1.2.2 on a rig with 4 x EVGA GTX 1080 Ti SC2 HYBRID (11G-P4-6598-KR) running Windows 10. Power is set to 75% and clocks are +135/+550. No matter what I try, I cannot mine Zcoin (MTP algo) with these GPUs. The miner will mine along for about 20 minutes and then just die without any error. The logs (excerpts from 3 of them) look like this:

...
[2019-09-14 15:51:34] DEBUG: job_id=4437d5d 62040000 xnonce2=0000000000000000 time=05:37:33
[2019-09-14 15:51:35] GPU #2: EVGA GTX 1080 Ti, 3730.89 kH/s
[2019-09-14 15:51:35] DEBUG: job_id=4437d5d 62040000 xnonce2=0000000000000000 time=05:37:33
[2019-09-14 15:51:35] DEBUG: job_id=4437d5d 62040000 xnonce2=0000000000000000 time=05:37:33
[2019-09-14 15:51:35] DEBUG: job_id=4437d5d 62040000 xnonce2=0000000000000000 time=05:37:33
[2019-09-14 15:51:36] DEBUG: job_id=4437d5d 62040000 xnonce2=0000000000000000 time=05:37:33
[2019-09-14 15:51:36] mtp block 197471, diff 6189.502
[2019-09-14 15:51:36] restart_threads
[2019-09-14 15:51:36] DEBUG: job_id=8447d5d 63040000 xnonce2=0000000000000000 time=22:22:21
[2019-09-14 15:51:36] job 8447d5d 63040000 target change: 3f90831a9 (0.3)
[2019-09-14 15:51:36] DEBUG: job_id=8447d5d 63040000 xnonce2=0000000000000000 time=22:22:21
[2019-09-14 15:51:36] job 8447d5d 63040000 target change: 3f90831a9 (0.3)
[2019-09-14 15:51:37] DEBUG: job_id=8447d5d 63040000 xnonce2=0000000000000000 time=22:22:21
[2019-09-14 15:51:37] job 8447d5d 63040000 target change: 3f90831a9 (0.3)
[2019-09-14 15:51:37] DEBUG: job_id=8447d5d 63040000 xnonce2=0000000000000000 time=22:22:21
[2019-09-14 15:51:37] job 8447d5d 63040000 target change: 3f90831a9 (0.3)

and

...
[2019-09-14 14:20:33] GPU #1: EVGA GTX 1080 Ti, 3715.78 kH/s
[2019-09-14 14:20:33] DEBUG: job_id=62f7d5d 4e040000 xnonce2=0000000000000000 time=14:45:49
[2019-09-14 14:20:34] GPU #0: EVGA GTX 1080 Ti, 3694.84 kH/s
[2019-09-14 14:20:34] DEBUG: job_id=62f7d5d 4e040000 xnonce2=0000000000000000 time=14:45:49
[2019-09-14 14:20:34] DEBUG: job_id=62f7d5d 4e040000 xnonce2=0000000000000000 time=14:45:49
[2019-09-14 14:20:34] zcoin-us.mintpond.com:3000 asks job 1325662208 for block 197446
[2019-09-14 14:20:34] DEBUG: job_id=22f7d5d 4f040000 xnonce2=0000000000000000 time=11:01:49
[2019-09-14 14:20:34] Stratum difficulty set to 0.0877991
[2019-09-14 14:20:34] job 22f7d5d 4f040000 target change: b63b4300c (0.1)
[2019-09-14 14:20:34] DEBUG: job_id=22f7d5d 4f040000 xnonce2=0000000000000000 time=11:01:49
[2019-09-14 14:20:34] job 22f7d5d 4f040000 target change: b63b4300c (0.1)
[2019-09-14 14:20:35] DEBUG: job_id=22f7d5d 4f040000 xnonce2=0000000000000000 time=11:01:49
[2019-09-14 14:20:35] job 22f7d5d 4f040000 target change: b63b4300c (0.1)
[2019-09-14 14:20:35] DEBUG: job_id=22f7d5d 4f040000 xnonce2=0000000000000000 time=11:01:49
[2019-09-14 14:20:35] job 22f7d5d 4f040000 target change: b63b4300c (0.1)

and

...
[2019-09-14 13:42:48] GPU #1: EVGA GTX 1080 Ti, 3688.89 kH/s
[2019-09-14 13:42:48] DEBUG: job_id=d267d5d 4a040000 xnonce2=0000000000000000 time=21:41:33
[2019-09-14 13:42:48] DEBUG: job_id=d267d5d 4a040000 xnonce2=0000000000000000 time=21:41:33
[2019-09-14 13:42:48] mtp block 197441, diff 6164.300
[2019-09-14 13:42:48] restart_threads
[2019-09-14 13:42:48] DEBUG: job_id=8267d5d 4b040000 xnonce2=0000000000000000 time=11:01:33
[2019-09-14 13:42:48] job 8267d5d 4b040000 target change: bf3418a5c (0.1)
[2019-09-14 13:42:49] DEBUG: job_id=8267d5d 4b040000 xnonce2=0000000000000000 time=11:01:33
[2019-09-14 13:42:49] job 8267d5d 4b040000 target change: bf3418a5c (0.1)
[2019-09-14 13:42:49] DEBUG: job_id=8267d5d 4b040000 xnonce2=0000000000000000 time=11:01:33
[2019-09-14 13:42:49] job 8267d5d 4b040000 target change: bf3418a5c (0.1)
[2019-09-14 13:42:49] DEBUG: job_id=8267d5d 4b040000 xnonce2=0000000000000000 time=11:01:33
[2019-09-14 13:42:49] job 8267d5d 4b040000 target change: bf3418a5c (0.1)

It seems as though a "job target change" causes the miner to just stop. I encounter the same issue with 100% power and no OC. I am using HWInfo for monitoring and see these maximums:

GPU Memory Usage: 41%
Physical Memory Used: 5305 MB

I have several other rigs mining MTP without issue, including ones with the EVGA GTX 1080 Ti SC2 and EVGA GTX 1080 Ti SC (non-hybrid). I have also tried Ubuntu 18 and the T-Rex miner on this rig, and it also crashes after a few minutes of mining (sometimes it crashes the entire OS). I also tried a different motherboard with a single EVGA GTX 1080 Ti SC2 HYBRID (11G-P4-6598-KR) and got the same result. I have had this rig for over 2 years and have mined numerous other coins on various algos with these same power and clock settings without any stability issues. It seems that the only common thread is the MTP algo and the EVGA GTX 1080 Ti SC2 HYBRID (11G-P4-6598-KR) GPU.

Any ideas?

djm34 commented 5 years ago

Did you look at CPU usage and memory usage while running?
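One way to watch GPU memory over a run is to poll `nvidia-smi` and log the result. The following is a minimal Python sketch, not part of ccminer; the function names are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports numeric values for every card. The sample string is illustrative input only, not a measurement from this rig.

```python
import subprocess

# nvidia-smi query for per-GPU memory, as plain CSV without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def read_gpu_memory():
    """Run nvidia-smi once and return the parsed rows (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Illustrative input only, not measurements from the rig in this thread.
sample = "0, 4500, 11264\n1, 4480, 11264\n"
for index, used, total in parse_gpu_memory(sample):
    print(f"GPU {index}: {used}/{total} MiB ({100 * used / total:.0f}%)")
```

Calling `read_gpu_memory()` in a loop with a timestamp would show whether memory use climbs before the miner stops.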

nvOC-Stubo commented 5 years ago

According to task manager, CPU is 30-33% (on Xeon E5-2630L) and memory usage is ~26% (5500 MB max of 16 GB according to HWInfo).

I did a few more runs this morning on a freshly booted machine and have some more information. On the first execution of ccminer after a reboot, ccminer runs for ~20 minutes and fails. Then I see CUDA memory errors:

...
[2019-09-16 04:24:34] zcoin-us.mintpond.com:3000 asks job 1929379840 for block 197851
[2019-09-16 04:24:35] DEBUG: job_id=2467f5d 73000000 xnonce2=0000000000000000 time=08:53:49
[2019-09-16 04:24:35] DEBUG: job_id=2467f5d 73000000 xnonce2=0000000000000000 time=08:53:49
[2019-09-16 04:24:35] DEBUG: job_id=2467f5d 73000000 xnonce2=0000000000000000 time=08:53:49
[2019-09-16 04:24:35] DEBUG: job_id=2467f5d 73000000 xnonce2=0000000000000000 time=08:53:49
cudaErrorMemoryAllocation
cudaErrorMemoryAllocation
cudaErrorMemoryAllocation
cudaErrorMemoryAllocation

Relaunching the miner again (no reboot), it runs for ~20 mins again, fails, and the logs just end like in the OP but I now see Windows Events that look like this:

Faulting application name: ccminer.exe, version: 1.8.4.0, time stamp: 0x5d759bc3
Faulting module name: VCRUNTIME140.dll, version: 14.13.26020.0, time stamp: 0x5a39fee3
Exception code: 0xc0000005
Fault offset: 0x000000000000ca30
Faulting process id: 0x15e0
Faulting application start time: 0x01d56c6ad027e7c5
Faulting application path: C:\Mining\ccminer\ccminer.exe
Faulting module path: C:\WINDOWS\SYSTEM32\VCRUNTIME140.dll
Report Id: 6a97d4c8-45e0-4553-84e6-7b947faa64d9
Faulting package full name:
Faulting package-relative application ID:

So it seems that I get CUDA memory errors on a freshly booted system (with no event in Event Viewer), but each subsequent execution results in an application fault in Event Viewer and no CUDA errors in the ccminer log.
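For reference, exception code 0xc0000005 is the Windows access-violation status (STATUS_ACCESS_VIOLATION), meaning the process touched memory it was not allowed to. The Python sketch below pulls the faulting module and exception code out of an Application Error event like the one above. The function name and the small lookup table are hypothetical and deliberately incomplete; this is an illustration, not a diagnostic tool.

```python
import re

# A few well-known NTSTATUS codes; this table is deliberately incomplete.
KNOWN_EXCEPTION_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def summarize_fault(event_text):
    """Extract the faulting module and exception code from an event's text."""
    module = re.search(r"Faulting module name:\s*([^,\s]+)", event_text)
    code = re.search(r"Exception code:\s*(0x[0-9a-fA-F]+)", event_text)
    code_value = int(code.group(1), 16) if code else None
    return {
        "module": module.group(1) if module else None,
        "code": code_value,
        "meaning": KNOWN_EXCEPTION_CODES.get(code_value, "unknown"),
    }


# Event text copied from the Windows event quoted in this comment.
EVENT = (
    "Faulting application name: ccminer.exe, version: 1.8.4.0, time stamp: 0x5d759bc3 "
    "Faulting module name: VCRUNTIME140.dll, version: 14.13.26020.0, time stamp: 0x5a39fee3 "
    "Exception code: 0xc0000005 Fault offset: 0x000000000000ca30"
)
print(summarize_fault(EVENT))
```

An access violation inside the C runtime is consistent with the miner continuing after a failed allocation, although the event alone does not prove that.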

djm34 commented 5 years ago

Have you checked whether one of the cards could be faulty? Actually, it looks a bit like what happens with cards that are overclocked too far (they run for a while, then crash).

nvOC-Stubo commented 5 years ago

I found and followed the discussion here: https://bitcointalk.org/index.php?topic=5156883.0

where "cudaErrorMemoryAllocation" errors are sometimes resolved on rigs with large numbers of GPUs by running multiple instances of ccminer for fewer GPUs and decided to split my execution as well. Instead of launching one instance of ccminer for all 4 of my GPUs, I split this into 3 executions of 1, 1 and 2 GPUs. Doing this has greatly enhanced my stability as I have had no failures since I started them 2 hours ago. I am still using my original PL and OC settings.

I will report back my findings later, after several more hours.

FelixVVV commented 5 years ago

What version of nVidia drivers are you using?

nvOC-Stubo commented 5 years ago

@FelixVVV

I am using Driver Version: 436.30

nvOC-Stubo commented 5 years ago

Update:

I was able to get more than 2 hours of mining with multiple instances of ccminer on 1, 1, and 2 GPUs, but the 2-GPU instance crashed with cudaErrorMemoryAllocation after about 2.5 hours. I then switched to one instance of ccminer per GPU and have been mining for over 12 hours without any errors. All of these test runs use the same PL and clocks (per the OP).

So, whatever/wherever the problem is, it seems to be greatly minimized, if not eliminated altogether, by allowing an instance of ccminer to only use 1 GPU. I am going to let this run for a bit longer and then I think I will switch back to Ubuntu 18 and try the same.
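The one-instance-per-GPU setup can be scripted. Below is a minimal Python sketch, under the assumption that ccminer's usual options apply to this build (`-a` for the algorithm, `-o`/`-u`/`-p` for pool, user, and password, and `-d` to select a device); check `ccminer --help` before relying on them. The helper names, wallet, and password are placeholders, not values from this thread.

```python
import subprocess


def build_commands(miner_path, pool_url, user, password, gpu_ids):
    """Return one ccminer command line per GPU, each pinned to a single device."""
    return [
        [miner_path, "-a", "mtp", "-o", pool_url,
         "-u", user, "-p", password, "-d", str(gpu)]
        for gpu in gpu_ids
    ]


def launch_all(commands):
    """Start every command as its own process and return the Popen handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


commands = build_commands(
    r"C:\Mining\ccminer\ccminer.exe",            # path as shown in the Windows event
    "stratum+tcp://zcoin-us.mintpond.com:3000",  # pool host and port from the logs
    "YOUR_WALLET.worker",                        # placeholder
    "x",                                         # placeholder
    [0, 1, 2, 3],                                # the four GPUs on this rig
)
for cmd in commands:
    print(" ".join(cmd))
```

The script only prints the command lines; calling `launch_all(commands)` would start the four miners as separate processes, so a crash in one instance does not take the others down with it.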

FelixVVV commented 5 years ago

Game Ready driver 436.30 is unstable for all miners; see my post: https://github.com/zcoinofficial/ccminer/issues/54#issue-493699834. Try the latest Studio driver, 431.86.

nvOC-Stubo commented 5 years ago

For the record, I am at nearly 20 hours of mining with no errors/issues with the current run using the 436.30 driver on Win10 with 1 instance of ccminer per GPU. I am trying 431.86 as suggested and will report back soon.

nvOC-Stubo commented 4 years ago

I am happy to report that changing the nVidia driver to 431.86 from 436.30 has resolved this issue. I have been mining with no issues for the past 21 hours since making this change.