NebuTech / NBMiner

GPU Miner for ETH, RVN, BEAM, CFX, ZIL, AE, ERGO
https://nbminer.com

crash when printing Summary table before all GPUs started #765

Open Martin-Stangl opened 2 years ago

Martin-Stangl commented 2 years ago
[14:13:20] INFO - [14:13:20] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[14:13:20] ERROR - Mining program unexpected exit. 
[14:13:20] ERROR - Code: 11, Reason: Process crashed
[14:13:20] ERROR - Restart miner after 10 secs ... 
[14:13:20] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The above error occurs when the Summary table is due to be printed before all GPUs have finished starting. (In this specific case, before the message "INFO - Device 7 started, Free mem = 5833 MB." appears.)

It can be reproduced by setting a short log-cycle value, e.g. --log-cycle 5.
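The condition described above can be modelled as a simple timing check. The sketch below only illustrates the reported symptom; it is not NBMiner's actual code, and the function name and the example startup times are hypothetical. The 30 s value is the summary interval mentioned later in this thread.

```python
def summary_due_before_all_started(log_cycle_s, device_start_times_s):
    """Return True if the first summary would be due before the slowest
    device has reported 'started'.

    log_cycle_s: value passed to --log-cycle, in seconds.
    device_start_times_s: seconds after launch at which each device
        reported 'started' (hypothetical measurements).
    """
    if not device_start_times_s:
        raise ValueError("need at least one device start time")
    return log_cycle_s < max(device_start_times_s)


# Hypothetical startup times for an 8-GPU rig; the slowest device needs 12 s.
start_times = [3, 4, 5, 6, 8, 9, 11, 12]
print(summary_due_before_all_started(5, start_times))   # --log-cycle 5  -> True
print(summary_due_before_all_started(30, start_times))  # 30 s interval  -> False
```

Under this model, any log-cycle value shorter than the slowest GPU's startup time would hit the crash window.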

System and config details are below:

Miner:   nbminer
Version: 40.1
[14:14:40] INFO - ------------------- System -------------------
[14:14:40] INFO - OS:     Ubuntu 18.04.6 LTS, 5.10.0-hiveos
[14:14:40] INFO - CPU:    Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
[14:14:40] INFO - RAM:    6906 MB / 7883 MB
[14:14:40] INFO - CU_DRV: 11.4, 470.86
[14:14:40] INFO - ------------------- Config -------------------
[14:14:40] INFO - ALGO:   ethash 

[14:15:08] INFO - ------------------- Device -------------------
[14:15:08] INFO -  |ID|PCI|  CC|Memory|CU|
[14:15:08] INFO - *| 0|  3|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - *| 1|  4|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - *| 2|  5|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - *| 3|  6|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - *| 4|  7|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - *| 5|  8|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - *| 6| 10|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - *| 7| 11|  86| 5940M|26| NVIDIA RTX A2000
[14:15:08] INFO - ----------------------------------------------

Martin-Stangl commented 2 years ago

Issue could be reproduced on a second rig.

Rig specs are below:

Miner:   nbminer
Version: 40.1

[14:22:56] INFO - ------------------- System -------------------
[14:22:56] INFO - OS:     Ubuntu 18.04.6 LTS, 5.10.0-hiveos
[14:22:56] INFO - CPU:    AMD Phenom(tm) II X4 965 Processor
[14:22:56] INFO - RAM:    11073 MB / 11968 MB
[14:22:56] INFO - CU_DRV: 11.4, 470.86
[14:22:56] INFO - ------------------- Config -------------------
[14:22:56] INFO - ALGO:   ethash

[14:22:56] INFO - ------------------- Device -------------------
[14:22:56] INFO -  |ID|PCI|  CC|Memory|CU|
[14:22:56] INFO - *| 0|  1|  86| 5940M|26| NVIDIA RTX A2000
[14:22:56] INFO - *| 1|  2|  86| 5940M|26| NVIDIA RTX A2000
[14:22:56] INFO - *| 2|  3|  86| 5940M|26| NVIDIA RTX A2000
[14:22:56] INFO - *| 3|  6|  86| 5940M|26| NVIDIA RTX A2000
[14:22:56] INFO - *| 4|  7|  86| 5940M|26| NVIDIA RTX A2000
[14:22:56] INFO - ----------------------------------------------

Martin-Stangl commented 2 years ago

Additional information: I want to set a short log-cycle because I am in the process of tuning the GPUs and want to see changes in the hashrate sooner than once every 30 seconds.

manol781 commented 2 years ago

Hello, I have the same problem. When the miner starts and the last GPU starts, I get the error. It restarts, hits the error again, and keeps looping.

Sometimes it boots and starts mining with no problems, but that happens only at random.

Any help?

AndrewPro commented 2 years ago

Hello, I have the same error code as you, 11. This is my first time mining and I don't know any solutions. I am using a 1060 from Gigabyte.


./nbminer -a kawpow -o stratum+tcp://kp.unmineable.com:3333 -u DOGE:sadfasdfasdf.asdfasdf -log
[14:54:43] INFO - |         NBMiner - Crypto GPU Miner         |
[14:54:43] INFO - |                    40.1                    |
[14:54:43] INFO - |                                              |
[14:54:43] INFO - ----------------------------------------------
[14:54:43] INFO - ------------------- System -------------------
[14:54:43] INFO - OS:     Pop!_OS 21.10, 5.15.23-76051523-generic
[14:54:43] INFO - CPU:    AMD Ryzen 5 2600X Six-Core Processor           
[14:54:43] INFO - RAM:    28917 MB / 32099 MB
[14:54:43] INFO - CU_DRV: 11.4, 470.86
[14:54:43] INFO - ------------------- Config -------------------
[14:54:43] INFO - ALGO:   kawpow
[14:54:43] INFO - URL:    stratum+tcp://kp.unmineable.com:3333
[14:54:43] INFO - USER:   DOGE:asdfasdfasdfasdf.asdfasdfasdfasdf
[14:54:43] INFO - TEMP:   limit 90C, start 85C
[14:54:43] INFO - ------------------- Device -------------------
[14:54:43] INFO -  |ID|PCI|  CC|Memory|CU|
[14:54:43] INFO - *| 0| 10|  61| 3016M| 9| NVIDIA GeForce GTX 1060 3GB
[14:54:43] INFO - ----------------------------------------------
[14:54:43] INFO - kawpow - Logging in to kp.unmineable.com(104.236.230.225):3333 ...
[14:54:43] INFO - Set extranonce: fb
[14:54:43] INFO - kawpow - Login succeeded.
[14:54:43] INFO - API:  0.0.0.0:22333
[14:54:43] INFO - API server started.
[14:54:43] INFO - Device 0 started, Free mem = 2320 MB.
[14:54:44] INFO - kawpow - New job: kp.unmineable.com:3333, ID: 20d67, HEIGHT: 2166243, DIFF: 4.295G
[14:54:44] INFO - mining.extranonce.subscribe succeeded.
[14:54:46] INFO - Light cache built, 1.66 s.
[14:54:46] INFO - Building DAG for EPOCH 288 on Device 0 ...
[14:54:46] ERROR - CUDA Error: out of memory (err_no=2)
[14:54:46] ERROR - Device 0 exception, exit ...
[14:54:46] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[14:54:46] ERROR - Mining program unexpected exit. 
[14:54:46] ERROR - Code: 11, Reason: Process crashed
[14:54:46] ERROR - Restart miner after 10 secs ... 
[14:54:46] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


Martin-Stangl commented 2 years ago

> Hello, I have the same problem. When the miner starts and the last GPU starts, I get the error. It restarts, hits the error again, and keeps looping.
>
> Sometimes it boots and starts mining with no problems, but that happens only at random.
>
> Any help?

If it is the same problem, it means the GPUs often take longer than 30 seconds to initialize.

As a workaround, you can start the miner with the parameter --log-cycle 60. This would give the GPUs 60 seconds to initialize.
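One way to choose the value is to measure how long the slowest GPU takes to report "started" and add some headroom. The helper below is a hypothetical sketch: the function name and the 10-second margin are assumptions, and the pool and wallet arguments are left out. Only the `-a` and `--log-cycle` options, both of which appear in this thread, are used.

```python
import math


def with_log_cycle(base_args, slowest_init_s, margin_s=10):
    """Append a --log-cycle value that exceeds the slowest observed GPU init time.

    base_args: existing miner arguments as a list of strings.
    slowest_init_s: longest observed time (seconds) until the last
        'Device N started' message; a measurement you supply.
    margin_s: extra headroom in seconds; 10 is an arbitrary choice.
    """
    cycle = math.ceil(slowest_init_s + margin_s)
    return list(base_args) + ["--log-cycle", str(cycle)]


# Example: the slowest GPU was observed to need 45 s to start.
args = with_log_cycle(["./nbminer", "-a", "ethash"], slowest_init_s=45)
print(" ".join(args))  # ./nbminer -a ethash --log-cycle 55
```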

Martin-Stangl commented 2 years ago

> CUDA Error: out of memory (err_no=2)

This is not the same issue. Your actual problem is stated a few lines above the crash message: your graphics card does not have enough memory.

It looks like the Kawpow DAG just reached the size of 3GB and therefore cards with 3GB or less cannot be used to mine it anymore.
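A rough check of the numbers in the posted log supports this. The sketch below assumes KawPow uses a 7500-block epoch and Ethash-style DAG sizing (1 GiB at epoch 0, growing by 8 MiB per epoch). These constants are assumptions rather than values taken from NBMiner, and the real DAG is trimmed slightly below this estimate.

```python
KAWPOW_EPOCH_LENGTH = 7500      # blocks per epoch (assumed for KawPow)
DATASET_BYTES_INIT = 2**30      # 1 GiB at epoch 0 (Ethash-style, assumed)
DATASET_BYTES_GROWTH = 2**23    # +8 MiB per epoch (Ethash-style, assumed)


def epoch_for_height(height):
    """Epoch number for a given block height."""
    return height // KAWPOW_EPOCH_LENGTH


def approx_dag_mib(epoch):
    """Upper-bound DAG size in MiB; the real size is slightly smaller."""
    return (DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch) / 2**20


epoch = epoch_for_height(2166243)  # HEIGHT from the posted log
dag_mib = approx_dag_mib(epoch)
free_mib = 2320                    # "Free mem = 2320 MB" from the posted log

print(epoch)               # 288, matching "EPOCH 288" in the log
print(dag_mib)             # 3328.0
print(dag_mib > free_mib)  # True: the DAG does not fit in the free memory
```

By this estimate the DAG is a little over 3 GB, which is more than the 3016M the card reports in total, let alone the 2320 MB that were free.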

manol781 commented 2 years ago

> > Hello, I have the same problem. When the miner starts and the last GPU starts, I get the error. It restarts, hits the error again, and keeps looping. Sometimes it boots and starts mining with no problems, but that happens only at random. Any help?
>
> If it is the same problem, it means the GPUs often take longer than 30 seconds to initialize.
>
> As a workaround, you can start the miner with the parameter --log-cycle 60. This would give the GPUs 60 seconds to initialize.

Hello, thanks. I'm using HiveOS. How can I set this parameter?

Martin-Stangl commented 2 years ago

> log-cycle 60

You can set it in the flight sheet. See the following screenshot for a rough guide. [image]

manol781 commented 2 years ago

> > log-cycle 60
>
> You can set it in the flight sheet. See the following screenshot for a rough guide. [image]

Thanks!! I think the problem is resolved!! I'll let you know later.

manol781 commented 2 years ago

> > log-cycle 60
>
> You can set it in the flight sheet. See the following screenshot for a rough guide. [image]

The problem is solved!! Many thanks.

However, when I try with 16 cards, I get the error right at the beginning (looping) and the miner doesn't boot.