NebuTech / NBMiner

GPU Miner for ETH, RVN, BEAM, CFX, ZIL, AE, ERGO
https://nbminer.com
Version 41.0 for Linux: the graphics card drops after a period of time #828

Open fornote opened 2 years ago

fornote commented 2 years ago

image

For example, the last graphics card dropped, and the driver version is ok.

fegauthier commented 2 years ago

Same issue here with multiple 3070 Ti, 3080 and 3080 Ti

Jbtechnique commented 2 years ago

Any fixes yet?

dolikedistance commented 2 years ago

Same issue with 3080 Ti

lukezuca commented 2 years ago

Same issue with 3080 Ti Zotac (Micron) - Working around by setting the HiveOS watchdog to reboot rig in case hashrate drops

Jbtechnique commented 2 years ago

I can't get hashrate watch dog to work for me. Let me know if it restarts your rigs for you and what are the settings you use.

lukezuca commented 2 years ago

I can't get hashrate watch dog to work for me. Let me know if it restarts your rigs for you and what are the settings you use.

Hey, you need to set the watchdog to reboot the rig when Min Power is lower than what you expect to draw with all your cards up. In my case, if it's lower than 500 W, it means my 3080 Ti is no longer active. My settings are below. Hope it helps.

image
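The power-threshold idea described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how the HiveOS watchdog is implemented (that is not shown in this thread). The function names and the `MIN_POWER_WATTS` constant are hypothetical; the 500 W figure is taken from the comment above, and the sketch assumes `nvidia-smi` is on the PATH.

```python
import subprocess

# Assumption: 500 W is the example threshold from the comment above; tune per rig.
MIN_POWER_WATTS = 500.0


def parse_power_draw(csv_text):
    """Parse one power reading (watts) per line, skipping unreadable entries."""
    readings = []
    for line in csv_text.splitlines():
        try:
            readings.append(float(line.strip()))
        except ValueError:
            # A dropped GPU may report something like "[N/A]" instead of a number.
            continue
    return readings


def should_reboot(readings, min_power=MIN_POWER_WATTS):
    """Return True when the summed GPU power draw is below the threshold."""
    return sum(readings) < min_power


def read_gpu_power():
    """Query per-GPU power draw via nvidia-smi (one value per line, no units)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_power_draw(result.stdout)
```

On a rig, `should_reboot(read_gpu_power())` would return `True` once enough cards stop drawing power to push the total under the threshold. What to do at that point (reboot, restart the miner, send an alert) is left to the caller.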

Jbtechnique commented 2 years ago

Thanks, I appreciate the help. I hope they fix this in the next release. At least I can sleep better knowing I won't have too much downtime now that I've got the watchdog figured out.

budimulyawan commented 2 years ago

I have this issue as well, on one of my 3060s. The other card is working fine.

budimulyawan commented 2 years ago

I checked the log: the miner crashes and restarts, and then the hashrate never gets back up to 100%.

[14:13:40] INFO - ethash - New job: eth.hiveon.com:4444, ID: 23545f96, DIFF: 4.295G
[14:13:40] INFO - ethash - New job: eth.hiveon.com:4444, ID: c1fbeb6c, DIFF: 4.295G
[14:13:41] INFO - ethash - New job: eth.hiveon.com:4444, ID: 4ccccbf7, DIFF: 4.295G
[14:13:44] INFO - ethash - New job: eth.hiveon.com:4444, ID: ff003161, DIFF: 4.295G
[14:13:44] INFO - ethash - New job: eth.hiveon.com:4444, ID: 5bc192c1, DIFF: 4.295G
[14:13:46] INFO - ethash - #315 Share accepted, 152 ms. [DEVICE 2, #155]
[14:13:46] ERROR - CUDA Error: unspecified launch failure (err_no=4)
[14:13:46] ERROR - Device 2 exception, exit ...
[14:13:47] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[14:13:47] ERROR - Mining program unexpected exit.
[14:13:47] ERROR - Code: 6, Reason: Process crashed
[14:13:47] ERROR - Restart miner after 10 secs ...
[14:13:47] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[14:13:58] INFO - ----------------------------------------------
[14:13:58] INFO - | NBMiner - Crypto GPU Miner |
[14:13:58] INFO - | 41.0 |
[14:13:58] INFO - ----------------------------------------------
[14:13:58] INFO - ------------------- System -------------------
[14:13:58] INFO - OS: Ubuntu 18.04.6 LTS, 5.10.0-hiveos
[14:13:58] INFO - CPU: Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz
[14:13:58] INFO - RAM: 5440 MB / 7648 MB
[14:13:58] INFO - CU_DRV: 11.6, 510.68.02
[14:13:58] INFO - ------------------- Config -------------------
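For anyone comparing logs across rigs, a small Python sketch like the following could pull the crashing device and CUDA error out of a miner log. The regular expressions are assumptions based only on the log lines quoted in this thread, not on any documented NBMiner log format, and `find_crashes` is a hypothetical helper name.

```python
import re

# Strips real ANSI color escapes, in case the log was captured with colors on.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
# Patterns assumed from the log excerpts in this thread.
CUDA_RE = re.compile(r"CUDA Error: (?P<msg>.+?) \(err_no=(?P<code>\d+)\)")
DEVICE_RE = re.compile(r"Device (?P<dev>\d+) exception")


def find_crashes(log_text):
    """Return one dict per 'Device N exception' line, paired with the
    most recent CUDA error seen before it (if any)."""
    crashes = []
    last_error = None
    for raw_line in log_text.splitlines():
        line = ANSI_RE.sub("", raw_line)
        cuda = CUDA_RE.search(line)
        if cuda:
            last_error = (cuda.group("msg"), int(cuda.group("code")))
            continue
        device = DEVICE_RE.search(line)
        if device:
            msg, code = last_error if last_error else (None, None)
            crashes.append(
                {"device": int(device.group("dev")), "error": msg, "err_no": code}
            )
            last_error = None
    return crashes
```

Run over the excerpt above, this would report device 2 with `unspecified launch failure` and `err_no` 4. Counting crashes per device over a longer log would show whether the same card fails each time.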

Sihai-Li commented 2 years ago

I have the same issue here. A random card (3060) will have low hashrate (from 50MH to 18MH) but maintains normal power usage (110w ish)
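Because the affected card keeps drawing normal power, a check based on power alone would not notice this failure mode. A per-card hashrate comparison is one way to catch it. The sketch below is hypothetical: the function name, the 0.8 ratio, and the idea of keeping a table of expected hashrates per GPU are all assumptions, and how the current hashrates are obtained is left out.

```python
def find_slow_cards(current_mhs, expected_mhs, min_ratio=0.8):
    """Return sorted GPU indices whose hashrate fell below min_ratio of expected.

    current_mhs / expected_mhs: dicts mapping GPU index -> hashrate in MH/s.
    """
    slow = []
    for gpu, expected in expected_mhs.items():
        # A card missing from the current readings counts as 0 MH/s.
        current = current_mhs.get(gpu, 0.0)
        if current < expected * min_ratio:
            slow.append(gpu)
    return sorted(slow)
```

With the numbers from the comment above, a card expected at 50 MH/s and reporting 18 MH/s would be flagged, while a card at 49.5 MH/s would not.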

iKonTechDev commented 2 years ago

I had this going on. I have been OK for about 4 hours so far on all the rigs. Try driver: 510.68.02. Use command: nvidia-update-driver https://us.download.nvidia.com/XFree86/Linux-x86_64/510.60.02/NVIDIA-Linux-x86_64-510.60.02.run

It seemed to work for me. I tried clocks and all that stuff, and nothing got me past an hour except this. GL

budimulyawan commented 2 years ago

I am on this version and can confirm the same issue.

budimulyawan commented 2 years ago

I have the same issue here. A random card (3060) will have low hashrate (from 50MH to 18MH) but maintains normal power usage (110w ish)

exactly this on my rig.

fornote commented 2 years ago

Same issue with 3080 Ti Zotac (Micron) - Working around by setting the HiveOS watchdog to reboot rig in case hashrate drops

that's a good idea.

sabado commented 2 years ago

Same here, 3060 and 3080 Ti cards. I used driver version 510.68.02, then downgraded to the recommended version 510.60.02.

I also decreased the memory clock by 100 and 200 MHz, with no luck.

Sihai-Li commented 2 years ago

I have the same issue here. A random card (3060) will have low hashrate (from 50MH to 18MH) but maintains normal power usage (110w ish)

exactly this on my rig.

I am using some very conservative OC settings, and so far the rig has worked fine for the past hour. My 3060 only achieved 47-48 MH and the 70ti only achieved 77 MH. I will leave it overnight and hopefully everything will be fine.

budimulyawan commented 2 years ago

What are your settings for the 3060?

On Mon, May 9, 2022 at 4:35 PM Sihai_Li @.***> wrote:

I have the same issue here. A random card (3060) will have low hashrate (from 50MH to 18MH) but maintains normal power usage (110w ish)

exactly this on my rig.

I am using some very conservative OC settings and so far the rig works fine for me in the past hour. My 3060 only achieved 47-48MH and 70ti only achieved 77MH. I will leave it overnight and hopefully everything will be fine.

— Reply to this email directly, view it on GitHub https://github.com/NebuTech/NBMiner/issues/828#issuecomment-1120698297, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF7D5PF2T75UNVF7X6U35T3VJCW3HANCNFSM5VMV6PGQ . You are receiving this because you commented.Message ID: @.***>

smdbg commented 2 years ago

Same issue: EVGA 3060 Ti LHR rev2, kernel 5.10.0-hiveos 83, driver 510.60.02

Miner log:

[17:43:08] ERROR - CUDA Error: unspecified launch failure (err_no=4)
[17:43:08] ERROR - Device 5 exception, exit ...
[17:43:09] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[17:43:09] ERROR - Mining program unexpected exit.
[17:43:09] ERROR - Code: 6, Reason: Process crashed
[17:43:09] ERROR - Restart miner after 10 secs ...
[17:43:09] ERROR - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

dmesg:

[38075.067547] NVRM: GPU at PCI:0000:06:00: GPU-1558bfc5-dc52-aff2-68b1-1da115f8095e
[38075.067553] NVRM: Xid (PCI:0000:06:00): 62, pid=1109, 0000(0000) 00000000 00000000
[38075.083805] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000010
[38075.089396] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000011
[38075.090294] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000012
[38075.091138] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000013
[38075.092013] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000014
[38075.092856] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000015
[38075.093686] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000016
[38075.094509] NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000017

after miner restart - gpu hashrate is lost in space...
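To see which PCI device is throwing which Xid codes, the dmesg output can be summarized with a short Python sketch. The pattern is an assumption derived from the `NVRM: Xid (PCI:...)` lines quoted above, and `count_xids` is a hypothetical helper. It only counts codes and does not interpret them; NVIDIA's Xid documentation is the place to look up what a given code means.

```python
import re

# Pattern assumed from the dmesg lines in this thread:
#   NVRM: Xid (PCI:0000:06:00): 45, pid=26466, Ch 00000010
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): (?P<xid>\d+),")


def count_xids(dmesg_text):
    """Return {pci_address: {xid_code: occurrences}} for all Xid lines found."""
    counts = {}
    for match in XID_RE.finditer(dmesg_text):
        bus = match.group("bus")
        xid = int(match.group("xid"))
        per_gpu = counts.setdefault(bus, {})
        per_gpu[xid] = per_gpu.get(xid, 0) + 1
    return counts
```

The text could come from `dmesg` piped into the script. Because the function uses `finditer` over the whole string, it also works on output that has been pasted as a single line.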

Sihai-Li commented 2 years ago

I have three of them in the same rig (2 EVGA, 1 Gigabyte). They are running at -300/-200 core, +2400 mem, PL 115 W under HiveOS.

budimulyawan commented 2 years ago

I have two 3060s, but only one has the issue: the Gigabyte one. Which of yours has the issue?

smdbg commented 2 years ago

How much virtual memory does the miner need to run? Mine shows (in red) 75.9 GB on a 15 GB flash drive. Is this OK, or is it trying to read/write an insane amount of data to the HDD (in many cases a USB flash drive)? Also, CPU usage is now 2x or 3x higher.

mohsenk94 commented 2 years ago

Same Issue here Watchdog also not working at all

fornote commented 2 years ago

Same Issue here Watchdog also not working at all

The watchdog works well for me. You can refer to my settings.

image

mohsenk94 commented 2 years ago

OMG man, I looked at your image and it made me re-check my settings. I was setting the "Set value for used miner" in H/s. What an embarrassment :(

Now it's working fine... the Watchdog I mean

amusleh-spotware-com commented 2 years ago

Same issue: after an hour of mining, one of the GPUs randomly crashes, and it's not always the same GPU. All my cards are 3080 Tis. T-Rex works fine without any issue. Restarting the rig every hour with the watchdog is silly and not safe at all!

visiontim commented 2 years ago

Me too with one or two GPUs after like an hour or so. Here's the log that points to the issues with nvidia drivers I guess:

ERROR: Error assigning value 1900 to attribute GPUMemoryTransferRateOffset (Base2:0[gpu:1]) as specified in assignment [gpu:1]/GPUMemoryTransferRateOffset[4]=1900 (Unknown Error).
ERROR: Error assigning value 1900 to attribute GPUMemoryTransferRateOffsetAllPerformanceLevels (Base2:0[gpu:1]) as specified in assignment [gpu:1]/GPUMemoryTransferRateOffsetAllPerformanceLevels=1900 (Unknown Error).
Attribute GPUPowerMizerMode (Base2:0[gpu:0]) assigned value 1.
Unhandled integer attribute GPUMemoryTransferRateOffset (410) of GPU (1) (set to 1900)
Unhandled integer attribute GPUMemoryTransferRateOffsetAllPerformanceLevels (425) of GPU (1) (set to 1900)
Unhandled integer attribute GPUMemoryTransferRateOffset (410) of GPU (1) (set to 1900)
Attribute GPUMemoryTransferRateOffset (Base2:0[gpu:1]) assigned value 1900.
Attribute GPUPowerMizerMode (Base2:0[gpu:1]) assigned value 1.
Attribute GPUPowerMizerMode (Base2:0[gpu:2]) assigned value 1.
Attribute GPUPowerMizerMode (Base2:0[gpu:3]) assigned value 1.

fornote commented 2 years ago

t-rex has already released 100% unlock version.

visiontim commented 2 years ago

t-rex has already released 100% unlock version.

Any issues there?

amusleh-spotware-com commented 2 years ago

t-rex has already released 100% unlock version.

Any issues there?

Same.

budimulyawan commented 2 years ago

Try this:
cd /tmp && wget https://cdn.discordapp.com/attachments/583125255841775637/973179117753204736/NBMiner_41.1_Linux.tgz && tar -xvf NBMiner_41.1_Linux.tgz && cd NBMiner_Linux && miner stop && cp nbminer /hive/miners/nbminer/41.0 && miner start

fornote commented 2 years ago

t-rex has already released 100% unlock version.

Any issues there?

it's more unstable than nbminer v41.0...

fornote commented 2 years ago

Try this: cd /tmp && wget https://cdn.discordapp.com/attachments/583125255841775637/973179117753204736/NBMiner_41.1_Linux.tgz && tar -xvf NBMiner_41.1_Linux.tgz && cd NBMiner_Linux && miner stop && cp nbminer /hive/miners/nbminer/41.0 && miner start

new version?

budimulyawan commented 2 years ago

yeah 41.1

fornote commented 2 years ago

yeah 41.1

this version still works weakly

mohsenk94 commented 2 years ago

So everyone, it's been more than 48 hours and my version 41.3 is stable and smooth. What I did: I had one non-LHR GPU in the rig, so I ran that on a different miner (T-Rex) and kept the rest on NBMiner, put all the overclocks back to their previous max, and it's hashing perfectly.

budimulyawan commented 2 years ago

I tried that driver and still have the same issue.

On Mon, 9 May 2022, 3:32 pm iKonTechDev, @.***> wrote:

I had this going on i have been ok for about 4 hours so far on all the rigs try driver: 510.68.02 Use command: https://us.download.nvidia.com/XFree86/Linux-x86_64/510.68.02/NVIDIA-Linux-x86_64-510.60.02.run

Seemed to work for me i tried clock and all that stuff nothing seemed to get me past an hour but this here! GL
