rizwansarwar commented 7 years ago

Compiled from master; after a few minutes I get the error below. Mining on CUDA using GTX 1070s. Not sure what this is; the error is not very descriptive and I am no code whiz.

CUDA error in func 'search' at line 365 : unspecified launch failure.
✘ 15:26:10|cudaminer0 Error CUDA mining: unspecified launch failure
CUDA error in func 'search' at line 365 : unspecified launch failure.
✘ 15:26:10|cudaminer4 Error CUDA mining: unspecified launch failure
CUDA error in func 'search' at line 365 : unspecified launch failure.
✘ 15:26:10|cudaminer1 Error CUDA mining: unspecified launch failure
CUDA error in func 'search' at line 365 : unspecified launch failure.
✘ 15:26:10|cudaminer3 Error CUDA mining: unspecified launch failure
CUDA error in func 'search' at line 365 : unspecified launch failure.
✘ 15:26:10|cudaminer2 Error CUDA mining: unspecified launch failure

feracon commented 7 years ago

@jimmykl Ahh, maybe. Regardless, for anyone having this issue: Claymore may hit the same error, but it can recover automatically, keeping you in business. Thanks for the heads up on VNC, I'll check it out; I reckon it's likely much lighter than TeamViewer.

@dhjw Thank you!

jimmykl commented 7 years ago

@feracon The new optimization flag is --cuda-parallel-hash; if it isn't set, it defaults to 4, which is optimal for most cards.

feracon commented 7 years ago

@jimmykl Ahh, I see the valid settings are 1, 2, 4, and 8, but people are reporting that 8 sucks for some 1070s. Going to experiment and overclock again. Thanks for the tip.

emily-pesce commented 7 years ago

These were my results:

GTX 1070
Ubuntu 16.04
Memory offset +1500
Power ceiling 115w
./ethminer -U -M --cuda-parallel-hash X:
31.10 at 1
32.36 at 2
28.87 at 3
32.42 at 4
25.86 at 5
21.60 at 6
18.59 at 7
32.22 at 8
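Picking a value from a sweep like this can be done mechanically. Below is a minimal sketch in Python using only the numbers reported above; the variable names and the "highest reading wins" rule are illustrative assumptions, and a single reading per setting says nothing about run-to-run variance.

```python
# Readings copied from the post above: --cuda-parallel-hash value -> reported result.
results = {1: 31.10, 2: 32.36, 3: 28.87, 4: 32.42,
           5: 25.86, 6: 21.60, 7: 18.59, 8: 32.22}

# Rank the settings from the best reading to the worst.
ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
best_setting, best_value = ranked[0]

print(f"best: --cuda-parallel-hash {best_setting} ({best_value:.2f})")
for setting, value in ranked:
    # Show each setting relative to the best one.
    print(f"{setting}: {value:.2f} ({value / best_value:.0%} of best)")
```

On this card the four best readings are the settings 4, 2, 8 and 1, which are also the values named as valid earlier in the thread.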

MatthiasThoemel commented 7 years ago

Hello to all here - I am a new member and happy to contribute some information to all.

I have tested the several 11.0 versions with my RIG1:

6 x GTX 1060 (ASUS Turbo) 6GB (OC Mem 10 GHz). Windows 10: up to date NVidia Driver: up to date

And I can confirm:

All 11.0 versions have similar problems. Sometimes the error message differs, but in general they all have the same issue. It looks like the changes in the CUDA search area are buggy.

I am a software developer myself and have also worked with CUDA, but unfortunately I do not have MS DevStudio 12, so I cannot contribute a bugfix. I tried to migrate the project to MS DevStudio 2017, but that failed for a lot of reasons.

I am now testing the old version, ethminer-0.9.41-genoil-1.1.7, to see whether it has similar problems with overclocked cards, and I will report back.

report:

The version ethminer-0.9.41-genoil-1.1.7 also reports a CUDA search error:

CUDA error in func 'ethash_cuda_miner::search' at line 346 : unspecified launch failure.
X 01:25:13|cudaminer1 Error CUDA mining: unspecified launch failure

My hypothesis now: overclocking the GPU memory changes the program flow timings, and I believe the software has a general synchronisation issue in that area.

UPDATE 1:

I did not run any further tests, especially not with lower overclocking, because it makes no sense.

Instead I wrote a couple of scripts that monitor the ethminer output and, if they find the word "Error", restart the whole rig. The restart takes 3 minutes, and after that the rig runs at full overclocked speed again. The error occurs relatively rarely (twice a day on my rig).

I stand by my opinion: it is not related to the overclocking, it is related to the internal software design of ethminer for CUDA, because the error always occurs at one defined position in the code. A different overclock only changes the timing and synchronization behavior of the ethminer CUDA code and nothing else. I assume the designer forgot a sync object at a specific position in the code, and this code runs safely at certain speeds only by accident.

Unfortunately I have no time to review the entire code. Please take my opinion, and my test with this very old software version, as a hint for where to search for a proper fix. And please do not rely too much on the "overclocked = bad" opinion.

UPDATE 2:

I may have figured something out: if you are using the ASUS GPU Tweak II utility, you should close it after you have applied the tweak. Since I started doing this at rig startup, with a script that runs 2 minutes after the GPU Tweak utility has started, ethminer no longer reports any errors. Maybe the tweak utility accesses the graphics cards in parallel from time to time, and that causes the error? Or: I observed that after a while the tweak utility occupies one complete CPU core doing something I cannot identify, and I have only two cores in my rig. Maybe ethminer always needs a lot of CPU headroom to function correctly. That would then also point to a software synchronization issue in ethminer.

Maybe you can check if your CPU is very busy from time to time and in these times the error occurs or maybe you check if your tweak utility is running while mining.

UPDATE 3:

I played around with priority settings for ethminer.exe and noticed: if I set it to high priority, the CUDA errors come very soon. This supports my hypothesis that ethminer.exe has a synchronization problem in general. Maybe somebody used boost messages and assumed they are thread safe, but they are not. One has to take care of every shared memory region or handle when programming with threads. I would start by analysing the multithreading design of the software and checking whether everything around shared memory is designed properly.

This is the end of my article for that topic :-)

My best regards, Matthias

kiwina commented 7 years ago

Same issue here: several 1060 models crash after 10-20 min. The funny part is that of 3 rigs with 8 cards each and cloned drives, 1 runs without issues while the other 2 crash.

pabloi commented 7 years ago

I can confirm this happens in Ubuntu 17.10, cuda 8 with default drivers (375.66 I believe) running a 1060 and a 1050Ti, both OC +1600. Both cards crash at the same time, and ethminer stops but it is trivial to stop and start again, so I think a watchdog is the best solution (other than directly finding and fixing the issue). Previously claymore 9.5 seemed to be running fine for 24+hr, but it is possible it was failing and recovering silently.

Edit: I meant Ubuntu 17.04

saidmasoud commented 7 years ago

Concur with @pabloi, at least one GPU occasionally crashes while running Claymore 9.5 and 9.7 with fairly high overclocks, but the watchdog restarts the miner automatically and doesn't give any details as to why it crashed.

spyrek10 commented 7 years ago

http://cryptomining-blog.com/8852-new-optimized-ethminer-for-nvidia-geforce-gtx-1060-gpus/ This version is rock stable for me with the same overclock that crashes the latest versions. Crash on latest versions: https://scr.hu/GWd9B6 Setup: Windows 7 x64, GTX 1070 + 1060.

thghdbs commented 7 years ago

@MatthiasThoemel could you post your Windows 10 script for the automatic reboot on error?

pabloi commented 7 years ago

@dhjw Thanks for the script. I am using it to auto-restart ethminer if it fails. It hasn't failed so far though! (I am trying less power constraints to see if that affects failure time).

pabloi commented 7 years ago

@dhjw I am catching false positives with your script. Not sure why, but sometimes after getting new work I get a report of 0.00Mh/s without any error and, if left to itself, the miner would continue. However, your script kills it and restarts it. Since I am getting this about once an hour or so, I modified the script to look for the "CUDA error" string instead of "0.00Mh/s", which will hopefully catch only true errors while still dealing with this issue.

ℹ 18:33:06|stratum Received new job #0b7eeb3f
ℹ 18:33:06|cudaminer0 set work; seed: #9e972470, target: #00000000dbe6
ℹ 18:33:06|cudaminer1 set work; seed: #9e972470, target: #00000000dbe6
m 18:33:06|ethminer Mining on PoWhash #0b7eeb3f : 0.00MH/s [A4+0:R0+0:F0]
m 18:33:10|ethminer Mining on PoWhash #0b7eeb3f : 39.06MH/s [A4+0:R0+0:F0]
m 18:33:14|ethminer Mining on PoWhash #0b7eeb3f : 39.32MH/s [A4+0:R0+0:F0]
m 18:33:18|ethminer Mining on PoWhash #0b7eeb3f : 39.58MH/s [A4+0:R0+0:F0]

Edit: added sample output from miner.
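The distinction between a momentary 0.00MH/s reading and a real failure can be expressed as a small line filter. This is a minimal sketch in Python, assuming the miner output is processed one line at a time; `is_fatal_line` is a hypothetical helper name and is not part of ethminer or of @dhjw's script. The marker strings are the ones that appear in the error output quoted in this thread.

```python
# Substrings taken from the error output quoted in this thread.
FATAL_MARKERS = ("CUDA error", "Error CUDA mining")


def is_fatal_line(line):
    """Return True only for lines that report a CUDA failure."""
    return any(marker in line for marker in FATAL_MARKERS)


samples = [
    "m 18:33:06|ethminer Mining on PoWhash #0b7eeb3f : 0.00MH/s [A4+0:R0+0:F0]",
    "CUDA error in func 'search' at line 365 : unspecified launch failure.",
    "✘ 15:26:10|cudaminer0 Error CUDA mining: unspecified launch failure",
]
for sample in samples:
    print(is_fatal_line(sample), sample)
```

The 0.00MH/s line right after a new job is reported as non-fatal, while both forms of the CUDA error are caught.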

emily-pesce commented 7 years ago

Of my 4 mining rigs only one is crashing consistently. Below are the crash logs from that rig today, 8 total (so far). If order matters, 7 of the 8 crashes started with cudaminer3. This is interesting, because it tells me that this is definitely overclock related. Whereas in the past I've seen a single card crash and eventually require an ethminer restart, this error crashes all cards at once. But the root cause still seems to be one bad card, if this ordering is actually telling.

At the end of the day I would prioritize work to have ethminer restart itself (though my pulse script works great) instead of trying to figure out why overclocking is doing this.

I have reduced the overclock on gpu3 and will let you know what happens.

miner.201707110950:  ✘  09:49:39|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707110950:  ✘  09:49:39|cudaminer1  Error CUDA mining: an illegal memory access was encountered

miner.201707111324:  ✘  13:23:55|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111324:  ✘  13:23:55|cudaminer5  Error CUDA mining: an illegal memory access was encountered

miner.201707111336:  ✘  13:36:11|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111336:  ✘  13:36:11|cudaminer1  Error CUDA mining: an illegal memory access was encountered

miner.201707111448:  ✘  14:48:25|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111448:  ✘  14:48:25|cudaminer1  Error CUDA mining: an illegal memory access was encountered

miner.201707111704:  ✘  17:03:37|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111704:  ✘  17:03:37|cudaminer5  Error CUDA mining: an illegal memory access was encountered

miner.201707111814:  ✘  18:13:30|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111814:  ✘  18:13:30|cudaminer0  Error CUDA mining: an illegal memory access was encountered

miner.201707111818:  ✘  18:17:44|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer0  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111818:  ✘  18:17:44|cudaminer4  Error CUDA mining: an illegal memory access was encountered

miner.201707111919:  ✘  19:19:04|cudaminer3  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer4  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer5  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer2  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer1  Error CUDA mining: an illegal memory access was encountered
miner.201707111919:  ✘  19:19:04|cudaminer0  Error CUDA mining: an illegal memory access was encountered
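The "which card reported first" count can be extracted from logs in this format automatically. The sketch below is in Python and assumes the `miner.<timestamp>:` prefix identifies one crash log, as in the listing above; the function name and regular expression are illustrative. Only the leading lines of each crash are reproduced as sample data.

```python
import re
from collections import Counter

# Matches e.g. "miner.201707110950:  ✘  09:49:39|cudaminer3  Error CUDA mining: ..."
LINE_RE = re.compile(r"^(miner\.\d+):\s+\S+\s+[\d:]+\|(cudaminer\d+)\s")


def first_failures(lines):
    """Map each crash log name to the first miner thread that reported an error."""
    first = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(1) not in first:
            first[match.group(1)] = match.group(2)
    return first


TAIL = "  Error CUDA mining: an illegal memory access was encountered"
log_lines = [
    "miner.201707110950:  ✘  09:49:39|cudaminer3" + TAIL,
    "miner.201707110950:  ✘  09:49:39|cudaminer5" + TAIL,
    "miner.201707111324:  ✘  13:23:55|cudaminer2" + TAIL,
    "miner.201707111324:  ✘  13:23:55|cudaminer3" + TAIL,
    "miner.201707111336:  ✘  13:36:11|cudaminer3" + TAIL,
    "miner.201707111448:  ✘  14:48:25|cudaminer3" + TAIL,
    "miner.201707111704:  ✘  17:03:37|cudaminer3" + TAIL,
    "miner.201707111814:  ✘  18:13:30|cudaminer3" + TAIL,
    "miner.201707111818:  ✘  18:17:44|cudaminer3" + TAIL,
    "miner.201707111919:  ✘  19:19:04|cudaminer3" + TAIL,
]

counts = Counter(first_failures(log_lines).values())
print(counts.most_common())
```

Since all threads report within the same second, the ordering is only a hint; it does not prove which card faulted first.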

ghost commented 7 years ago

As I mentioned in the other issue (#94), at first it was working for about 48 hours, then I had these errors 2 times, each after ~1.5 hours. Then, just out of curiosity, I lowered the OC on the memory clock from +700 to +650 MHz (core clock is +0, power target is 90%). These settings are applied to all cards. I turned mining back on, and it has been working since (9th of July). Maybe it means something, maybe it doesn't, because I saw comments about crashing on stock clocks. Maybe it will crash again today, but it's interesting that it happened 2 times in 3 hours and then ran for more than 4 days without any issues.

pabloi commented 7 years ago

I second @aiden1408. Yesterday I increased the OC from 1600 to 1700 (mem) and managed to get 4 errors in 5 minutes. Previously it would crash 2-3 times a day.

oleng commented 7 years ago

Does anyone know what causes the problem yet? OC should not be a problem, this is miner software after all.

piotr-dobrogost commented 7 years ago

@saidmasoud

(...) watchdog restarts the miner automatically (...)

Could you please provide more information on how you implement watchdog in this case?

saidmasoud commented 7 years ago

@piotr-dobrogost I didn't implement it myself, it comes as part of the Claymore mining software and is enabled by default. I'm currently using Claymore until a fix for this issue is implemented

aityou commented 7 years ago

I have the same problem. I have 3 rigs using GTX 1060s (PNY/EVGA). I have 9 PNY GTX 1060 XLR8 cards; 6 of them run fine, but three of them don't accept the same overclock, and even when I lower their OC they crash either right away or after a while. When I run those GPUs with no OC they show "an illegal memory access was encountered", so I have to go with -400 core to run them, but they are still crashing!!!

derubm commented 7 years ago

Windows temporary fix: https://github.com/derubm/Ethminer_Watchdog

Malapha commented 7 years ago

EDIT: Meanwhile I strongly support Orkblutts solution https://github.com/orkblutt/MinerLamp. It needs less system resources, runs stable and looks great.

Powershell Solution for CUDA Crashes

So this is a powershell solution, running for some days now without issues. You can tune your cards without having to care about ethminer running into the discussed Bug. No Need to install extra software or 3rd Party Tools...

Feel free to improve. Due to testing the script I had some downtimes with my rig, so donations are very welcome :-) [0x76DC203d1cd70262459cEf56AdE865613c4b9693]

This is the output Screen: output

Instructions:

=> Generate a run.bat that uses a PowerShell call to tee the miner output into a log file; the log file is then processed further by PowerShell. Save the text below as run.bat in the same directory as ethminer, then execute the ps1 file - and hopefully enjoy.

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_SINGLE_ALLOC_PERCENT 100
powershell "./ethminer.exe --cuda-parallel-hash 4 --farm-recheck 150 -U -S eth-eu1.nanopool.org:9999 -FS eth-eu2.nanopool.org:9999 -O 0xYOURADRESS 2>&1|tee log.txt"
exit

=> This is the main Powershell script (don't forget to enable powershell Script execution in Windows). To reduce Memory issues, the script opens and Closes Jobs after a while (but mining goes on). Insert the Text into a *.ps1 file and save it in the ethminer Directory.

# Start a background job that tails log.txt.
function JobOpen{
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,13

    gci log.txt | % { $sb = [scriptblock]::create("get-content -wait $_") ; start-job -Name LOGSEARCH -ScriptBlock $sb }
    # Discard whatever the jobs have produced so far.
    $null = $(get-job|receive-job)
    # sleep 1
}

# Stop and remove the log-tailing job, then force a garbage collection.
function JobClose{
    Stop-Job -Name LOGSEARCH
    get-job | Remove-Job
    [System.GC]::Collect()
    #sleep 1
}

# Kill ethminer, delete the old log and start the miner again via run.bat.
function EthRestart{
    #cls
    #Write-Host "#######################################################################################################"
    #$Host.UI.WriteLine($(get-job | receive-job))
    stop-process -Name ethminer
    sleep 2
    RemoveLog
    sleep 2
    Start-Process .\run.bat
    sleep 2
}

function RemoveLog {
    $strFileName=".\log.txt"
    If (Test-Path $strFileName){
       Remove-Item $strFileName -Force
    }Else{
            # // File does not exist
     }
}

function statOutput{

    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,0
            Write-Host "Start:    $orgstartdate"
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 50,0
            Write-Host "Nowdate:   $nowdate"
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,1
            write-host "Restart:  $ethstartdate"
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 50,1
            write-host "#Restarts: $i"
    $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,2
            write-host "Jobstart: $jobstartdate"

}

$i=0
$s=0
$orgstartdate= get-date
$ethstartdate=get-date
$jobstartdate= Get-Date
$nowdate = Get-Date
#$d=Get-Date
RemoveLog
sleep 2
Start-Process .\run.bat 
sleep 7

$Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,10   
[System.GC]::Collect()
gci log.txt | % { $sb = [scriptblock]::create("get-content -wait $_") ; start-job -Name LOGSEARCH -ScriptBlock $sb }

sleep 1

while(1) {
    statOutput
    # Give a freshly started miner 15 seconds before scanning its log for errors.
    if(($nowdate - $ethstartdate).totalseconds -ge 15) {
        $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,20
        $Host.UI.WriteLine($(get-job | receive-job -Keep |select -last 1))
        $m = $(get-job | receive-job| select -last 50 |Select-String "Error CUDA mining" )

        # A CUDA error was logged: count it and restart the miner.
        if($m -ne $null) {
            $i++
            JobClose
            ethrestart
            $ethstartdate= Get-Date
            JobOpen
            $jobstartdate=$nowdate
        }
    }
    $null=$(get-job | receive-job)

    sleep -m 50
    $nowdate= Get-Date
    $s++

    # Every 6000 iterations, force a garbage collection.
    if($s -ge 6000) {
        $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,5
        $nowdate= Get-Date
        Write-Host "GARBAGE COLLECT START $nowdate "
        [System.GC]::Collect()
        sleep 1
        $nowdate= Get-Date
        $Host.UI.RawUI.CursorPosition = New-Object System.Management.Automation.Host.Coordinates 0,5
        Write-Host "GARBAGE COLLECT Ended $nowdate "
        $s=0
    }

    # Recycle the log-tailing job once it has been running for 60000 seconds.
    if(($nowdate - $jobstartdate).totalseconds -ge 60000) {
        JobClose
        JobOpen
        $jobstartdate=$nowdate
    }

    # Restart ethminer unconditionally after 7200 seconds of uptime.
    if(($nowdate - $ethstartdate).totalseconds -ge 7200) {
        $i++
        JobClose
        ethrestart
        $ethstartdate= Get-Date
        JobOpen
        $jobstartdate=$nowdate
    }
}
exit

sblmasta commented 7 years ago

I have the same problem with an overclocked GTX 1070. I set +100 GPU and +1300 memory, and after that both Claymore miner and ethminer report failures. When I set memory to +900-1000, crashes happen every 10-15 minutes, which is acceptable, but I don't want this ;/

I have Ubuntu 17.04, NVIDIA Driver: 375.66 and CUDA from apt repo. Currently I have +100 GPU and +1000 Memory and I have 185 MH/s in 6 cards.

I have 6 x Asus ROG STRIX GTX 1070 O8G-GAMING

orkblutt commented 7 years ago

I've also made an easy-to-use GUI (in Qt) that handles errors. https://github.com/orkblutt/MinerLamp

orkblutt commented 7 years ago

[image: miner_lamp]

VickoValch commented 7 years ago

My rig is not running Ubuntu, but I had the same issue. The problem first occurred after I added a 5th card (EVGA GTX 1060 6GB) to an already running machine with 4 x EVGA GTX 1060 6GB. After some tests I noticed that the fifth GPU has Micron GDDR5 memory and was using a different VBIOS version from the other 4 GPUs. Today I flashed the GPU's BIOS to the same version as the others. The GPU still can't handle the overclock I'm using on the cards with Samsung GDDR5, but it is stable at about 50% of their overclock values. For example, the GTX 1060 with Samsung memory runs at +625 memory and the one with Micron at +300, both at 80% power. So far 3 hours without a problem. Before the BIOS flash, it was crashing the whole miner even at stock or with a slight overclock. I will keep tracking and update here. Hope my findings help you further.

freiro commented 7 years ago

Latest -dev version (ethminer-0.12.0.dev1) seems to help!

Update: Unfortunately it still happens.

dhjw commented 7 years ago

It seems to me to be clearly related to overclocking too much. Reduce overclocking on the GPU that crashes first and you can keep the rest higher. I have one card out of 6 that is more sensitive and heats up way faster than all the others, even from the same manufacturer. I wonder if a script could be devised to automatically find the setting that doesn't crash on each GPU.

Edit: I gave up tweaking the "card that crashes first" as I think it's inaccurate. I do kill ethminer and restart it when that happens, but now I only decrease overclock when a card goes offline.

spyrek10 commented 7 years ago

@dhjw so why older version works for me without any crash with exactly the same overclocking?

dhjw commented 7 years ago

Not sure @spyrek10 but I upgraded and it seems to be the same to me.

dafyk commented 7 years ago

In my case the problem was (or at least I hope so, 48 hrs without problems) caused by a SATA->MOLEX adapter on a powered USB riser. It was getting very HOT (about 70C) and, e.g., EWBF miner exited a few seconds after start. Replacing the SATA->MOLEX adapter and powering directly from MOLEX on the PSU solved my problems (on Windows and Linux).

sblmasta commented 7 years ago

@dafyk I had the same problem with high temperature on the MOLEX-SATA power cable. I replaced the wire with SATA power only and it works well.

Caerus7 commented 7 years ago

@orkblutt - I really like your solution, but for some reason MinerLamp crashes on my system. The program itself works, but soon after I start mining, Windows says the program has stopped working. On further attempts ethminer won't start at all, in or out of your program. I have to reboot.

If the devs are reading this, I hope a watchdog feature is high on the priority list. I've so far refused to use Claymore as I don't like what he stands for. Not so much the fee, but to me there's no doubt he's ripped Genoil's CUDA optimizations. That's wrong. Then there's the impact his dev fee server switching has on pool servers, but that's a debate for somewhere else.

Ethminer is the best ETH miner, and asks for nothing more than a donation (which I gladly give). For ETH mining only, Claymore has no benefit other than a watchdog. I hope ethminer gets one so I never even have to consider paying him a cent. Thanks for all your hard work.

Just a final note. I've definitely been able to increase the memory overclock by a decent amount (+100, 4x1060's) with far fewer crashes using the latest ethminer 0.12 dev releases. I'm fairly comfortable leaving miner restarts an hour apart. Crashing occurs just once or twice a day. There's no way I could do that before with these clocks. Maybe these new cards are just being nicer to me now (highly unlikely), or the devs have been looking into the issue. I hope so. A watchdog would still give that needed peace of mind.

derubm commented 7 years ago

If MinerLamp won't work, Malapha's PowerShell solution a few posts above or https://github.com/derubm/Ethminer_Watchdog might help until there's a built-in watchdog.

Caerus7 commented 7 years ago

Thanks @derubm. Minerlamp seems to be working fine now. I haven't gone back to 0.11 to see if there was some issue with that version on my system. Just taking the win :). Nice work @orkblutt. Thank you.

joantune commented 6 years ago

Hi Guys!

Just started mining, after @michael-pesce 's answer I did a 'simple' watchdog with a bash script and I run everything with supervisord (I use that because I remembered that Docker containers were using it in the early days)

It's available here:

https://github.com/joantune/ethminerWatchdog

It's a Linux watchdog, for Nvidia's but it might be adapted to other cards

I'm running it in a screen session; so far so good. Do read the README.

Jacxz commented 6 years ago

Hi, I'm also in the same boat as everyone else.

I guess I'll be trying out MinerLamp (on Windows).

For Linux I'll probably try ethminerWatchdog by joantune, the solution seems neat if supervisord is good. But I wanted to ask, has anyone tried this python based monitor https://github.com/philon123/MinerMon ?

Also to add to the issue discussion, can it be an issue related to the CUDA release used somehow? Wondering after finding this issue #53 which the reporter closed on his own.

emily-pesce commented 6 years ago

On Windows I paid the relatively small amount for Awesome Miner and have been pleased.

On Ubuntu I still use my own scripting and it hasn't let me down. Happy to share more details if folks are interested.

derubm commented 6 years ago

As this one is not closed yet: as many miners have already mentioned, on Nvidia cards the illegal memory access error happens when a card runs with a maxed-out memory overclock in power state P2. When the miner switches to P0 state for whatever reason, the memory gets an additional 200 MHz and can (or will) become unstable, which causes this error.

Solution without a watchdog: set your mining rig to P0 state (on Windows, use the old version of Nvidia Inspector: in section 5, set "Force P2 State" to "Off"; on Linux you should already be able to do that with nvidia-smi).

Explanation: when you run your miner in P0 state from the start, the memory clock overshoot no longer appears on maxed-out GDDR5 (how far you can go depends on the memory brand). Example with Samsung memory: +710 in P0 state and +910 in P2 state give the same memory speed, 4714 MHz on Windows (Linux displays twice that value). So in P2 state you would be running +910; then P0 state snaps in and you have not +910 but +1110, which causes the crash. If you run your card in P0 state from the start, it cannot run higher than intended (+710 in my case, for example), so no more crashes appear.

Sample Nvidia inspector with version number and section that needs to be changed:

Note: after a driver update you need to set P0 state again! Also note that you have to set the overclock 200 MHz lower, as P0 state already adds those 200. Maybe things like that can be included, in correct English, in the README.
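The offset arithmetic in this explanation can be written down as a short worked example. This is a sketch in Python that simply restates the numbers from the comment above; the 200 MHz difference between P2 and P0 is the figure quoted there, not something measured here, and the function names are made up for illustration.

```python
# Difference between P2 and P0 memory clocks as quoted in the comment above (assumption).
P0_BONUS_MHZ = 200


def p2_equivalent_after_switch(offset_mhz):
    """Offset in P2 terms that a card effectively runs once it jumps from P2 to P0."""
    return offset_mhz + P0_BONUS_MHZ


def offset_for_forced_p0(stable_p2_offset_mhz):
    """Offset to configure when the card is forced into P0 from the start."""
    return stable_p2_offset_mhz - P0_BONUS_MHZ


print(p2_equivalent_after_switch(910))  # 1110, the unstable case described above
print(offset_for_forced_p0(910))        # 710, the P0 setting from the example
```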

Jacxz commented 6 years ago

@derubm thanks for the clear input! I've not had any issues with illegal memory access since switching to P0 state with NVIDIA Profile Inspector 2.1.3.10 (Force P2 State -> Off). That is on Windows 10 with a single GTX 1070. For some reason it has been stable with P2 for a long time on my other machine Windows 10 with four GTX1060. But I guess I'll switch to P0 there as well just to be safe.

With Linux I've not been able to switch to P0, right now the cards go to P0 state when they are on idle, but as I start ethminer they go to P2. @michael-pesce I'm very interested in any suggestions on good solutions, feel free to share your scripting knowledge :)

dhjw commented 6 years ago

On Linux it stays at P2 but you can still overclock the cards as high as they'll go. It depends on each card but I get between 22.52 and 25.10 on GTX 1060s. Basically I set the card a little high then observe how much hashrate comes out of it to determine the memory type (~22-23 micron, ~25 samsung) then decrease to where it's stable and doesn't get knocked offline.

[rig1] ethminer Speed 144.06 Mh/s gpu/0 23.00 gpu/1 24.94 gpu/2 22.52 gpu/3 24.86 gpu/4 25.10 gpu/5 23.65
[rig2] ethminer Speed 163.83 Mh/s gpu/0 23.40 gpu/1 22.76 gpu/2 23.48 gpu/3 23.40 gpu/4 25.02 gpu/5 22.92 gpu/6 22.84
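The rule of thumb above (~22-23 for Micron, ~25 for Samsung) can be turned into a rough classifier. The following Python sketch applies it to the rig1 figures; the exact cut-off values 23.5 and 24.5 are assumptions chosen for illustration, the post only gives approximate ranges, and a hashrate alone cannot confirm the memory vendor.

```python
def guess_memory_vendor(hashrate_mhs, micron_max=23.5, samsung_min=24.5):
    """Guess the memory vendor of a GTX 1060 from its hashrate (heuristic only)."""
    if hashrate_mhs >= samsung_min:
        return "samsung?"
    if hashrate_mhs <= micron_max:
        return "micron?"
    return "unclear"  # between the two assumed cut-offs


# Per-GPU hashrates of rig1 as posted above.
rig1 = {"gpu/0": 23.00, "gpu/1": 24.94, "gpu/2": 22.52,
        "gpu/3": 24.86, "gpu/4": 25.10, "gpu/5": 23.65}

for gpu, rate in rig1.items():
    print(gpu, rate, guess_memory_vendor(rate))
```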

I do my config by device UUID so things don't get mixed up. Here's my mine-setup script and settings file. Send me ETH at 0x5f8f7166c9920ea2d786e0810defdc611544fbfe :)

moodonis commented 6 years ago

anyone know how to get P0 State working in Linux on GTX 1070s? most/all info out there doesn't work, so any link would be greatly appreciated.

dhjw commented 6 years ago

In my experience, it's normal to stay at P2 in Linux. It doesn't affect how much you can overclock or the speed you get.

Angel996 commented 6 years ago

I have it too in Ubuntu 16.04. The problem with this error (illegal memory access encountered) is that it enters an infinite loop and needs to be terminated manually. Once restarted, miner runs fine for another, say, 30 minutes.

Why not make a counter for this message occurrence and, say, after 50 consecutive messages just restart the miner or exit so that we could restart it with a shell script?
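The proposed counter is simple to sketch outside the miner as well. Below is a minimal Python version, assuming the log is fed in line by line; the class name is hypothetical, the threshold of 50 is the figure suggested above, and the marker text is the error message quoted in this thread. It is not how ethminer itself handles the error.

```python
class ConsecutiveErrorCounter:
    """Counts back-to-back occurrences of an error message in a log stream."""

    def __init__(self, marker="an illegal memory access was encountered", threshold=50):
        self.marker = marker
        self.threshold = threshold
        self.count = 0

    def feed(self, line):
        """Process one log line; return True once the threshold is reached."""
        if self.marker in line:
            self.count += 1
        else:
            self.count = 0  # any other line breaks the streak
        return self.count >= self.threshold


counter = ConsecutiveErrorCounter(threshold=50)
error_line = "Error CUDA mining: an illegal memory access was encountered"
tripped = [counter.feed(error_line) for _ in range(50)]
print(tripped.index(True) + 1)  # how many error lines it took to trip
```

A wrapper script could exit or restart the miner as soon as `feed` returns True.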

dhjw commented 6 years ago

To restart ethminer automatically, start it like this:

while [ 1 ]; do ethminer --farm-recheck 200 -U -F http://127.0.0.1:8080/hostname 2>&1 | mine-monitor; done

Here's my mine-monitor script. It requires PHP and a working email system like postfix configured with gmail.

If you're still getting these errors it means one of your cards is overclocked too high. When the card eventually gets knocked offline reduce the overclock a little and reboot. Eventually you should not get any more errors.
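For readers without PHP, the same restart-on-error idea can be sketched in Python. This is not the mine-monitor script mentioned above, whose contents are not shown in this thread; the function names, the marker strings and the restart delay are assumptions for illustration. The demo uses a stand-in process that prints one error line, so no miner is needed to try it.

```python
import subprocess
import sys
import time

# Error strings quoted in this thread.
FATAL_MARKERS = ("CUDA error", "Error CUDA mining")


def run_until_failure(cmd):
    """Run cmd and stop it at the first fatal line.

    Returns the offending line, or None if the process exited on its own.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, errors="replace")
    try:
        for line in proc.stdout:
            if any(marker in line for marker in FATAL_MARKERS):
                return line.rstrip()
        return None
    finally:
        if proc.poll() is None:
            proc.terminate()  # the process is still running: stop it
        proc.wait()
        proc.stdout.close()


def supervise(cmd, max_restarts, delay_s=5.0):
    """Run cmd repeatedly, waiting delay_s seconds between runs."""
    runs = 0
    while runs < max_restarts:
        run_until_failure(cmd)
        runs += 1
        time.sleep(delay_s)
    return runs


# Stand-in for the miner: prints one normal line, then one error line.
fake_miner = [sys.executable, "-c",
              "print('Mining on PoWhash'); "
              "print('Error CUDA mining: unspecified launch failure')"]
print(run_until_failure(fake_miner))
```

In real use `cmd` would be the ethminer command line, and `max_restarts` could be replaced by an endless loop as in the shell one-liner above.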

neskoc commented 6 years ago

This is my experience with 7 x ASUS GeForce DUAL-GTX1060-O6G (edit: currently 9) on a Win10 rig with an ASRock H110 Pro BTC+ board, running ethminer 0.12 (and Claymore as a short test). At first I tested with only 2 cards, but the behavior is consistent with 7 (soon I'll add at least 2 more, possibly up to 5). I have tested a single ethminer process for all GPUs, a separate process for each GPU, and combinations like 1+6. The best result came from running a separate process per GPU: in case of failure only one card drops out. When a GPU starts failing, it usually fails again within a minute, so there is no point restarting the process (I didn't check whether a reboot would give a better result; I have only just started testing, and I still need to tweak a few things, such as the delayed start of Asus GPU Tweak II and shutting it down afterwards; read more about the reason below).

There is always one card (usually the same one) where ethminer fails with the error message "CUDA error in func 'ethash_cuda_miner::search' at line 346 : unspecified launch failure". With 2 GPUs it was on card0; with 7 it is now (usually) on card1.

Monitor is now connected to the built in Intel GPU so theoretically RDP should not influence the outcome though I have to investigate this some more (I used RDP before when I was checking/testing things so I'm not really sure whether it was influencing the outcome). I believe I'm pretty conservative with OC and I lowered memory speed by 200 (to 9.300) compared with the recommended for the optimal hash speed/power consumption (65% - 65 degrees) just to be on the safe side (reported 22,9MH/s/card). The cards are the "OC" model so I can't lower GPU speed below the "min" value given in the GPU Tweak interface (1.607). FYI, I'm using Asus GPU Tweak II - pain in the ass due to some glitches like resetting my settings every time something goes wrong with GPU and GPU Tweak is running in background which means I'll run it once in the beginning and close it afterwards to prevent this happening (edit: Adding new card resets the values to default ones so it is necessary to setup values every time GPU configuration is changed + when something breaks like OS getting frozen).

If one card fails all others are stable nevertheless (at least for 8 hours, my longest test so far). Trying to use Claymore on the failing card forces the error to migrate to the card2 and hash rate for Claymore is around 19MH/s. In other words the alternative 6 ethminers + 1 Claymore wouldn't work either.

I'll post some updates after I've tested few more things like what will happen without using RDP nor Teamviewer that I used on the other system to reboot where I have 1 x AMD Vega 64 + 1 x Asus GTX 1060 6G (not OC) and where 1060 usually drops out once every 24-48 hours so I used Teamviewer to access the computer from abroad. I'm not sure whether Teamviewer itself could be source of any problem (I'm running it on my 7xGPU rig too).

After last reboot I didn't use RDP and so far it was running without any problem for 45 minutes which is promising.

I have also been running one "rig" with 2 x ASUS GeForce DUAL-GTX1060--O6G on a MacPro (2011) with Ubuntu 16.04 + ethminer. It has been rock stable (is it correct English? ;) ) for weeks, though without being able to tweak memory/GPU speed (only power target), so it averages 35.4 MH/s. My plan is to eventually move those 2 GPUs to the ASRock rig. If I figure out how to tweak memory/GPU speed on Linux, my plan/hope is to kick out Windows, so any advice is welcome. I've googled a few approaches that I couldn't get working (honestly, I haven't put much time into it so far - had some other things to do).

Edit 1. 2 hours later: no RDP => no error (seems like).

I've just plugged in 8th GPU and I'll come back with the update. Unfortunately I have no more PCIe power connectors available and it seems like the secondary PSU is trying to be smart and won't supply any current for GPU/SATA without the threshold load on the ATA power connector ... or my brand new PSU is not working (not probable but I'm not really sure yet).

So far there is strong indication that the quoted error is (directly) related to RDP messing up with/for ethminer.

Edit 2. After plugging in 8th GPU the system became unstable again (without connecting RDP) so few tweaks later (memory speed down to 9.100) + couple of reboots it became stable again (for one hour). Then I found the work-around for connecting 9th GPU scrambling from all the cables I had around: type4 to 4 x AMP MATE-N-LOK + Molex to PCIe power / Molex to sata power. At a same time I ordered 20 PCIe power splitter cables from AliExpress $1.29/piece so in 3-4 weeks I'll be able to build another 12-13 x GPU rig with one PSU for each (1.200W).

Anyhow, back to the rig. First Windows got stuck because I started the first miner too soon (before GPU Tweak had been able to shut down completely - impatience, I know :) ). After pressing the reset button, logging in and starting the miners, the 9th one started complaining about an "out of memory" error. A few tweaks to the paging values later, I ended at 35.000/45.000 MB (min/max) and could start even the 9th miner. 20 minutes later there is still no error, with a reported hash rate of 22,6MH/s on average. If it stays like this I would be more than satisfied :)

Edit 3. 50 minutes later - still no error. Question: any suggestions about choosing between GTX1060 "normal" and "OC"? I ordered 10 OCs because the "normal" ones were out of stock, with an estimated delivery time of several weeks. The price was a few bucks cheaper too, though I would have stuck with "normal" if they had been available at the time. Now I'm not sure any more which is preferable for ETH mining.

Edit 4. 15,5 hours later: no error and still counting. Current reported hash rate: 22,4-22,5, on average 20,3 - 24,2 MH/s (average of 9: 22,4MH/s). I even lowered the memory speed for the non-OC card, and its ethminer process hasn't crashed yet either (a small change in hash rate as a result, though it may have been higher on average since; currently 23,8MH/s). So the case is closed for my part.

Edit 5. ethminer on card9 produced an error after 23:27 first time and second time after roughly 22 hours. After second time I decided to use RDP to restart the miner and see whether it would cause an earlier error (to compare with the situation without running RDP). I'll come back to you with an update. Update edit 5. Same card dropped out next time after 60:56h(RDP used 2-3 times).

Edit 6. 7 days later and still running ...

H05ted commented 6 years ago

Here are my 5 cents. I started GPU mining a week ago. For now I have 14 x 1070ti, more or less OC'd, in 2 farms, mining ETH with ethminer restarted automatically if it stops on errors. These two scripts are not the best solution; they are written from scratch, but they work fine. They are written only for NVIDIA, but I think they could be rewritten for ATI too )) All of this was tested on Ubuntu 16.04.

                                          OK, let's go!

!!! NVIDIA Coolbits must be enabled if you want the OC settings to work. Mine is 13, tested on the 381 and 387 drivers. An emulated monitor is needed for each card. Below is my nvidia-xconfig configuration for 7 GPUs. You can find an edid.bin on Google; I made mine from an AOC 23 monitor.

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 387.34 (buildmeister@swio-display-x64-rhel04-15) Tue Nov 21 03:31:45 PST 2017

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Screen      1  "Screen1" RightOf "Screen0"
    Screen      2  "Screen2" RightOf "Screen1"
    Screen      3  "Screen3" RightOf "Screen2"
    Screen      4  "Screen4" RightOf "Screen3"
    Screen      5  "Screen5" RightOf "Screen4"
    Screen      6  "Screen6" RightOf "Screen5"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor2"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor3"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor4"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor5"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor6"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070 Ti"
    BusID          "PCI:1:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070"
    BusID          "PCI:2:0:0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070"
    BusID          "PCI:3:0:0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
    Identifier     "Device3"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070 Ti"
    BusID          "PCI:5:0:0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
    Identifier     "Device4"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070 Ti"
    BusID          "PCI:6:0:0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
    Identifier     "Device5"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070 Ti"
    BusID          "PCI:7:0:0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Device"
    Identifier     "Device6"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1070"
    BusID          "PCI:8:0:0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "13"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "13"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen2"
    Device         "Device2"
    Monitor        "Monitor2"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "13"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen3"
    Device         "Device3"
    Monitor        "Monitor3"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "13"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen4"
    Device         "Device4"
    Monitor        "Monitor4"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "13"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen5"
    Device         "Device5"
    Monitor        "Monitor5"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "13"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen6"
    Device         "Device6"
    Monitor        "Monitor6"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "13"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

This script runs the miner in a loop with OC settings for each GPU. The settings are applied only once, at start, if they are enabled. Just edit it for your needs and run it, that's all. The main part comes after it.

#!/bin/sh

# Global defaults: driver-controlled fans, clock and memory offsets for performance level 3
nvidia-settings -a "GPUFanControlState=0"
nvidia-settings -a "GPUGraphicsClockOffset[3]=-100"
nvidia-settings -a "GPUMemoryTransferRateOffset[3]=1200"

# Enable persistence mode and set the power limit to 155 W
nvidia-smi -pm 1
nvidia-smi -pl 155

# GPU 0
nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=-150"
nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=1200"
nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=80"

# GPU 1
nvidia-settings -a "[gpu:1]/GPUGraphicsClockOffset[3]=-150"
nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[3]=1450"
nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
nvidia-settings -a "[fan:1]/GPUTargetFanSpeed=80"

# GPU 2
nvidia-settings -a "[gpu:2]/GPUGraphicsClockOffset[3]=-150"
nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[3]=1150"
nvidia-settings -a "[gpu:2]/GPUFanControlState=1"
nvidia-settings -a "[fan:2]/GPUTargetFanSpeed=80"

# GPU 3
nvidia-settings -a "[gpu:3]/GPUGraphicsClockOffset[3]=-100"
nvidia-settings -a "[gpu:3]/GPUMemoryTransferRateOffset[3]=1050"
nvidia-settings -a "[gpu:3]/GPUFanControlState=1"
nvidia-settings -a "[fan:3]/GPUTargetFanSpeed=80"

# GPU 4
nvidia-settings -a "[gpu:4]/GPUGraphicsClockOffset[3]=-150"
nvidia-settings -a "[gpu:4]/GPUMemoryTransferRateOffset[3]=1050"
nvidia-settings -a "[gpu:4]/GPUFanControlState=1"
nvidia-settings -a "[fan:4]/GPUTargetFanSpeed=80"

# GPU 5
nvidia-settings -a "[gpu:5]/GPUGraphicsClockOffset[3]=-100"
nvidia-settings -a "[gpu:5]/GPUMemoryTransferRateOffset[3]=800"
nvidia-settings -a "[gpu:5]/GPUFanControlState=1"
nvidia-settings -a "[fan:5]/GPUTargetFanSpeed=80"

# GPU 6
nvidia-settings -a "[gpu:6]/GPUGraphicsClockOffset[3]=-100"
nvidia-settings -a "[gpu:6]/GPUMemoryTransferRateOffset[3]=900"
nvidia-settings -a "[gpu:6]/GPUFanControlState=1"
nvidia-settings -a "[fan:6]/GPUTargetFanSpeed=80"

# This loop restarts the miner whenever it exits, even after kill -9 ethminer.
# To stop, just press CTRL+C or whatever you want =)
while true
do
    /home/m1/Miner/ethminer -U -S eth-eu2.nanopool.org:9999 -O 0xb4983146f0047d87c63b5fdb3ef9e2bee4557ea3.M1/vhosted@gmail.com
done
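The per-GPU part of the script above repeats the same four commands with different offsets. As a sketch only, the commands could be generated from a short table. The function name `oc_commands` and the "gpu clock mem" input format are assumptions made for this example, and the function only prints the commands (a dry run) rather than applying them.

```shell
#!/bin/sh
# Read "gpu clock_offset mem_offset" triples from stdin and print the matching
# nvidia-settings commands. Nothing is applied; pipe the output to sh to run it.
oc_commands() {
    fan_speed=${1:-80}   # target fan speed in percent (80 as in the script above)
    while read -r gpu clock mem; do
        [ -n "$gpu" ] || continue   # skip blank lines
        printf 'nvidia-settings -a "[gpu:%s]/GPUGraphicsClockOffset[3]=%s"\n' "$gpu" "$clock"
        printf 'nvidia-settings -a "[gpu:%s]/GPUMemoryTransferRateOffset[3]=%s"\n' "$gpu" "$mem"
        printf 'nvidia-settings -a "[gpu:%s]/GPUFanControlState=1"\n' "$gpu"
        printf 'nvidia-settings -a "[fan:%s]/GPUTargetFanSpeed=%s"\n' "$gpu" "$fan_speed"
    done
}

# Example with the first two cards from the script above:
#   printf '0 -150 1200\n1 -150 1450\n' | oc_commands 80
```

Keeping the offsets in one table makes it easier to lower a single card's overclock after it crashes, which is the tuning process described earlier in the thread.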

That was not so hard; the main part is up and running !!! While our miner script is working, we will run another one: the script for monitoring.

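#!/bin/sh

# -i 5 is the index of the GPU to monitor
gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)

while true # Loops :=))
do
    # While the load is above 50% everything is fine; check again every 10 seconds
    while [ "$gpu" -gt 50 ]
    do
        gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
        echo "GPU load $gpu"
        echo "All good $(date) GPU load $gpu No errors"
        sleep 10
    done
    if [ "$gpu" -lt 40 ]
    then
        # Load dropped below 40%: kill the miner so that the miner loop restarts it
        killall -9 ethminer
        echo "Restart Miner GPU load $gpu $(date) error"
        echo "Restart Miner $(date) error" >> /home/m1/Miner/ethminer.log
        sleep 60
        gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
    else
        # Load between 40% and 50%: wait and measure again instead of spinning on a stale value
        sleep 10
        gpu=$(nvidia-smi -i 5 --query-gpu=utilization.gpu --format=csv,noheader,nounits)
    fi
done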

That's it. I finished it yesterday. I think it can be made smaller, but nothing needs to be installed, compiled, etc. All night I tested my GPUs with OC and power settings, more or less; it is very fast for testing clocks. In addition, use tail -f /var/log/kern.log | grep -i nvrm to see which GPU caused an error without a long farm stop. If this helps you: I like good coffee )) b4983146f0047d87c63b5fdb3ef9e2bee4557ea3 Hosted
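To make the kern.log check easier to read, the driver's error lines can be reduced to a bus ID and an error code. This is only a sketch: it assumes the driver logs Xid errors in the form `NVRM: Xid (PCI:0000:05:00): 79, ...`, and the function name `xid_summary` is made up for this example.

```shell
#!/bin/sh
# Keep only NVRM Xid lines from a kernel log and print "<pci-bus-id> <xid-code>".
# Assumed input format: "... NVRM: Xid (PCI:0000:05:00): 79, <message>"
xid_summary() {
    sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p'
}

# Example on a saved log:
#   xid_summary < /var/log/kern.log | sort | uniq -c
```

The bus ID in the output can then be compared with `nvidia-smi --query-gpu=index,pci.bus_id --format=csv` to find the index of the card that needs a lower overclock.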

joantune commented 6 years ago

Hi hosted! I have a small rig with a similar xconfig (that trick to emulate a monitor took a while to learn), but maybe the Coolbits value is different, and the OC values as well. I have to try some of your OC values; on the GTX 1060 I remember I couldn't control the fan, for instance. I either had the wrong Coolbits value or it just doesn't work with 1060s, but there's nothing like trying.

Again, yours is a very complete solution, so thanks for posting it here. Like I said, I have a tiny rig with one 1060, from which I squeeze at most 23.6 MH/s. I was wondering how many MH/s you get from one board with those OC configs?


H05ted commented 6 years ago

Hi joantune. For fan control, run nvidia-settings -a GPUFanControlState=1 and then nvidia-settings -a [fan:0]/GPUTargetFanSpeed=80 to set the fan speed of GPU0. Try Coolbits 38; I was also trying to get it to work. My cards are not as good as I would like, but they give ~31.7 MH/s (~500 Sol) each. Testing is not finished yet.

neskoc commented 6 years ago

I've just finished setting up a second rig (Asus's new mining MB, max 19 GPUs) with 3 Asus GTX 1060 OC and 1 non-OC (old batch). H05ted inspired me to dive into Linux again, and it is up and running (I will need some adjustments so that the miner starts automatically after reboot, but everything else is working great). I'll post some scripts later, but I would like to make a suggestion regarding H05ted's xorg.conf. I was struggling for hours to be able to change NVIDIA settings for card>0, and I've just come across this command: sudo nvidia-xconfig -a --cool-bits=13 --allow-empty-initial-configuration. It makes a perfect xorg.conf without the need for any editing afterwards ... and the bonus is that nvidia-settings now works for all cards. I'm not sure whether it is related to installing (apt install) xserver-xorg-dev prior to reboot (running nvidia-xconfig complained about a missing xorg-server, so I installed it). Anyhow, it is working now. As I wrote, I'll post some updates in the future.

xstead commented 6 years ago

Ethereum Miner Monitor released - v1.0.2 - FREE!

This is a Python application for monitoring Linux-based Ethereum miners and keeping the miner alive 24/7. If you have a Linux-based mining rig but no monitoring system, you can use this standalone script to keep your miner running without manual checks.

The application continuously checks that the 'ethminer' process is running and monitors the average utilization of the GPUs.

Script can restart the ethminer, or reboot the system.
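For readers who want to see the idea without Python, here is a rough shell sketch of the same two checks. It is not the code of the tool announced here: the function names `average_utilization` and `needs_restart` and the default threshold of 40 percent are assumptions made for illustration.

```shell
#!/bin/sh
# Average a list of per-GPU utilization values (one integer per line), as printed by
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
average_utilization() {
    awk 'NF { sum += $1; n++ } END { if (n) print int(sum / n); else print 0 }'
}

# Succeed (exit status 0) when the miner should be restarted.
#   $1: 1 if the miner process is running, 0 otherwise
#   $2: average GPU utilization in percent
#   $3: minimum acceptable utilization (default 40, an assumed value)
needs_restart() {
    running=$1
    avg=$2
    min=${3:-40}
    [ "$running" -eq 0 ] || [ "$avg" -lt "$min" ]
}
```

A wrapper could feed it with `pgrep -x ethminer` for the process check and the nvidia-smi query above for the utilization, then restart the miner whenever `needs_restart` succeeds.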

The script doesn't need any extra package/module of python, just pure python3. You can use virtualenv too.

The current version was tested on Ubuntu 16.04.3 LTS (xenial), with GeForce GTX 1070 Ti && AMD Radeon R9 290X cards.

Added AMD Utilization query support!

Download: https://github.com/xstead/ethereum-miner-monitor

ethereum-mining / ethminer

Error CUDA mining: an illegal memory access was encountered #72
