fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

cuda & openCL blocking linux… making linux unusable #1083

Closed aotto1968 closed 6 years ago

aotto1968 commented 6 years ago

Hi, these are my first steps with GPU mining…

I only use GPU mining on Nvidia… I ran the CUDA code and later the (modified) OpenCL code…

>> first info: both hash speeds are equal… ~305 H/s

and both show the same behavior…

My FIRST thought was… that the GPU is overloaded… so I reduced the load to…

"gpu_threads_conf" :
[
  // gpu: GeForce GTX 1050 Ti architecture: 61 
  //      memory: 3637/4030 MiB
  //      smx: 6 
  { "index" : 0,   
    "threads" : 8, "blocks" : 1, 
    "bfactor" : 2, "bsleep" :  0,                                                             
    "affine_to_cpu" : 1, "sync_mode" : 3,                                                     
  },    
],          

from the original:

"gpu_threads_conf" :
[
  // gpu: GeForce GTX 1050 Ti architecture: 61
  //      memory: 3637/4030 MiB
  //      smx: 6
  { "index" : 0,
    "threads" : 56, "blocks" : 18,
    "bfactor" : 2, "bsleep" :  0,
    "affine_to_cpu" : false, "sync_mode" : 3,
  },
],

→ but still the same behavior… right now I can NOT mine and work at the same time.

Other examples from the CUDA installation work fine…

========================================================================

Please provide as much as possible information to reproduce the issue.

Basic information

Issue with the execution

NO compiling issues

AMD OpenCl issue

let us focus on CUDA

Stability issue

georgi-id commented 6 years ago

If you plan on using the system while mining, I'd advise you to up "bfactor" and "bsleep".

aotto1968 commented 6 years ago

WOW !! the new code works :-)

[2018-02-16 20:18:38] : Start mining: MONERO
[2018-02-16 20:18:38] : Starting NVIDIA GPU thread 0, affinity: 1.
[2018-02-16 20:18:38] : hwloc: memory pinned
[2018-02-16 20:18:38] : Fast-connecting to XXXXX ...
[2018-02-16 20:18:38] : Pool XXXX connected. Logging in...
[2018-02-16 20:18:38] : Difficulty changed. Now: 5000.
[2018-02-16 20:18:38] : Pool logged in.
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 |   (na) |   (na) |   (na) |
---------------------------
Totals:     (na)   (na)   (na) H/s
Highest:     0.0 H/s
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 |   (na) |   (na) |   (na) |
---------------------------
Totals:     (na)   (na)   (na) H/s
Highest:     0.0 H/s
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 | 170669.5 |   (na) |   (na) |
---------------------------
Totals:   170669.5   (na)   (na) H/s
Highest:  175697.6 H/s
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 | 164798.8 |   (na) |   (na) |
---------------------------
Totals:   164798.8   (na)   (na) H/s
Highest:  175697.6 H/s
aotto1968 commented 6 years ago

OH NO !!

The following error happens ONLY with bfactor "20", no matter what values threads, blocks or bsleep have…

terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA] Error: an illegal memory access was encountered
Aborted (core dumped)
psychocrypt commented 6 years ago

bfactor 20 is not allowed; anything over 10 makes no sense.

aotto1968 commented 6 years ago

I tested my card… bfactor < 10 ?? → I need > 14 if the display should be usable…

bsleep=10

"bfactor" : 16
212 H/s → threads=80 … blocks=15 
215 H/s → threads=80 … blocks=17 bsleep=20 
228 H/s → threads=84 … blocks=17 bsleep=20 → CRIT
248 H/s → threads=96 … blocks=16 bsleep=20 → CRIT
227 H/s → threads=88 … blocks=16 bsleep=20 
228 H/s → threads=80 … blocks=16 → CRIT
204 H/s → threads=72 … blocks=16 
182 H/s → threads=64 … blocks=16 
160 H/s → threads=56 … blocks=16 

"bfactor" : 14
193 H/s → threads=18 … blocks=15 
186 H/s → threads=15 … blocks=17 
196 H/s → threads=16 … blocks=17 → CRIT
202 H/s → threads=18 … blocks=16 → CRIT
196 H/s → threads=17 … blocks=16 
185 H/s → threads=16 … blocks=16 
174 H/s → threads=15 … blocks=16 
118 H/s → threads=10 … blocks=16 
200 H/s → threads=20 … blocks=14 → CRIT
215 H/s → threads=20 … blocks=15 → CRIT

"bfactor" : 15
220 H/s → threads=34 … blocks=18 → CRIT
222 H/s → threads=32 … blocks=18 → CRIT
197 H/s → threads=30 … blocks=18 
208 H/s → threads=34 … blocks=17 
230 H/s → threads=40 … blocks=16 → CRIT
219 H/s → threads=38 … blocks=16 → CRIT
207 H/s → threads=36 … blocks=16 
195 H/s → threads=36 … blocks=15 
156 H/s → threads=36 … blocks=12 
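
The measurements above can be filtered mechanically. The following is only a sketch: it assumes that "CRIT" marks runs where the display became unusable (the thread does not define it), it ignores the bsleep annotations, and the function name `best_stable` is made up for illustration.

```python
# Measurements copied from the lists above:
# (bfactor, hashrate in H/s, threads, blocks, marked CRIT?)
RESULTS = [
    (16, 212, 80, 15, False), (16, 215, 80, 17, False), (16, 228, 84, 17, True),
    (16, 248, 96, 16, True),  (16, 227, 88, 16, False), (16, 228, 80, 16, True),
    (16, 204, 72, 16, False), (16, 182, 64, 16, False), (16, 160, 56, 16, False),
    (14, 193, 18, 15, False), (14, 186, 15, 17, False), (14, 196, 16, 17, True),
    (14, 202, 18, 16, True),  (14, 196, 17, 16, False), (14, 185, 16, 16, False),
    (14, 174, 15, 16, False), (14, 118, 10, 16, False), (14, 200, 20, 14, True),
    (14, 215, 20, 15, True),
    (15, 220, 34, 18, True),  (15, 222, 32, 18, True),  (15, 197, 30, 18, False),
    (15, 208, 34, 17, False), (15, 230, 40, 16, True),  (15, 219, 38, 16, True),
    (15, 207, 36, 16, False), (15, 195, 36, 15, False), (15, 156, 36, 12, False),
]


def best_stable(results):
    """Return {bfactor: (hashrate, threads, blocks)} for the fastest non-CRIT run."""
    best = {}
    for bfactor, rate, threads, blocks, crit in results:
        if crit:
            continue  # assumption: CRIT runs are not usable alongside desktop work
        if bfactor not in best or rate > best[bfactor][0]:
            best[bfactor] = (rate, threads, blocks)
    return best


if __name__ == "__main__":
    for bfactor, (rate, threads, blocks) in sorted(best_stable(RESULTS).items()):
        print(f"bfactor={bfactor}: {rate} H/s at threads={threads}, blocks={blocks}")
```

Under that reading of CRIT, the fastest run that keeps the desktop usable is a few H/s to tens of H/s below the fastest run overall for each bfactor.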
aotto1968 commented 6 years ago

done some research…

ITERATIONS is defined as: #define MONERO_ITER 0x80000

code #1: const int batchsize = ITERATIONS >> bfactor;
0x80000 = 0b10000000000000000000 → 20 binary digits → so you can shift by at MOST 20-1=19 bits.

code #2: const int batchsize = (ITERATIONS * 2) >> ( 2 + bfactor );
0x100000 = 0b100000000000000000000 → 21 binary digits → so you can shift by at MOST 21-1-2=18 bits.
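
The shift arithmetic above can be checked with a few lines of Python. This is a sketch that only models the two quoted expressions; the function names `batchsize_v1`, `batchsize_v2` and `max_bfactor` are made up for illustration and are not part of xmr-stak.

```python
ITERATIONS = 0x80000  # value of MONERO_ITER quoted above


def batchsize_v1(bfactor):
    """Model of code #1: ITERATIONS >> bfactor."""
    return ITERATIONS >> bfactor


def batchsize_v2(bfactor):
    """Model of code #2: (ITERATIONS * 2) >> (2 + bfactor)."""
    return (ITERATIONS * 2) >> (2 + bfactor)


def max_bfactor(batchsize_fn, limit=32):
    """Largest bfactor below `limit` for which the batch size is still >= 1."""
    usable = [b for b in range(limit) if batchsize_fn(b) >= 1]
    return max(usable)


if __name__ == "__main__":
    print("code #1 max bfactor:", max_bfactor(batchsize_v1))
    print("code #2 max bfactor:", max_bfactor(batchsize_v2))
    print("batch sizes at bfactor 20:", batchsize_v1(20), batchsize_v2(20))
```

In both variants the batch size drops to 0 at bfactor 20. That would be consistent with the crash reported earlier for bfactor "20", although the thread does not confirm that this is the cause.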

psychocrypt commented 6 years ago

If you increase bfactor too much, then you decrease the hash rate. A better way is to reduce the number of threads.

aotto1968 commented 6 years ago

I have done more research… and the best setup right now is…

"gpu_threads_conf" :
[
  // gpu: GeForce GTX 1050 Ti architecture: 61
  //      memory: 3637/4030 MiB
  //      smx: 6
  { "index" : 0,
    "threads" : 32, "blocks" : 12,
    "bfactor" : 13, "bsleep" : 110,
    "affine_to_cpu" : 1, "sync_mode" : 0,
  },
],

This setup gives ~250 H/s and the desktop is still usable… I never get more than 305 H/s… any bfactor LOWER than 12 does NOT change the results…
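
To get a feeling for what bfactor and bsleep cost in this setup, here is a rough back-of-the-envelope sketch. It rests on assumptions that the thread does not state: that the main loop is split into 2^bfactor kernel launches, that bsleep is a delay in microseconds after each launch, and that one round computes threads × blocks hashes. The function name `sleep_overhead` is made up for illustration.

```python
def sleep_overhead(threads, blocks, bfactor, bsleep_us):
    """Estimate launches and pure sleep time for one round of hashes.

    Assumptions (not confirmed in the thread):
      * the main loop runs as 2**bfactor kernel launches,
      * bsleep_us microseconds are slept after each launch,
      * one round yields threads * blocks hashes.
    """
    launches = 1 << bfactor
    sleep_seconds = launches * bsleep_us / 1_000_000
    hashes_per_round = threads * blocks
    return launches, sleep_seconds, hashes_per_round


if __name__ == "__main__":
    # Values from the setup above: threads=32, blocks=12, bfactor=13, bsleep=110
    launches, sleep_s, hashes = sleep_overhead(32, 12, 13, 110)
    print(f"{launches} launches, {sleep_s:.3f} s of sleep per {hashes} hashes")
```

Under these assumptions, every extra bfactor step doubles the number of launches and therefore the total sleep time per round, which is one way to read the advice to lower the thread count instead of raising bfactor further.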

It seems the memory speed is the real bottleneck… and the only optimization left is power usage… Try to optimize the code… especially the cryptonight_core_gpu_hash code… I would like to "experiment" with the block4 setup… but failed to unroll the CUDA code with a hard-wired shift value :-(