aotto1968 commented 6 years ago

Hi, my first steps with GPU mining…

I only mine on an Nvidia GPU. I ran the CUDA code first and later the (modified) OpenCL code.

Nvidia supports both CUDA and OpenCL.

First observation: both hash rates are equal, ~305 H/s,

and both show the same behavior:

After xmr-stak starts, Linux keeps getting slower, and after ~60-120 s
Linux is unusable!

My FIRST thought was that the GPU was overloaded, so I reduced the load to:

"gpu_threads_conf" :
[
  // gpu: GeForce GTX 1050 Ti architecture: 61
  //      memory: 3637/4030 MiB
  //      smx: 6
  { "index" : 0,
    "threads" : 8, "blocks" : 1,
    "bfactor" : 2, "bsleep" :  0,
    "affine_to_cpu" : 1, "sync_mode" : 3,
  },
],

changed from the original:

"gpu_threads_conf" :
[
  // gpu: GeForce GTX 1050 Ti architecture: 61
  //      memory: 3637/4030 MiB
  //      smx: 6
  { "index" : 0,
    "threads" : 56, "blocks" : 18,
    "bfactor" : 2, "bsleep" :  0,
    "affine_to_cpu" : false, "sync_mode" : 3,
  },
],

→ but the behavior is still the same. Right now I can NOT mine and work at the same time.

Other examples from the CUDA installation work fine.

========================================================================

Please provide as much information as possible to reproduce the issue.

Basic information

Type of the CPU. Intel(R) Xeon(R) CPU E3-1275 V2 @ 3.50GHz
Type of the GPU (if you try to mine with the GPU). GeForce GTX 1050 Ti architecture: 61

Issue with the execution

NO compiling issues

AMD OpenCl issue

Let us focus on CUDA.

Stability issue

Is the CPU or GPU overclocked? → No, I only use the "default" settings.
Is the Main memory of the CPU or GPU undervolted? → How can I check this?

georgi-id commented 6 years ago

If you plan on using the system while mining, I'd advise you to up "bfactor" and "bsleep".

aotto1968 commented 6 years ago

WOW !! the new code works :-)

[2018-02-16 20:18:38] : Start mining: MONERO
[2018-02-16 20:18:38] : Starting NVIDIA GPU thread 0, affinity: 1.
[2018-02-16 20:18:38] : hwloc: memory pinned
[2018-02-16 20:18:38] : Fast-connecting to XXXXX ...
[2018-02-16 20:18:38] : Pool XXXX connected. Logging in...
[2018-02-16 20:18:38] : Difficulty changed. Now: 5000.
[2018-02-16 20:18:38] : Pool logged in.
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 |   (na) |   (na) |   (na) |
---------------------------
Totals:     (na)   (na)   (na) H/s
Highest:     0.0 H/s
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 |   (na) |   (na) |   (na) |
---------------------------
Totals:     (na)   (na)   (na) H/s
Highest:     0.0 H/s
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 | 170669.5 |   (na) |   (na) |
---------------------------
Totals:   170669.5   (na)   (na) H/s
Highest:  175697.6 H/s
HASHRATE REPORT - NVIDIA
| ID |    10s |    60s |    15m |
|  0 | 164798.8 |   (na) |   (na) |
---------------------------
Totals:   164798.8   (na)   (na) H/s
Highest:  175697.6 H/s

aotto1968 commented 6 years ago

OH NO !!

The following error happens ONLY with bfactor "20", no matter what values threads, blocks, or bsleep have:

terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA] Error: an illegal memory access was encountered
Aborted (core dumped)

psychocrypt commented 6 years ago

bfactor 20 is not allowed; anything over 10 makes no sense.

aotto1968 commented 6 years ago

I tested my card. bfactor < 10?? → I need > 14 if the display is to stay usable.

bsleep=10

"bfactor" : 16
212 H/s → threads=80 … blocks=15 
215 H/s → threads=80 … blocks=17 bsleep=20 
228 H/s → threads=84 … blocks=17 bsleep=20 → CRIT
248 H/s → threads=96 … blocks=16 bsleep=20 → CRIT
227 H/s → threads=88 … blocks=16 bsleep=20 
228 H/s → threads=80 … blocks=16 → CRIT
204 H/s → threads=72 … blocks=16 
182 H/s → threads=64 … blocks=16 
160 H/s → threads=56 … blocks=16 

"bfactor" : 14
193 H/s → threads=18 … blocks=15 
186 H/s → threads=15 … blocks=17 
196 H/s → threads=16 … blocks=17 → CRIT
202 H/s → threads=18 … blocks=16 → CRIT
196 H/s → threads=17 … blocks=16 
185 H/s → threads=16 … blocks=16 
174 H/s → threads=15 … blocks=16 
118 H/s → threads=10 … blocks=16 
200 H/s → threads=20 … blocks=14 → CRIT
215 H/s → threads=20 … blocks=15 → CRIT

"bfactor" : 15
220 H/s → threads=34 … blocks=18 → CRIT
222 H/s → threads=32 … blocks=18 → CRIT
197 H/s → threads=30 … blocks=18 
208 H/s → threads=34 … blocks=17 
230 H/s → threads=40 … blocks=16 → CRIT
219 H/s → threads=38 … blocks=16 → CRIT
207 H/s → threads=36 … blocks=16 
195 H/s → threads=36 … blocks=15 
156 H/s → threads=36 … blocks=12

aotto1968 commented 6 years ago

I did some research:

ITERATIONS comes from: #define MONERO_ITER 0x80000

code #1: const int batchsize = ITERATIONS >> bfactor;
0x80000 = 0b10000000000000000000 → 20 binary digits → so you can shift by at most 20-1 = 19 bits before batchsize becomes 0.

code #2: const int batchsize = (ITERATIONS * 2) >> ( 2 + bfactor );
0x100000 = 0b100000000000000000000 → 21 binary digits → so bfactor can be at most 21-1-2 = 18 before batchsize becomes 0.
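
The shift arithmetic above can be checked with a small C++ sketch. It only reuses the two batchsize expressions and the MONERO_ITER value quoted in this comment. The function names and the helper max_usable_bfactor are hypothetical and are not taken from the xmr-stak sources.

```cpp
#include <cassert>

// Value of MONERO_ITER as quoted in the comment above.
constexpr int ITERATIONS = 0x80000;

// Code #1 from the comment above: batch size for a given bfactor.
int batch_size_v1(int bfactor) {
    assert(bfactor >= 0 && bfactor < 30);  // keep the shift count well below the width of int
    return ITERATIONS >> bfactor;
}

// Code #2 from the comment above: batch size for a given bfactor.
int batch_size_v2(int bfactor) {
    assert(bfactor >= 0 && bfactor < 30);  // 2 + bfactor must stay below the width of int
    return (ITERATIONS * 2) >> (2 + bfactor);
}

// Hypothetical helper: largest bfactor in [0, limit] whose batch size is still nonzero.
// Returns -1 if no value in the range qualifies.
template <typename F>
int max_usable_bfactor(F batch_size, int limit) {
    int best = -1;
    for (int b = 0; b <= limit; ++b) {
        if (batch_size(b) > 0) {
            best = b;
        }
    }
    return best;
}
```

Calling max_usable_bfactor(batch_size_v1, 28) and max_usable_bfactor(batch_size_v2, 28) reproduces the limits 19 and 18 derived above. With bfactor 20 both expressions evaluate to a batch size of 0.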

psychocrypt commented 6 years ago

If you increase bfactor too much, you decrease the hash rate. A better way is to reduce the number of threads.
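
A rough model of why a large bfactor costs hash rate, as a C++ sketch. It assumes (this thread does not confirm it) that the core kernel is split into 1 << bfactor launches per hash round and that bsleep is a delay in microseconds inserted after each launch. The function names are hypothetical, and the model ignores per-launch overhead on the GPU side.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model: number of kernel launches needed for one hash round.
std::int64_t launches_per_round(int bfactor) {
    assert(bfactor >= 0 && bfactor < 31);
    return std::int64_t{1} << bfactor;
}

// Assumed model: total host-side sleep per hash round, in microseconds,
// if bsleep_us microseconds are inserted after every launch.
std::int64_t sleep_us_per_round(int bfactor, int bsleep_us) {
    assert(bsleep_us >= 0);
    return launches_per_round(bfactor) * bsleep_us;
}
```

Under these assumptions, the original setting (bfactor 2, bsleep 0) gives 4 launches and no sleep per round, while bfactor 16 with bsleep 20 gives 65536 launches and 1310720 µs of sleep per round. Reducing threads lowers the work per launch without multiplying the number of launches.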

aotto1968 commented 6 years ago

I have done more research, and the best setup right now is:

"gpu_threads_conf" :
[
  // gpu: GeForce GTX 1050 Ti architecture: 61
  //      memory: 3637/4030 MiB
  //      smx: 6
  { "index" : 0,
    "threads" : 32, "blocks" : 12,
    "bfactor" : 13, "bsleep" : 110,
    "affine_to_cpu" : 1, "sync_mode" : 0,
  },
],

This setup gives ~250 H/s and the desktop stays usable. I never get more than 305 H/s. Any bfactor LOWER than 12 does NOT change the results.

It seems memory speed is the real bottleneck, and the only thing left to optimize is power usage. I tried to optimize the code, especially the cryptonight_core_gpu_hash code. I would like to "experiment" with the block4 setup, but I failed to unroll the CUDA code with hard-wired shift values :-(

fireice-uk / xmr-stak

cuda & openCL blocking linux… making linux unusable #1083
