949f45ac / xmr-stak-hip

Monero HIP miner with custom optimizations – Development continues at xmrig-HIP
GNU General Public License v3.0

Issues running multiple Vega cards #2

Closed (gurupras closed this issue 6 years ago)

gurupras commented 6 years ago

I'm having trouble starting up multiple Vega 64 cards. I have a rig with 5 Vega 64s, and when I try to run the xmr-stak-nvidia program it errors out with:

GPU 0: hipErrorMemoryAllocation
/home/minecraft/crypto/xmr-stak-hip/hip_code/cuda_extra.cu line 210
[2018-10-03 21:38:06] : Difficulty changed. Now: 5000.
[2018-10-03 21:38:06] : New block detected.

I'm trying to run with 16 threads and 224 blocks. The system itself has only 4GB RAM but 128GB of swap.

Let me know if you need any other information.
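
For context, here is a rough estimate of how much GPU memory this setting asks for. It is only a sketch: it assumes the miner runs threads x blocks hashes at once per GPU and allocates one 2 MiB CryptoNight scratchpad per hash. The helper name is made up for illustration, and the real miner allocates some additional buffers on top.

```python
# Rough sketch: estimate scratchpad memory requested per GPU.
# Assumption: one 2 MiB CryptoNight scratchpad per concurrent hash,
# and threads * blocks concurrent hashes per GPU.
SCRATCHPAD_MIB = 2


def scratchpad_mib(threads: int, blocks: int) -> int:
    """Return the estimated scratchpad memory in MiB for one GPU."""
    return threads * blocks * SCRATCHPAD_MIB


if __name__ == "__main__":
    per_gpu = scratchpad_mib(16, 224)
    print(f"per GPU: {per_gpu} MiB")
    print(f"5 GPUs:  {5 * per_gpu} MiB")
```

Under these assumptions, 16 threads x 224 blocks comes to about 7 GiB per card, which is close to the 8 GB on a Vega 64, and about 35 GiB across five cards.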

949f45ac commented 6 years ago
gurupras commented 6 years ago

Yes, the config works when I have only 1 GPU attached!

"gpu_threads_conf" : [
  { "index" : 0, "threads" : 16, "blocks" : 224, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
],

"use_tls" : false,
"tls_secure_algo" : true,
"tls_fingerprint" : "",

"pool_address" : "pool.supportxmr.com:3333",
"wallet_address" : "42Z...",
"pool_password" : "x",

"call_timeout" : 10,
"retry_time" : 10,
"giveup_limit" : 0,
"verbose_level" : 3,
"h_print_time" : 60,
"output_file" : "",
"httpd_port" : 0,
"prefer_ipv4" : true,

949f45ac commented 6 years ago

And when you run with more than 1, do you update the index in each config row accordingly to 1, 2, 3 etc.? Have you validated with rocminfo that the indices are all indeed Vega cards and there is not an iGPU in between them or anything? Have you tried running a separate miner instance for each card? How many cards can you run at once until it fails?

gurupras commented 6 years ago

And when you run with more than 1, do you update the index in each config row accordingly to 1, 2, 3 etc.?

Yes, I realize that the index refers to the OpenCL device index. I create a configuration like this:

"gpu_threads_conf" : [
  { "index" : 0, "threads" : 16, "blocks" : 224, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
  { "index" : 1, "threads" : 16, "blocks" : 224, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
]
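
Writing these rows by hand gets error-prone with more cards, so here is a small sketch that generates them. The function name and defaults are hypothetical; it only mirrors the row format used in this thread and does not validate the values against the miner.

```python
def gpu_threads_conf(num_gpus, threads, blocks, bfactor=0, bsleep=100):
    """Build a gpu_threads_conf section with one row per GPU index."""
    rows = []
    for index in range(num_gpus):
        rows.append(
            f'  {{ "index" : {index}, "threads" : {threads}, "blocks" : {blocks}, '
            f'"bfactor" : {bfactor}, "bsleep" : {bsleep}, "affine_to_cpu" : false }},'
        )
    return '"gpu_threads_conf" : [\n' + "\n".join(rows) + "\n],"


if __name__ == "__main__":
    # Two cards with the settings from the post above.
    print(gpu_threads_conf(2, threads=16, blocks=224))
```

Every row gets the same threads and blocks; per-card tuning would still need manual edits.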

Have you validated with rocminfo that the indices are all indeed Vega cards and there is not an iGPU in between them or anything?

rocminfo does state iGPU as Agent-1. Not entirely sure how to skip it, but I'm sure if I try enough permutations with the index field, I should be able to find a configuration that skips the iGPU. Here's the output of rocminfo:

root@minecraft:/# rocminfo | grep -E "Agent|Name:"
HSA Agents               
Agent 1                  
  Name:                    Intel(R) Celeron(R) CPU G3920 @ 2.90GHz
  Vendor Name:             CPU                                
Agent 2                  
  Name:                    gfx900                             
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx900          
Agent 3                  
  Name:                    gfx900                             
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx900          
Agent 4                  
  Name:                    gfx900                             
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx900          
Agent 5                  
  Name:                    gfx900                             
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx900          
Agent 6                  
  Name:                    gfx900                             
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx900          

and clinfo:

root@minecraft:/# clinfo -l
Platform #0: AMD Accelerated Parallel Processing
 +-- Device #0: gfx900
 +-- Device #1: gfx900
 +-- Device #2: gfx900
 +-- Device #3: gfx900
 `-- Device #4: gfx900
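
To cross-check the two listings, here is a sketch that parses the filtered rocminfo output shown above and keeps only the agents whose vendor is not "CPU". The function name is hypothetical, the sample is shortened to three agents, and the idea that GPU agents map to device indices 0, 1, 2, ... in the same order is an assumption, not something rocminfo states.

```python
import re

SAMPLE = """\
HSA Agents
Agent 1
  Name:                    Intel(R) Celeron(R) CPU G3920 @ 2.90GHz
  Vendor Name:             CPU
Agent 2
  Name:                    gfx900
  Vendor Name:             AMD
      Name:                    amdgcn-amd-amdhsa--gfx900
Agent 3
  Name:                    gfx900
  Vendor Name:             AMD
      Name:                    amdgcn-amd-amdhsa--gfx900
"""


def gpu_agents(rocminfo_text):
    """Return (agent_number, name) for every agent whose vendor is not 'CPU'."""
    agents = []
    current = None
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        match = re.fullmatch(r"Agent (\d+)", line)
        if match:
            current = {"agent": int(match.group(1))}
            agents.append(current)
        elif current is not None and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            # Keep the first "Name" per agent; the later one is the ISA name.
            current.setdefault(key, value)
    return [
        (a["agent"], a.get("Name", "?"))
        for a in agents
        if a.get("Vendor Name") != "CPU"
    ]


if __name__ == "__main__":
    # Assumed mapping: the n-th GPU agent corresponds to device index n.
    for index, (agent, name) in enumerate(gpu_agents(SAMPLE)):
        print(f"device index {index} <- Agent {agent} ({name})")
```

On the full output above this would leave five gfx900 agents, which matches the five devices that clinfo lists.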

Have you tried running a separate miner instance for each card?

I had not tried running a separate miner for each card, but I tested it now with all 5 GPUs connected and it doesn't work. Each process fails with the exact same error.
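
One way to make sure each process really sees only one card is to restrict the HIP runtime with the HIP_VISIBLE_DEVICES environment variable. The sketch below only builds the per-card environment; the helper name is hypothetical, and the miner path and arguments are deliberately left out because they depend on the local setup.

```python
import os


def per_card_env(card, base_env=None):
    """Copy the environment and expose only one GPU to the HIP runtime."""
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = str(card)
    return env


if __name__ == "__main__":
    for card in range(5):
        env = per_card_env(card)
        # A real launch would pass env to subprocess.Popen([...], env=env);
        # the miner command itself is not shown here.
        print(card, env["HIP_VISIBLE_DEVICES"])
```

With only one device visible, each instance should see its card as device 0, so its config would use "index" : 0.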

How many cards can you run at once until it fails?

I have been able to run 2 cards with blocks set to 224. With anything more, I start seeing these errors.
On a side note, I can run all 5 GPUs, but only with the following configuration:

"gpu_threads_conf" : [
  { "index" : 0, "threads" : 16, "blocks" : 16, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
  { "index" : 1, "threads" : 16, "blocks" : 16, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
  { "index" : 2, "threads" : 16, "blocks" : 16, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
  { "index" : 3, "threads" : 16, "blocks" : 16, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
  { "index" : 4, "threads" : 16, "blocks" : 16, "bfactor" : 0, "bsleep" :  100, "affine_to_cpu" : false},
],

Obviously, this produces some absolutely terrible numbers for a bunch of Vegas:

RESULT REPORT
Difficulty       : 73501
Good results     : 28 / 28 (100.0 %)
Avg result time  : 2.8 sec
Pool-side hashes : 277002

Top 10 best results found:
|  0 |           442355 |  1 |           164047 |
|  2 |            75308 |  3 |            68372 |
|  4 |            47053 |  5 |            45503 |
|  6 |            27028 |  7 |            21307 |
|  8 |            21077 |  9 |            16650 |

Error details:
Yay! No errors.
Run for startnonce 50341120 with target 0000E44222594BB7 over.
Starting run for nonce 50341376
HASHRATE REPORT
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 367.2 | 372.4 | (na) |  1 | 380.0 | 381.3 | (na) |
|  2 | 381.6 | 381.2 | (na) |  3 | 372.3 | 376.2 | (na) |
|  4 | 380.6 | 381.7 | (na) |
-----------------------------------------------------
Totals:   1881.6 1892.9 (na) H/s
Highest:  1910.9 H/s

949f45ac commented 6 years ago

Ok tbh I have no idea what the problem is here, the other guy was able to run four Vega FE just fine.

rhlug commented 6 years ago

I'm running 6 cards on this config. Maybe try bsleep 0:

"gpu_threads_conf" : [
 { "index" : 0, "threads" : 8, "blocks" : 448, "bfactor" : 0, "bsleep" :  0, "affine_to_cpu" : false },
 { "index" : 1, "threads" : 8, "blocks" : 448, "bfactor" : 0, "bsleep" :  0, "affine_to_cpu" : false },
 { "index" : 2, "threads" : 8, "blocks" : 448, "bfactor" : 0, "bsleep" :  0, "affine_to_cpu" : false },
 { "index" : 3, "threads" : 8, "blocks" : 448, "bfactor" : 0, "bsleep" :  0, "affine_to_cpu" : false },
 { "index" : 4, "threads" : 8, "blocks" : 448, "bfactor" : 0, "bsleep" :  0, "affine_to_cpu" : false },
 { "index" : 5, "threads" : 8, "blocks" : 448, "bfactor" : 0, "bsleep" :  0, "affine_to_cpu" : false },
],

HASHRATE REPORT
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 1709.5 | 1708.9 | (na) |  1 | 1870.1 | 1870.1 | (na) |
|  2 | 1732.7 | 1732.4 | (na) |  3 | 1699.8 | 1700.0 | (na) |
|  4 | 1873.7 | 1873.8 | (na) |  5 | 1768.6 | 1768.9 | (na) |
-----------------------------------------------------
Totals:   10654.3 10654.0 (na) H/s
Highest:  10659.0 H/s
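
To compare reports like the two in this thread, here is a sketch that pulls the per-GPU 10s rates out of a HASHRATE REPORT table. The function name is hypothetical, and the parser assumes the exact layout shown here (groups of ID, 10s, 60s, 15m, two groups per row).

```python
REPORT = """\
HASHRATE REPORT
| ID |  10s |  60s |  15m | ID |  10s |  60s |  15m |
|  0 | 367.2 | 372.4 | (na) |  1 | 380.0 | 381.3 | (na) |
|  2 | 381.6 | 381.2 | (na) |  3 | 372.3 | 376.2 | (na) |
|  4 | 380.6 | 381.7 | (na) |
-----------------------------------------------------
Totals:   1881.6 1892.9 (na) H/s
"""


def parse_hashrates(report):
    """Map GPU id -> 10s hashrate for every data row in the report."""
    rates = {}
    for line in report.splitlines():
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Each GPU takes four cells: ID, 10s, 60s, 15m.
        for i in range(0, len(cells) - 3, 4):
            gpu_id, ten_s = cells[i], cells[i + 1]
            if gpu_id.isdigit() and ten_s != "(na)":
                rates[int(gpu_id)] = float(ten_s)
    return rates


if __name__ == "__main__":
    rates = parse_hashrates(REPORT)
    print(rates)
    print(f"average per card: {sum(rates.values()) / len(rates):.1f} H/s")
```

The summed 10s column should land within rounding of the Totals line, which makes it a quick sanity check that every card was picked up.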