fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0
4.06k stars, 1.79k forks

After the Monero fork (cryptonight_r), all GPUs slow down almost to zero #2304

Closed: minzak closed this issue 5 years ago

minzak commented 5 years ago

Before Monero forked to the new cryptonight_r algorithm, everything was fine with the latest v2.10 from the master branch. After block 1788000 it was also fine at first, but after some time the miner stopped working normally. :(

I mine on the https://www.supportxmr.com pool.

I know that the problem with how results were displayed in the reports was already resolved (https://github.com/fireice-uk/xmr-stak/issues/1976), but now the behaviour is the same as before; see the screenshots below.

Right now the same config for my cards no longer works. I also deleted amd.txt and let it be recreated, with the same result: the hash rate stays low, between 0 and 100 H/s per thread.

No updates were made to the OS. clinfo, amdcovc, amdmeminfo and ohgodatool all give normal results. I don't understand what is wrong.

-------------------------------------------------------------------
[2019-03-10 13:14:12] : Mining coin: cryptonight_r
[2019-03-10 13:14:12] : Compiling code and initializing GPUs. This will take a while...
[2019-03-10 13:14:12] : Device 0 work size 8 / 32.
[2019-03-10 13:14:12] : OpenCL device 0 - Precompiled code /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin not found. Compiling ...
[2019-03-10 13:14:18] : OpenCL device 0 - Precompiled code stored in file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:18] : OpenCL device 0 - Precompiled code /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin not found. Compiling ...
[2019-03-10 13:14:24] : OpenCL device 0 - Precompiled code stored in file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:24] : Device 0 work size 8 / 32.
[2019-03-10 13:14:24] : OpenCL device 0 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:24] : OpenCL device 0 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:24] : Device 1 work size 8 / 32.
[2019-03-10 13:14:24] : OpenCL device 1 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:24] : OpenCL device 1 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:24] : Device 1 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 1 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 1 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 2 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 2 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 2 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 2 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 2 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 2 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 3 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 3 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 3 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 3 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 3 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 3 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 4 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 4 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 4 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 4 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 4 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 4 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 5 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 5 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 5 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 5 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 5 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 5 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 6 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 6 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 6 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 6 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 6 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 6 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 7 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 7 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 7 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 7 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 7 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 7 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 8 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 8 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 8 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 8 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 8 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 8 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:25] : Device 9 work size 8 / 32.
[2019-03-10 13:14:25] : OpenCL device 9 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:25] : OpenCL device 9 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:26] : Device 9 work size 8 / 32.
[2019-03-10 13:14:26] : OpenCL device 9 - Load precompiled code from file /root/.openclcache/72d994646533be19c8c35e62ba992ebaa937c2ba9dad205d3a91eba482443849.openclbin
[2019-03-10 13:14:26] : OpenCL device 9 - Load precompiled code from file /root/.openclcache/0bd9a732e8c5b39504441e9eb25895c8a3dda66c708ea6a955c828fc48530f8d.openclbin
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 0, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 1, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 2, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 3, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 4, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 5, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 6, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 7, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 8, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 9, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 10, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 11, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 12, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 13, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 14, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 15, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 16, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 17, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 18, no affinity.
[2019-03-10 13:14:26] : Starting AMD GPU (OpenCL) thread 19, no affinity.
[2019-03-10 13:14:26] : Starting 1x thread, affinity: 0.
[2019-03-10 13:14:26] : hwloc: memory pinned
[2019-03-10 13:14:26] : Starting 1x thread, affinity: 1.
[2019-03-10 13:14:26] : hwloc: memory pinned
[2019-03-10 13:14:26] : Starting 1x thread, affinity: 2.
[2019-03-10 13:14:26] : hwloc: memory pinned
[2019-03-10 13:14:26] : Fast-connecting to pool.supportxmr.com:7777 pool ...
[2019-03-10 13:14:26] : Pool pool.supportxmr.com:7777 connected. Logging in...
[2019-03-10 13:14:26] : Difficulty changed. Now: 200007.
[2019-03-10 13:14:26] : Pool logged in.
[2019-03-10 13:14:26] : enable cryptonight_r asm 'intel_avx' cpu's
[2019-03-10 13:14:26] : enable cryptonight_r asm 'intel_avx' cpu's
[2019-03-10 13:14:26] : enable cryptonight_r asm 'intel_avx' cpu's
[2019-03-10 13:14:40] : OpenCL Interleave 1|0: 3740/10579.00 ms - 40.0
[2019-03-10 13:14:41] : OpenCL Interleave 2|0: 3788/10706.09 ms - 40.0
[2019-03-10 13:14:42] : OpenCL Interleave 3|0: 3834/10807.59 ms - 40.0
[2019-03-10 13:14:46] : OpenCL Interleave 6|1: 3767/10650.00 ms - 40.0
[2019-03-10 13:14:46] : OpenCL Interleave 1|1: 1041/10251.09 ms - 40.1
[2019-03-10 13:14:47] : OpenCL Interleave 7|0: 3754/10617.89 ms - 40.0
[2019-03-10 13:14:48] : OpenCL Interleave 2|1: 1185/10360.58 ms - 40.1
[2019-03-10 13:14:48] : OpenCL Interleave 4|0: 5799/15282.00 ms - 40.0
[2019-03-10 13:14:48] : OpenCL Interleave 8|0: 3729/10553.00 ms - 40.0
[2019-03-10 13:14:49] : OpenCL Interleave 3|1: 1349/10442.93 ms - 40.1
[2019-03-10 13:14:49] : OpenCL Interleave 5|0: 5820/15319.39 ms - 40.0
[2019-03-10 13:14:49] : OpenCL Interleave 9|0: 3602/10330.79 ms - 40.0
[2019-03-10 13:14:53] : OpenCL Interleave 6|0: 1098/10314.69 ms - 40.1
[2019-03-10 13:14:54] : OpenCL Interleave 7|1: 1089/10284.30 ms - 40.1
[2019-03-10 13:14:55] : OpenCL Interleave 8|1: 1034/10226.39 ms - 40.1
[2019-03-10 13:14:56] : OpenCL Interleave 9|1: 813/10031.81 ms - 40.1
[2019-03-10 13:14:58] : OpenCL Interleave 4|1: 2039/14752.59 ms - 40.1
[2019-03-10 13:14:59] : OpenCL Interleave 5|1: 2088/14784.15 ms - 40.1
[2019-03-10 13:15:16] : OpenCL Interleave 0|0: 3068/8967.08 ms - 40.0
[2019-03-10 13:15:22] : OpenCL Interleave 0|1: 751/8702.77 ms - 40.1
[2019-03-10 13:15:22] : Result accepted by the pool.
[2019-03-10 13:15:33] : OpenCL Interleave 0|1: 2121/9552.67 ms - 40.2
[2019-03-10 13:15:34] : Result accepted by the pool.
[2019-03-10 13:15:38] : OpenCL Interleave 0|0: 1100/9243.90 ms - 40.2
[2019-03-10 13:15:49] : OpenCL Interleave 0|0: 2039/9924.25 ms - 40.2
[2019-03-10 13:15:51] : OpenCL Interleave 1|0: 1052/10030.69 ms - 40.1
[2019-03-10 13:15:52] : OpenCL Interleave 4|0: 159/15261.29 ms - 40.1
HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   60.2 |   60.4 |   (na) |  1 |   60.9 |   61.1 |   (na) |
|  2 |   61.0 |   61.1 |   (na) |
Totals (CPU):   182.2  182.7    0.0 H/s
-----------------------------------------------------------------
HASHRATE REPORT - AMD
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   (na) |   89.7 |   (na) |  1 |   (na) |   97.9 |   (na) |
|  2 |   (na) |  101.9 |   (na) |  3 |   (na) |   97.6 |   (na) |
|  4 |   (na) |   97.7 |   (na) |  5 |   (na) |   96.9 |   (na) |
|  6 |   (na) |   97.3 |   (na) |  7 |   (na) |   97.7 |   (na) |
|  8 |   (na) |   67.6 |   (na) |  9 |   (na) |   65.2 |   (na) |
| 10 |   (na) |   66.4 |   (na) | 11 |   (na) |   66.5 |   (na) |
| 12 |   (na) |   97.3 |   (na) | 13 |   (na) |   97.7 |   (na) |
| 14 |   (na) |   97.5 |   (na) | 15 |   (na) |   97.4 |   (na) |
| 16 |   (na) |   97.7 |   (na) | 17 |   (na) |   97.6 |   (na) |
| 18 |   (na) |   97.7 |   (na) | 19 |   (na) |   97.1 |   (na) |
Totals (AMD):     0.0 1823.3    0.0 H/s
-----------------------------------------------------------------
Totals (ALL):    182.2 2006.0    0.0 H/s
Highest:     0.0 H/s
-----------------------------------------------------------------

[Screenshots: err, err1]
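The 10s column most likely reads (na) because each kernel round on these cards takes longer than 10 seconds, so the 10-second window rarely contains a finished round. A rough per-thread rate can be recovered from the debug log instead. The Python sketch below assumes that the second number in each "OpenCL Interleave g|t: a/b ms" line is the average round time in milliseconds and that one round computes "intensity" hashes; both are readings of the log, not documented xmr-stak behaviour, and the function name is hypothetical.

```python
import re

# Matches debug lines such as
# "[2019-03-10 13:14:40] : OpenCL Interleave 1|0: 3740/10579.00 ms - 40.0".
# Assumed fields: GPU index | thread on that GPU : delay / average round time (ms).
INTERLEAVE_RE = re.compile(r"OpenCL Interleave (\d+)\|(\d+): (\d+)/([\d.]+) ms")


def estimate_rates(log_lines, intensity_by_gpu):
    """Return {(gpu, thread): estimated H/s}, using the latest line seen per thread.

    Assumes one kernel round hashes `intensity` nonces, so
    H/s = intensity / (average round time in seconds).
    """
    rates = {}
    for line in log_lines:
        match = INTERLEAVE_RE.search(line)
        if match is None:
            continue  # not an interleave debug line
        gpu, thread = int(match.group(1)), int(match.group(2))
        round_s = float(match.group(4)) / 1000.0
        rates[(gpu, thread)] = intensity_by_gpu[gpu] / round_s
    return rates


if __name__ == "__main__":
    log = [
        "[2019-03-10 13:14:40] : OpenCL Interleave 1|0: 3740/10579.00 ms - 40.0",
        "[2019-03-10 13:14:48] : OpenCL Interleave 4|0: 5799/15282.00 ms - 40.0",
    ]
    # Intensities from the amd.txt posted in this issue (GPU 1: 984, GPU 4: 1000).
    for (gpu, thread), hs in sorted(estimate_rates(log, {1: 984, 4: 1000}).items()):
        print(f"GPU {gpu} thread {thread}: ~{hs:.1f} H/s")
```

For these two lines it gives roughly 93 H/s for GPU 1 and 65 H/s for GPU 4, close to the 60s values reported for those GPUs' threads (IDs 2-3 and 8-9, assuming thread IDs follow the order of amd.txt). That suggests the cards are taking 10 to 15 seconds per round rather than stalling outright.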

root@ferma:/opt/xmr-stak# cat amd.txt
// generated by xmr-stak/2.10.0/56d2770/master/lin/amd-cpu/0

/*
 * GPU configuration. You should play around with intensity and worksize as the fastest settings will vary.
 * index         - GPU index number usually starts from 0
 * intensity     - Number of parallel GPU threads (nothing to do with CPU threads)
 * worksize      - Number of local GPU threads (nothing to do with CPU threads)
 * affine_to_cpu - This will affine the thread to a CPU. This can make a GPU miner play along nicer with a CPU miner.
 * strided_index - switch memory pattern used for the scratchpad memory
 *                 3 = chunked memory, chunk size based on the 'worksize'
 *                     required: intensity must be a multiple of worksize
 *                 2 = chunked memory, chunk size is controlled by 'mem_chunk'
 *                     required: intensity must be a multiple of worksize
 *                 1 or true  = use 16 byte contiguous memory per thread, the next memory block has offset of intensity blocks
 *                             (for cryptonight_v8 and monero it is equal to strided_index = 0)
 *                 0 or false = use a contiguous block of memory per thread
 * mem_chunk     - range 0 to 18: set the number of elements (16byte) per chunk
 *                 this value is only used if 'strided_index' == 2
 *                 element count is computed with the equation: 2 to the power of 'mem_chunk' e.g. 4 means a chunk of 16 elements(256 byte)
 * unroll        - allow to control how often the POW main loop is unrolled; valid range [1;128) - for most OpenCL implementations it must be a power of two.
 * comp_mode     - Compatibility enable/disable the automatic guard around compute kernel which allows
 *                 to use an intensity which is not the multiple of the worksize.
 *                 If you set false and the intensity is not multiple of the worksize the miner can crash:
 *                 in this case set the intensity to a multiple of the worksize or activate comp_mode.
 * interleave    - Controls the starting point in time between two threads on the same GPU device relative to the last started thread.
 *                 This option has only an effect if two compute threads using the same GPU device: valid range [0;100]
 *                 0  = disable thread interleaving
 *                 40 = each working thread waits until 40% of the hash calculation of the previously started thread is finished
 * "gpu_threads_conf" :
 * [
 *     { "index" : 0, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false,
 *       "strided_index" : true, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true,
 *       "interleave" : 40
 *     },
 * ],
 * If you do not wish to mine with your AMD GPU(s) then use:
 * "gpu_threads_conf" :
 * null,
 */

"gpu_threads_conf" : [
  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 0,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 0,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 1,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 1,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 2,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 2,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 3,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 3,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:4026|4048|8053 MiB (used per thread|max per alloc|total free)
  { "index" : 4,    "intensity" : 1000, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 4,    "intensity" : 1000, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:4026|4048|8053 MiB (used per thread|max per alloc|total free)
  { "index" : 5,    "intensity" : 1000, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 5,    "intensity" : 1000, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 6,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 6,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 7,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 7,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 8,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 8,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 9,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 9,    "intensity" : 984, "worksize" : 8,    "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

],

/*
 * number of rounds per intensity performed to find the best intensity settings
 *
 * WARNING: experimental option
 *
 * 0 = disable auto tuning
 * 10 or higher = recommended value if you don't already know the best intensity
 */
"auto_tune" : 0,

/*
 * Platform index. This will be 0 unless you have different OpenCL platform - eg. AMD and Intel.
 */
"platform_index" : 0,

[Screenshot: result]
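The "memory:" comments in this amd.txt suggest the 4 GB cards are at their limit: two threads at 1978 MiB each need 3956 MiB of the 3957 MiB reported free. The sketch below redoes that arithmetic per GPU. It assumes a 2 MiB scratchpad per hash (the CryptoNight scratchpad size, also used by cryptonight_r), ignores the small extra buffers that make 984 hashes show up as 1978 rather than 1968 MiB, and also checks the rule from the config comment that intensity must be a multiple of worksize when strided_index is 2 or 3. The function name and input layout are illustrative, not part of xmr-stak.

```python
SCRATCHPAD_MIB = 2  # CryptoNight / cryptonight_r scratchpad size per hash


def check_gpu(threads, free_mib):
    """Check all amd.txt thread entries that share one GPU.

    threads: list of dicts with "intensity", "worksize" and "strided_index" keys.
    Returns (scratchpad MiB needed, list of problem strings).
    The estimate is a lower bound: it ignores buffers other than the scratchpads.
    """
    problems = []
    needed_mib = 0
    for i, t in enumerate(threads):
        if t["strided_index"] in (2, 3) and t["intensity"] % t["worksize"] != 0:
            problems.append(
                f"thread {i}: intensity {t['intensity']} is not a multiple of worksize {t['worksize']}"
            )
        needed_mib += t["intensity"] * SCRATCHPAD_MIB
    if needed_mib > free_mib:
        problems.append(f"needs at least {needed_mib} MiB but only {free_mib} MiB is free")
    return needed_mib, problems


if __name__ == "__main__":
    # GPU 0 from the amd.txt above: two threads, 3957 MiB free.
    for intensity in (984, 400):
        gpu0 = [{"intensity": intensity, "worksize": 8, "strided_index": 2}] * 2
        needed, problems = check_gpu(gpu0, free_mib=3957)
        print(f"intensity {intensity}: {needed} MiB of 3957 MiB ({needed / 3957:.1%})", problems or "ok")
```

With intensity 984 this reports 3936 MiB, about 99.5% of the free memory before any driver overhead; at 400 it drops to 1600 MiB. Whether running that close to the limit causes the slowdown is not established in this thread, but it is one variable worth ruling out.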

Ethorsen commented 5 years ago

I'm having the same issue with AMD GPUs on Win10. Since the fork, after some time the miner either crashes completely or stops working on some GPUs but not others. I have to restart the whole rig to fix it, because just restarting the miner does not revive the affected GPUs.

It's the first time I've had a major stability issue with xmr-stak.

Is the new algo more intensive on the GPU core or on memory? Should we revisit voltage and/or fan settings?

minzak commented 5 years ago

> Is the new algo more intensive on gpu or memory?

Good question! But I have both 4 GB and 8 GB cards.

> Should we revisit voltage and/or fan settings?

I use stable voltage settings; the rig has never hung. For example, from my systemd unit:

ExecStartPre=-/opt/ohgodatool -i 2 --set-max-power 90 --set-fanspeed 55 --core-state 7 --mem-state 2 --volt-state 11 --core-clock 1430 --mem-clock 2070

The same behaviour also happens with another miner: https://github.com/xmrig/xmrig-amd/issues/235

Spudz76 commented 5 years ago

Maybe related to #2298, which has not been merged into dev yet (but you could try checking out that branch specifically and building from it).

Yes, that one is about Vega, but it wouldn't be the first time something also behaved oddly on RX cards with various drivers.

It could also be the driver in general. The CN-R algo uses on-the-fly compilation: it's similar to the initial compile, except that it also has to compile a small module periodically (whenever the block height changes) to randomize part of the code itself. This likely introduced more driver-mismatch issues, so the old tried-and-true driver for static algos like v7/v8 may no longer be the best choice now that this new feature is in use.

Check some of the other recent AMD-related issues: I know there were problems there, some people found newer drivers that work, and you may be able to get specific versions to try from there.
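To make the "compile a small module periodically" point concrete: in CryptonightR the random part of the main loop is derived from the block height, so a new kernel is needed roughly once per block. A minimal sketch of a height-keyed kernel cache, assuming a hypothetical compile_fn that stands in for the OpenCL JIT step (it is not an xmr-stak function):

```python
from typing import Callable, Dict


class HeightKernelCache:
    """Compile one kernel per block height and reuse it for later jobs at that height.

    compile_fn is a hypothetical stand-in for the driver's JIT compile step.
    At most max_entries compiled kernels are kept; the oldest height is evicted first.
    """

    def __init__(self, compile_fn: Callable[[int], object], max_entries: int = 2):
        self._compile = compile_fn
        self._max = max_entries
        self._cache: Dict[int, object] = {}  # dicts keep insertion order (Python 3.7+)

    def get(self, height: int) -> object:
        if height not in self._cache:
            if len(self._cache) >= self._max:
                oldest = next(iter(self._cache))
                del self._cache[oldest]  # drop the kernel for the oldest height
            self._cache[height] = self._compile(height)
        return self._cache[height]


if __name__ == "__main__":
    cache = HeightKernelCache(lambda h: f"kernel for height {h}")
    for job_height in (1788000, 1788000, 1788001):
        print(cache.get(job_height))
```

Bounding the cache matters: a cache that never evicts old heights would grow by one compiled program per block, which is the kind of slow leak that watching the miner's memory over time would reveal.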

minzak commented 5 years ago

> Maybe related to #2298 which has not been merged into dev

Hm, I use Debian Linux with the 4.17 kernel and the 17.40 AMD driver. Is your fix https://github.com/fireice-uk/xmr-stak/pull/2298/commits/a4b8ee4d7281cbb80eec8b1f72c12ec855e2424e only for Windows? I don't think it helps in my case. I also tried the dev branch, with the same poor result.

psychocrypt commented 5 years ago

Could you please check whether the RAM consumed by xmr-stak increases over time? It could be an issue with the JIT compilation of the cryptonight_r kernel. I will review this part again to look for memory leaks.
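One way to check this on Linux without extra tools is to sample the miner's resident memory (the VmRSS field of /proc/PID/status) at a fixed interval across several blocks. Steady growth with every new block would point at a host-side leak; a flat line rules that out, although memory leaked inside the GPU driver would not necessarily show up here. A minimal Python sketch; the script name and the use of pidof to find the PID are only examples:

```python
import sys
import time


def parse_vmrss_kib(status_text):
    """Extract VmRSS (resident memory, in KiB) from the text of /proc/PID/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:\t  204800 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def read_vmrss_kib(pid):
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


def watch(pid, interval_s=60.0, samples=60):
    """Print RSS and growth since the first sample, once per interval."""
    baseline = read_vmrss_kib(pid)
    for _ in range(samples):
        rss = read_vmrss_kib(pid)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} RSS {rss / 1024:.1f} MiB ({(rss - baseline) / 1024:+.1f} MiB)", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 rss_watch.py "$(pidof xmr-stak)"
    watch(int(sys.argv[1]))
```

With the defaults it samples once a minute for an hour, which covers many blocks at Monero's roughly two-minute block time.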

psychocrypt commented 5 years ago

Please post your amd.txt and reduce the intensity until you get hash rates in the 10s average column.
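A rough way to pick a starting value: scale the current intensity by the ratio of target to measured round time, then round down to a multiple of the worksize, as the strided_index 2 comment in amd.txt requires. This assumes round time scales about linearly with intensity; once the GPU is no longer saturated it stops doing so, so the result may still give rounds above the target and should be re-checked in the log. The helper name and the 8 s target (some margin below the 10 s window) are illustrative choices.

```python
import math


def suggest_intensity(intensity, round_ms, worksize, target_ms=8000.0):
    """Scale intensity so the estimated round time is at most target_ms.

    Assumes round time is proportional to intensity. Rounds down to a
    multiple of worksize and never returns less than one worksize.
    """
    if round_ms <= 0 or worksize <= 0:
        raise ValueError("round_ms and worksize must be positive")
    scaled = intensity * target_ms / round_ms
    blocks = max(1, math.floor(scaled / worksize))
    return blocks * worksize


if __name__ == "__main__":
    # Round times from the log above: GPU 1 (intensity 984) and GPU 4 (intensity 1000).
    print(suggest_intensity(984, 10579.0, 8))
    print(suggest_intensity(1000, 15282.0, 8))
```

This gives 744 for GPU 1 and 520 for GPU 4; the 400 tried later in this thread is below both, so the 10s column should start filling in if the linear assumption roughly holds.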

minzak commented 5 years ago

> could you please check if the consumed ram by stak increases over the time.

How do I check that? In htop and free, everything is stable.

I also tried setting "affine_to_cpu" : 0 and not using CPU thread 0 for mining; it did not help. I also tried with CPU thread 4 (threads 0-3 used by the miner, thread 4 free).

I think I could reduce the intensity to as little as 32-128; that should be enough to get 50 H/s per thread. For an average result time of around 10 s, my difficulty is 10000. [Screenshot]
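The relationship behind the difficulty figure: a share at pool difficulty D takes D hashes on average, so the mean time between results is D divided by the hash rate. A small sketch with hypothetical helper names; the example numbers come from this issue (difficulty 200007 in the log, about 2006 H/s total in the hashrate report):

```python
def mean_share_interval_s(difficulty, hashrate_hs):
    """Average seconds per share: on average `difficulty` hashes are needed per share."""
    if hashrate_hs <= 0:
        raise ValueError("hash rate must be positive")
    return difficulty / hashrate_hs


def difficulty_for_interval(target_s, hashrate_hs):
    """Fixed difficulty that yields roughly one share every target_s seconds."""
    return target_s * hashrate_hs


if __name__ == "__main__":
    # Difficulty from the log and the total 60 s hash rate from the report.
    print(f"{mean_share_interval_s(200007, 2006.0):.1f} s per share")
    print(f"difficulty for 10 s shares at 1000 H/s: {difficulty_for_interval(10, 1000):.0f}")
```

At the logged difficulty of 200007 the whole rig finds a share only about every 100 s on average, and a difficulty of 10000 corresponds to a 10 s average only if the miner delivers roughly 1000 H/s.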

My intensity was 984; I reduced it to 400 and got a new result. [Screenshot: 1]

root@ferma:/opt/xmr-stak# cat amd.txt
// generated by xmr-stak/2.10.0/56d2770/master/lin/amd-cpu/0

/*
 * GPU configuration. You should play around with intensity and worksize as the fastest settings will vary.
 * index         - GPU index number usually starts from 0
 * intensity     - Number of parallel GPU threads (nothing to do with CPU threads)
 * worksize      - Number of local GPU threads (nothing to do with CPU threads)
 * affine_to_cpu - This will affine the thread to a CPU. This can make a GPU miner play along nicer with a CPU miner.
 * strided_index - switch memory pattern used for the scratchpad memory
 *                 3 = chunked memory, chunk size based on the 'worksize'
 *                     required: intensity must be a multiple of worksize
 *                 2 = chunked memory, chunk size is controlled by 'mem_chunk'
 *                     required: intensity must be a multiple of worksize
 *                 1 or true  = use 16 byte contiguous memory per thread, the next memory block has offset of intensity blocks
 *                             (for cryptonight_v8 and monero it is equal to strided_index = 0)
 *                 0 or false = use a contiguous block of memory per thread
 * mem_chunk     - range 0 to 18: set the number of elements (16byte) per chunk
 *                 this value is only used if 'strided_index' == 2
 *                 element count is computed with the equation: 2 to the power of 'mem_chunk' e.g. 4 means a chunk of 16 elements(256 byte)
 * unroll        - allow to control how often the POW main loop is unrolled; valid range [1;128) - for most OpenCL implementations it must be a power of two.
 * comp_mode     - Compatibility enable/disable the automatic guard around compute kernel which allows
 *                 to use an intensity which is not the multiple of the worksize.
 *                 If you set false and the intensity is not multiple of the worksize the miner can crash:
 *                 in this case set the intensity to a multiple of the worksize or activate comp_mode.
 * interleave    - Controls the starting point in time between two threads on the same GPU device relative to the last started thread.
 *                 This option has only an effect if two compute threads using the same GPU device: valid range [0;100]
 *                 0  = disable thread interleaving
 *                 40 = each working thread waits until 40% of the hash calculation of the previously started thread is finished
 * "gpu_threads_conf" :
 * [
 *     { "index" : 0, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : false,
 *       "strided_index" : true, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true,
 *       "interleave" : 40
 *     },
 * ],
 * If you do not wish to mine with your AMD GPU(s) then use:
 * "gpu_threads_conf" :
 * null,
 */

"gpu_threads_conf" : [
  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 0,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 0,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 1,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 1,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 2,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 2,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 3,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 3,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:4026|4048|8053 MiB (used per thread|max per alloc|total free)
  { "index" : 4,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 4,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:4026|4048|8053 MiB (used per thread|max per alloc|total free)
  { "index" : 5,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 5,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 6,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 6,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 7,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 7,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 8,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 8,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

  // gpu: Ellesmere  compute units: 36
  // memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
  { "index" : 9,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },
  { "index" : 9,    "intensity" : 400, "worksize" : 8,    "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2,    "unroll" : 8, "comp_mode" : true, "interleave" : 40  },

],

/*
 * number of rounds per intensity performed to find the best intensity settings
 *
 * WARNING: experimental option
 *
 * 0 = disable auto tuning
 * 10 or higher = recommended value if you don't already know the best intensity
 */
"auto_tune" : 0,

/*
 * Platform index. This will be 0 unless you have different OpenCL platform - eg. AMD and Intel.
 */
"platform_index" : 0,
minzak commented 5 years ago

P.S. On Graft (cryptonight_v8_reversewaltz) the result is the same (but a little higher). [Screenshot: graft]

contrabondo commented 5 years ago

First, set "platform_index" : 1,

Check and update your AMD driver to 18.12 or newer.

I had a similar problem on Win10 64-bit with a Vega 64 and the 17.5 driver.

minzak commented 5 years ago

> First set "platform_index" : 1,

Nope, that's wrong. The platform index selects which OpenCL platform (Intel, AMD or Nvidia) in the system is used; for a given rig it is always constant!

> Check and update your amd driver up to 18.12+
> I had a similar problem on win10_64 with vega64 and 17.5 driver

But my platform is Linux, there are no errors in the logs when building, and the newest 18.40 and 18.50 drivers do not work on Debian (when I use dpkg -i *.deb, some packages say they are only for Ubuntu). Maybe I can try something between 18.20 and 18.30. And an interesting question: which is the latest version that works under Debian (with no Ubuntu-only locks in the .deb packages)?

contrabondo commented 5 years ago

My Intel CPU (used for mining) has an integrated GPU (not used for mining), and index=1 works correctly on Win10 64-bit with a Vega 64, but index=0

I don't know Debian, but in any case, try the latest available driver.

minzak commented 5 years ago

I checked, and as I said, with index=1 my miner does not find the AMD cards, so that can't be part of the solution.
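Both observations fit if platform_index simply counts OpenCL platforms in the order they are reported, as the amd.txt comment ("0 unless you have different OpenCL platform - eg. AMD and Intel") suggests: on a machine where an Intel iGPU platform is listed first, AMD could be index 1, while on an AMD-only rig it is 0. A sketch that reads "clinfo -l" output to find the AMD platform's index; the "Platform #N: name" line format is assumed from typical clinfo output (check it against your own), and the helper names are hypothetical.

```python
import re
import shutil
import subprocess

PLATFORM_RE = re.compile(r"^Platform #(\d+): (.+)$")


def parse_platforms(clinfo_list_output):
    """Map platform index -> name from 'clinfo -l' style output (assumed format)."""
    platforms = {}
    for line in clinfo_list_output.splitlines():
        match = PLATFORM_RE.match(line.strip())
        if match:
            platforms[int(match.group(1))] = match.group(2).strip()
    return platforms


def amd_platform_index(platforms):
    """Return the lowest platform index whose name mentions AMD, or None."""
    for index in sorted(platforms):
        if "AMD" in platforms[index]:
            return index
    return None


if __name__ == "__main__":
    if shutil.which("clinfo") is None:
        print("clinfo not found")
    else:
        result = subprocess.run(["clinfo", "-l"], capture_output=True, text=True, check=False)
        platforms = parse_platforms(result.stdout)
        print(platforms, "-> AMD platform_index:", amd_platform_index(platforms))
```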

minzak commented 5 years ago

At last I solved it!

1) I found that drivers whose name contains "ubuntu" cannot install some of their .deb packages on Debian, so in the end it was impossible to install these drivers:
amdgpu-pro-18.50-721419-ubuntu-18.04.tar.xz
amdgpu-pro-18.50-721418-ubuntu-16.04.tar.xz
amdgpu-pro-18.40-697810-ubuntu-18.04.tar.xz
amdgpu-pro-18.40-673869-ubuntu-16.04.tar.xz

I'm not sure about amdgpu-pro-18.30-641594.tar.xz.

2) I found that kernel 4.18 does not work, and kernels newer than 4.15 do not work with the ROCm driver.

And finally I found a working combination: kernel 4.9.8 with the amdgpu-pro-17.50-511655.tar.xz driver. Download it with:

wget -O amdgpu-pro-17.50-511655.tar.xz --referer=http://support.amd.com www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz