Closed minzak closed 5 years ago
I'm having the same issue with AMD gpus on Win10. Since the fork: After some time, the miner is either crashing completely, or it stops working on some gpus but not others. I have to restart the rig completely to fix as just restarting the miner does not revive the affected gpus.
It's the first time I've had a major stability issue with xmr-stak.
Is the new algo more intensive on gpu or memory? Should we revisit voltage and/or fan settings?
Is the new algo more intensive on gpu or memory?
Good question! But I have both 4 GB and 8 GB cards.
Should we revisit voltage and/or fan settings?
I use a stable voltage, and the rig has never hung:
ExecStartPre=-/opt/ohgodatool -i 2 --set-max-power 90 --set-fanspeed 55 --core-state 7 --mem-state 2 --volt-state 11 --core-clock 1430 --mem-clock 2070
The same behaviour also occurs with another miner: https://github.com/xmrig/xmrig-amd/issues/235
Maybe related to #2298, which has not been merged into dev yet (but you could try building from that branch specifically checked out).
Yes, that one says Vega; however, it wouldn't be the first time something behaved oddly on RX cards as well, depending on the driver.
It could also be the driver in general. The CN-R algo compiles code on the fly, so it is similar to the initial compile, except that it also has to compile a small module occasionally to randomize part of the code itself. This has likely exposed more driver mismatch issues, so the old tried-and-true drivers for static algos like v7/v8 may no longer be the best choice now that this new feature is in use.
Check some of the other recent issues around AMD: I know there were problems, some people found that newer drivers helped, and you may be able to get specific versions to try from there.
Maybe related to #2298 which has not been merged into dev
Hm, I use Debian Linux with a 4.17 kernel and the 17.40 AMD driver. Is your fix https://github.com/fireice-uk/xmr-stak/pull/2298/commits/a4b8ee4d7281cbb80eec8b1f72c12ec855e2424e only for Windows? I don't think it will help me. I also tried the dev branch, with the same poor result.
Could you please check whether the RAM consumed by stak increases over time? It could be an issue with the JIT compile of the cryptonight_r kernel. I will review this part again to see if I can find any memory leaks.
Please post your amd.txt and reduce the intensity until you get hash rates in the 10sec average.
could you please check if the consumed ram by stak increases over the time.
How do I do that in a way that makes it easy to see? In htop and free everything looks stable.
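One way to check this, as a minimal sketch: on Linux the resident memory of a process is reported as VmRSS in /proc/<pid>/status, so it can be sampled at intervals and compared over a long run. The function names, the one-minute interval and the sample count below are illustrative assumptions, not part of xmr-stak, and this only observes host RAM of the process, not GPU memory.

```python
import re
import time


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kib(handle.read())


def growth_kib(samples):
    """Last sample minus first sample; a steadily positive value suggests growth."""
    return samples[-1] - samples[0] if len(samples) >= 2 else 0


def monitor(pid, interval_s=60, count=30):
    """Print one VmRSS sample per interval and return all samples."""
    samples = []
    for _ in range(count):
        rss = read_rss_kib(pid)
        if rss is None:
            break
        samples.append(rss)
        print(f"{time.strftime('%H:%M:%S')} VmRSS={rss} KiB", flush=True)
        time.sleep(interval_s)
    return samples
```

Calling monitor() with the miner's PID (for example from pidof xmr-stak) and passing the result to growth_kib() gives a rough answer; a value that keeps climbing across hours would point towards a leak, while a flat value would not rule out a driver-side problem.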
I also tried setting "affine_to_cpu" : 0, and also not using CPU thread 0; it did not help. I also tried CPU thread 4 (threads 0-3 work on the miner, thread 4 is free).
I think I can reduce the intensity to roughly 32-128; I think that is enough to get 50 H/s per thread. For an average result time of about 10 sec, my difficulty is 10000.
My intensity was 984; I reduced it to 400 and got a new picture.
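The numbers here are consistent with each other under one reading, sketched below: the expected time to find a pool share is the share difficulty divided by the total hash rate. The thread count (10 GPUs with 2 threads each) is taken from the amd.txt in this post; treating 50 H/s as the per-thread rate for every thread is an assumption for the sake of the arithmetic.

```python
def expected_share_seconds(difficulty, hashrate_hs):
    """Average seconds per share: expected hashes per share equals the difficulty."""
    if hashrate_hs <= 0:
        raise ValueError("hashrate must be positive")
    return difficulty / hashrate_hs


GPU_COUNT = 10            # indexes 0-9 in the posted amd.txt
THREADS_PER_GPU = 2       # each index appears twice in gpu_threads_conf
HASHRATE_PER_THREAD = 50  # H/s, the figure mentioned in the post (assumed uniform)

total_hashrate = GPU_COUNT * THREADS_PER_GPU * HASHRATE_PER_THREAD
print(total_hashrate, expected_share_seconds(10000, total_hashrate))
```

With these inputs the total is 1000 H/s and the expected share time is 10 seconds, which matches the "near 10 sec" figure at difficulty 10000.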
root@ferma:/opt/xmr-stak# cat amd.txt
// generated by xmr-stak/2.10.0/56d2770/master/lin/amd-cpu/0
/*
* GPU configuration. You should play around with intensity and worksize as the fastest settings will vary.
* index - GPU index number usually starts from 0
* intensity - Number of parallel GPU threads (nothing to do with CPU threads)
* worksize - Number of local GPU threads (nothing to do with CPU threads)
* affine_to_cpu - This will affine the thread to a CPU. This can make a GPU miner play along nicer with a CPU miner.
* strided_index - switch memory pattern used for the scratchpad memory
* 3 = chunked memory, chunk size based on the 'worksize'
* required: intensity must be a multiple of worksize
* 2 = chunked memory, chunk size is controlled by 'mem_chunk'
* required: intensity must be a multiple of worksize
* 1 or true = use 16 byte contiguous memory per thread, the next memory block has offset of intensity blocks
* (for cryptonight_v8 and monero it is equal to strided_index = 0)
* 0 or false = use a contiguous block of memory per thread
* mem_chunk - range 0 to 18: set the number of elements (16byte) per chunk
* this value is only used if 'strided_index' == 2
* element count is computed with the equation: 2 to the power of 'mem_chunk' e.g. 4 means a chunk of 16 elements(256 byte)
* unroll - allow to control how often the POW main loop is unrolled; valid range [1;128) - for most OpenCL implementations it must be a power of two.
* comp_mode - Compatibility enable/disable the automatic guard around compute kernel which allows
* to use an intensity which is not the multiple of the worksize.
* If you set false and the intensity is not multiple of the worksize the miner can crash:
* in this case set the intensity to a multiple of the worksize or activate comp_mode.
* interleave - Controls the starting point in time between two threads on the same GPU device relative to the last started thread.
* This option has only an effect if two compute threads using the same GPU device: valid range [0;100]
* 0 = disable thread interleaving
* 40 = each working thread waits until 40% of the hash calculation of the previously started thread is finished
* "gpu_threads_conf" :
* [
* { "index" : 0, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : false,
* "strided_index" : true, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true,
* "interleave" : 40
* },
* ],
* If you do not wish to mine with your AMD GPU(s) then use:
* "gpu_threads_conf" :
* null,
*/
"gpu_threads_conf" : [
// gpu: Ellesmere compute units: 36
// memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 0, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 0, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 1, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 1, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 2, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 2, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:1978|3795|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 3, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 3, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:4026|4048|8053 MiB (used per thread|max per alloc|total free)
{ "index" : 4, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 4, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:4026|4048|8053 MiB (used per thread|max per alloc|total free)
{ "index" : 5, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 5, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 6, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 6, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 7, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 7, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 8, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 8, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
// gpu: Ellesmere compute units: 36
// memory:1978|4045|3957 MiB (used per thread|max per alloc|total free)
{ "index" : 9, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
{ "index" : 9, "intensity" : 400, "worksize" : 8, "affine_to_cpu" : 0, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true, "interleave" : 40 },
],
/*
* number of rounds per intensity performed to find the best intensity settings
*
* WARNING: experimental option
*
* 0 = disable auto tuning
* 10 or higher = recommended value if you don't already know the best intensity
*/
"auto_tune" : 0,
/*
* Platform index. This will be 0 unless you have different OpenCL platform - eg. AMD and Intel.
*/
"platform_index" : 0,
P.S. On Graft (cryptonight_v8_reversewaltz) the result is the same (but a little higher).
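The rules spelled out in the amd.txt comments above can be checked mechanically. The following is a rough sketch only: check_thread and chunk_bytes are hypothetical helper names, not xmr-stak functions, and they encode just the constraints stated in those comments (intensity a multiple of worksize for strided_index 2 and 3 or when comp_mode is off, unroll in [1;128) and a power of two, interleave in [0;100], and a chunk of 2 to the power of mem_chunk elements of 16 bytes).

```python
ELEMENT_BYTES = 16  # amd.txt comment: one scratchpad element is 16 bytes


def chunk_bytes(mem_chunk):
    """Chunk size in bytes for strided_index == 2: 2**mem_chunk elements."""
    if not 0 <= mem_chunk <= 18:
        raise ValueError("mem_chunk must be in the range 0 to 18")
    return (2 ** mem_chunk) * ELEMENT_BYTES


def check_thread(entry):
    """Return a list of rule violations for one gpu_threads_conf entry."""
    problems = []
    aligned = entry["intensity"] % entry["worksize"] == 0
    if entry["strided_index"] in (2, 3) and not aligned:
        problems.append("strided_index 2/3 requires intensity to be a multiple of worksize")
    if not entry["comp_mode"] and not aligned:
        problems.append("comp_mode is off and intensity is not a multiple of worksize")
    unroll = entry["unroll"]
    if not 1 <= unroll < 128:
        problems.append("unroll is outside [1;128)")
    elif unroll & (unroll - 1) != 0:
        problems.append("unroll is not a power of two")
    if not 0 <= entry["interleave"] <= 100:
        problems.append("interleave is outside [0;100]")
    return problems


# One entry as posted in the thread (all twenty entries use the same values).
posted = {"index": 0, "intensity": 400, "worksize": 8, "affine_to_cpu": 0,
          "strided_index": 2, "mem_chunk": 2, "unroll": 8, "comp_mode": True,
          "interleave": 40}
print(check_thread(posted), chunk_bytes(posted["mem_chunk"]))
```

The posted entry passes all of these checks and uses 64-byte chunks, so by the file's own rules the configuration is formally valid; that suggests looking at the driver or the miner rather than at a malformed amd.txt.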
First set "platform_index" : 1,
Check and update your amd driver up to 18.12+
I had a similar problem on win10_64 with vega64 and 17.5 driver
First set "platform_index" : 1,
Nope, that is wrong. The index selects the platform (Intel, AMD or Nvidia cards in the system); for a given rig it is always constant!
Check and update your amd driver up to 18.12+ I had a similar problem on win10_64 with vega64 and 17.5 driver
But my platform is Linux, and there are no errors in the logs during the build. Some of the newest drivers, 18.40 and 18.50, do not work on Debian (with dpkg -i *.deb, some packages say they are only for Ubuntu). Maybe I can try something between 18.20 and 18.30. One interesting question: which is the latest version that works under Debian (no locks in the deb packages)?
My intel cpu (used for mining) contains GPU (not used for mining) and index=1 work is correctly on win10_64 with vega64, but index=0\
I don't understand debian, but in any case try the latest available driver.
I checked, and as I said, with index=1 my miner does not find the AMD cards. That can't be part of the solution.
At last I solved it!
1) Driver archives whose name contains "ubuntu" fail to install some deb packages, so in the end these drivers are impossible to install:
amdgpu-pro-18.50-721419-ubuntu-18.04.tar.xz
amdgpu-pro-18.50-721418-ubuntu-16.04.tar.xz
amdgpu-pro-18.40-697810-ubuntu-18.04.tar.xz
amdgpu-pro-18.40-673869-ubuntu-16.04.tar.xz
I'm not sure about amdgpu-pro-18.30-641594.tar.xz
2) Kernel 4.18 does not work. Also, kernels > 4.15 do not work with the ROC driver.
Finally I found a working combination: kernel 4.9.8 with driver amdgpu-pro-17.50-511655.tar.xz. Get it with:
wget -O amdgpu-pro-17.50-511655.tar.xz --referer=http://support.amd.com www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
Before the Monero fork to the new cryptonight_r everything was fine with the latest v2.10 on the master branch. After block 1788000 it was also fine at first, but after some time the miner stopped working normally.
I work with https://www.supportxmr.com pool.
I know that the problem with the results displayed in reports was resolved at the time - https://github.com/fireice-uk/xmr-stak/issues/1976 But now the behaviour is like before, see the screenshot below.
For now I see that the same config no longer works for my cards. I also deleted amd.txt; it was recreated and is the same, but with a slow result, between 0 and 100.
No updates were made to the OS. clinfo, amdcovc, amdmeminfo and ohgodatool return normal results. I don't understand what is wrong.