fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

GPU R9295x2 Slows #2047

Open Maui1 opened 6 years ago

Maui1 commented 6 years ago

Hello, I have 4 R9 295X2 cards and I am using this setting:

"gpu_threads_conf" : [
    // gpu: Hawaii memory:3653
    // compute units: 44
    { "index" : 0, "intensity" : 946, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },
    // gpu: Hawaii memory:3689
    // compute units: 44
    { "index" : 1, "intensity" : 946, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },
    // gpu: Hawaii memory:3689
    // compute units: 44
    { "index" : 2, "intensity" : 946, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },
    // gpu: Hawaii memory:3689
    // compute units: 44
    { "index" : 3, "intensity" : 946, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },
    // gpu: Hawaii memory:3689
    // compute units: 44
    { "index" : 4, "intensity" : 946, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },
    // gpu: Hawaii memory:3689
    // compute units: 44
    { "index" : 5, "intensity" : 946, "worksize" : 8, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },
],

"platform_index" : 0,

I get 3.869 kH/s, which is very slow for this card. Can you tell me where the problem is, please?

Spudz76 commented 6 years ago

What coin is set in pools.txt?

How do you know it is a slow kh for this card?

For sure, you cannot compare any old results/speeds with the CN2v2 algo, and the new fork is much, much more difficult for older GPUs, so whatever is 'slow' could be the new 'fast' for those.

Simaex commented 6 years ago

Please note that you are using only three R9-295X2. Just using the last one may add up to 33% to your hashrate. Each card is actually two GPUs on a single board, so you need at least 8 threads to use them all (indexes from zero to seven). The next thing to try is two threads per GPU, but you will need to search for a sweet spot with intensities and maybe other settings. Of course, two threads may also be slower, but if you are interested in maximum performance you'd want to check all options.
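
As a rough illustration of the point above (four dual-GPU boards show up as eight devices, indexes 0 to 7), here is a small Python sketch that prints one `gpu_threads_conf` entry per device. The helper name `make_threads_conf` is hypothetical, and the default values are simply copied from the config posted earlier as placeholders, not tuned recommendations.

```python
import json

def make_threads_conf(num_gpus, threads_per_gpu=1, intensity=946, worksize=8):
    """Build a list of thread entries, one per (GPU index, thread) pair."""
    entries = []
    for index in range(num_gpus):
        for _ in range(threads_per_gpu):
            entries.append({
                "index": index,
                "intensity": intensity,   # placeholder value from the thread
                "worksize": worksize,     # placeholder value from the thread
                "affine_to_cpu": False,
                "strided_index": 2,
                "mem_chunk": 2,
                "unroll": 8,
                "comp_mode": True,
            })
    return entries

# Four R9 295X2 boards with two GPUs each -> 8 devices, indexes 0..7.
conf = make_threads_conf(num_gpus=4 * 2)
for entry in conf:
    print(json.dumps(entry) + ",")
```

Passing `threads_per_gpu=2` repeats each index twice, which is the "two threads per GPU" variant; the intensity would then most likely need to be lowered.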

Maui1 commented 6 years ago

You are right, at the moment only 3 R9 295X2 are working.

I am mining Loki coin: "currency": "cryptonight_heavy",

Simaex, I tested this:

"gpu_threads_conf" : [
    // gpu: Hawaii memory:3653
    // compute units: 44
    { "index" : 0, "intensity" : 864, "worksize" : 16, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 18, "unroll" : 8, "comp_mode" : true },
    { "index" : 0, "intensity" : 864, "worksize" : 16, "affine_to_cpu" : false, "strided_index" : 2, "mem_chunk" : 18, "unroll" : 8, "comp_mode" : true },
    // gpu: Hawaii memo

But with 2 threads I get an error:

[2018-10-29 22:58:52] : Error CL_MEM_OBJECT_ALLOCATION_FAILURE when calling clEnqueueNDRangeKernel for kernel 0.

The question is: is there something I can add and then test myself?

Simaex commented 6 years ago

If you get an error, try a much lower intensity, get the GPU working, and then gradually increase performance to the maximum. For old GPUs, weird worksizes like 17 to 23 sometimes prove to gain a lot of speed. I have not had a chance to play with a 295X2, so no precise recommendations.
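
The "start low, then raise gradually" approach can be sketched as a simple loop. This is only an outline under stated assumptions: `measure` is a hypothetical callback that would run the miner at a given intensity and return the hashrate (or `None` on an error such as an allocation failure), and `fake_measure` is a made-up stand-in so the sketch runs on its own.

```python
def ramp_intensity(measure, start=128, step=32, limit=1024):
    """Raise intensity step by step; stop at the first failure or slowdown.

    `measure(intensity)` is a hypothetical callback returning the observed
    hashrate, or None if the miner failed (e.g. an allocation error).
    """
    best_intensity, best_rate = None, 0.0
    for intensity in range(start, limit + 1, step):
        rate = measure(intensity)
        if rate is None or rate < best_rate:
            break  # failed or got slower: keep the previous best
        best_intensity, best_rate = intensity, rate
    return best_intensity, best_rate

# Made-up stand-in for a real benchmark run, for demonstration only.
def fake_measure(intensity):
    if intensity > 448:
        return None          # pretend VRAM runs out past this point
    return intensity * 0.6   # pretend speed scales linearly

print(ramp_intensity(fake_measure))
```

In practice each step would mean editing the config, restarting the miner, and reading the reported hashrate by hand; the same loop could be repeated over a few worksize values.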

Spudz76 commented 6 years ago

CN-Heavy currencies use a 4MB scratchpad, so you are probably actually out of VRAM due to too much intensity * 4MB.

I think you can go up to about intensity:448,worksize:16 maximum before running out of memory. Or intensity:440,worksize:22, with worksize based on the compute units count.
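
The intensity * 4MB estimate can be worked through with a few lines of Python. This is a back-of-the-envelope sketch, not how the miner itself allocates memory: it assumes two threads per GPU (as in the config that failed), ignores any other overhead, and rounds the intensity down to a multiple of the worksize. Under those assumptions it happens to reproduce the 448 and 440 figures, but that is not a confirmed derivation. The function names are hypothetical.

```python
SCRATCHPAD_MB = 4  # CN-Heavy scratchpad per hash, as stated above

def vram_needed_mb(intensity, threads_per_gpu=1):
    """Approximate scratchpad memory for one GPU (ignores other overhead)."""
    return intensity * threads_per_gpu * SCRATCHPAD_MB

def max_intensity(vram_mb, worksize, threads_per_gpu=1):
    """Largest intensity that fits, rounded down to a multiple of worksize."""
    fit = vram_mb // (threads_per_gpu * SCRATCHPAD_MB)
    return (fit // worksize) * worksize

# Values taken from the configs in this thread (memory:3653 on GPU 0).
print(vram_needed_mb(864, threads_per_gpu=2))                # 6912, well above 3653
print(max_intensity(3653, worksize=16, threads_per_gpu=2))   # 448
print(max_intensity(3653, worksize=22, threads_per_gpu=2))   # 440
```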

Also mem_chunk is an exponent of 2 ... therefore 18 seems excessive and probably wastes allocation space between workunits (which can run allocation beyond VRAM size, same alloc error since that code does not bother to sanity check the request before sending it to malloc/OpenCL routines). Ideal mem_chunk would be the smallest multiple of whatever the GPU enjoys as alignment, or what sizes its memory bus can access/transfer fastest. Also strided_index:2 may not be best depending on VRAM layout and type and speed and etc.
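
To make "exponent of 2" concrete, the sketch below converts a `mem_chunk` value into a chunk size. The 16-byte element size is an assumption that is not confirmed anywhere in this thread, so treat the byte figures as illustrative; `chunk_bytes` is a hypothetical helper name.

```python
ELEMENT_BYTES = 16  # assumed element size; not stated in this thread

def chunk_bytes(mem_chunk):
    """Chunk size in bytes if a chunk holds 2**mem_chunk elements."""
    return (2 ** mem_chunk) * ELEMENT_BYTES

for value in (2, 18):
    print(value, 2 ** value, chunk_bytes(value))
```

With that assumption, `mem_chunk` 2 means 4 elements per chunk, while 18 means 262144 elements, which is several orders of magnitude larger.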

I don't think strided_index:2 worked better (or at all?) on my regular Hawaii cards, but that was a while ago the last time I bothered.

EDIT: also, when I ran Hawaii it was regular 2MB CN, so I do not know about CN-Heavy other than it's twice the scratchpad and older cards tend to suck at it (skinny VRAM buses, slower DDR type, not HBM, etc.)

Maui1 commented 6 years ago

I tried all this:

"gpu_threads_conf" : [
    // gpu: Hawaii memory:3702
    // compute units: 44
    { "index" : 0, "intensity" : 448, "worksize" : 16, "affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },
    { "index" : 0, "intensity" : 448, "worksize" : 16, "affine_to_cpu" : false, "strided_index" : 1, "mem_chunk" : 2, "unroll" : 8, "comp_mode" : true },

and it gives

151.17 and 151.7 for the two cores. If I try to push the system higher, it goes down :S