fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

4gb vs 8gb card hashrate on cryptonight-heavy #1352

Closed heavyarms2112 closed 6 years ago

heavyarms2112 commented 6 years ago

Algo - cryptonight-heavy

8 x RX570 4GB, 6 x RX580 4GB

All cards hash around 750-780 H/s at different intensities, whereas I read that 8gb cards do 1000-1100 H/s on a dual-thread config.

Does this algo specifically favor 8gb cards over 4gb ones? If not, what intensity and worksize should I use for 4gb cards?

I have tried the following, with a work_size of 8 for all tests:

single thread: intensities 864, 896, 920, 960

dual thread: intensities 448, 464, 472

The best single-thread result was at intensity 960 and the best dual-thread result was at 472. The normal Cryptonight algo used to work fine at intensity 896 with dual threads. Any recommendations?

Spudz76 commented 6 years ago

Heavy uses a 4M block rather than the normal 2M block, just as Lite uses a 1M block and works better on old 1GB cards or CPUs with a crap cache size.

It somewhat makes sense that it prefers something with twice the space, since it takes twice the space.

Also, the intensity will be around half the normal CN intensity because of the doubled size, so what you tested and the results you got are sensible.
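The halving described above can be checked with a little arithmetic. This is a minimal sketch, assuming one scratchpad per unit of intensity per thread and ignoring any other VRAM the miner needs; the function names are made up for illustration and are not part of xmr-stak.

```python
# Scratchpad ("block") size per hash in MiB, as given in the comment above.
SCRATCHPAD_MIB = {
    "cryptonight-lite": 1,
    "cryptonight": 2,
    "cryptonight-heavy": 4,
}

def scratchpad_footprint_mib(algo, intensity, threads=1):
    """Approximate VRAM taken by scratchpads (assumption: one per unit of intensity)."""
    return SCRATCHPAD_MIB[algo] * intensity * threads

def scale_intensity(intensity, from_algo, to_algo):
    """Rescale an intensity so the scratchpad footprint stays the same."""
    return intensity * SCRATCHPAD_MIB[from_algo] // SCRATCHPAD_MIB[to_algo]

# Numbers taken from the original post (4GB cards):
print(scratchpad_footprint_mib("cryptonight", 896, threads=2))        # 3584
print(scale_intensity(896, "cryptonight", "cryptonight-heavy"))       # 448
print(scratchpad_footprint_mib("cryptonight-heavy", 448, threads=2))  # 3584
print(scratchpad_footprint_mib("cryptonight-heavy", 960))             # 3840
```

Under these assumptions, 896 x 2 threads on normal CN and 448 x 2 threads on heavy occupy the same amount of memory, which matches the intensities that worked in the original post.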

heavyarms2112 commented 6 years ago

@Spudz76 Agreed. However, I am surprised that 8gb cards got a boost in hashrate over the normal cryptonight algo. Or perhaps the people who posted 1000-1100 H/s have no idea about cryptonight-heavy and were actually hashing on normal cryptonight (there are pools that still show shares accepted for Sumokoin on the cryptonight algo).

fireice-uk commented 6 years ago

@heavyarms2112 That's correct. CN heavy is fastest on 8gb nvidia cards.

SigiSinatra commented 6 years ago

My 580s 8gb work the same on both v7 and heavy: 1040~1100 h/s (bios mod + overdriventool)

480s 4gb: ~850 h/s on both v7 and heavy (bios mod + overdriventool)

570s 8gb: ~960 h/s on both v7 and heavy (bios mod + overdriventool)

Only Vegas have a "hashdrop" on heavy. Vega 64 and Vega 56 (with 64 bios): ~2040 on v7, while on heavy 1400

stna1981 commented 6 years ago

Any numbers for GTX 1060 3GB? Found a config now that makes 470 H/s, before I had 530 H/s (with RAM OC)...

heavyarms2112 commented 6 years ago

@fireice-uk why is Vega with 8gb affected then with heavy and not RX570/RX580 8gb?

yegmine commented 6 years ago

Any numbers for GTX 1060 3GB?

With default settings, I'm only getting 132 h/s with my EVGA 1060 3gb, but 110 h/s with the 1030 2gb.... WTF? Awesome!? 9.2 h/s on a C2D with 4M L2 (miner only found/using 1 core)

And the cpuminer is only using 1 thread on win10

on win7 also with new xmr-stak

104 h/s mining on AMD HD 6950 2GB; 24.4 h/s on the same C2D 4M L2 (miner found both cores)

What config settings are you using for your Nvidia GTX 1060 3gb, @stna1981?

stna1981 commented 6 years ago

Try with 4 threads, 144 blocks, bfactor 8

JerichoJones commented 6 years ago

How-TO and tuning your hardware is not within the scope of Github support and will most likely be ignored. It is just not possible for us to know how to tune every model/version of a component.

How-TO and tuning type questions are better suited to forums such as REDDIT/r/moneromining. There is a much larger audience of people that have probably already been through and resolved the same issue.

Make sure you have reviewed the documentation on the Github as well as running:

    xmr-stak --help

^^ This will answer most How-TO type questions. ^^


yegmine commented 6 years ago

@stna1981 and for anyone reading this thread via search engine:

My 1060 3gb is getting 300 with these settings

    // gpu: GeForce GTX 1060 3GB architecture: 61
    // memory: 2492/3072 MiB
    // smx: 9
    { "index" : 0,
      "threads" : 4, "blocks" : 96,
      "bfactor" : 6, "bsleep" : 12,
      "affine_to_cpu" : 1, "sync_mode" : 3,

GrannyCryptomaster commented 6 years ago

Wow! So many miners with wrong configs! nVidia and xmrstak/xmrig are the simplest things to configure in the mining business. I ran all the main algos, but I love CN and CN-heavy because of the clear and simple way to configure them and mine with maximum performance. Check my 2 posts in the xmrig section for the best config.

CUDA mining of CN can be summarised very simply. CN uses 2MB per thread, CNH uses 4MB. For maximum efficiency, you need total no. of threads = no. of cores, if video memory allows it. No. of cores = 32 x SMX x 4 for every GTX GPU. So, of threads and blocks, the only value YOU DON'T need to change and try is BLOCKS. BLOCKS MUST BE SMX x 4!!! Not SMX x 3 like many recommend, including the devs of the miners.

Now you have your blocks, and you have your memory displayed at xmr-stak start. Calculate threads = memory / blocks / 2 for CN, or / 4 for CNH. From my simple calculations, you will see that for CN, threads are always 32 for all GTX 10 series. For CNH, because of the greater memory need, you must lower threads until you no longer get errors (i.e. until the total threads (T x B), at 4MB chunks each, fit in the available memory). Devs, you can use this in the miners from the start. You won't get errors. See my posts below:

https://github.com/xmrig/xmrig-nvidia/issues/166

https://github.com/xmrig/xmrig-nvidia/issues/100

The other very wrong values are bfactor and bsleep. DON'T USE the so-called "performance values" 6 * 12 in Windows. YOU NEED bfactor 8 and bsleep 100, no matter whether you use 1 or 6 cards, or whether you have the monitor plugged into one of them or not. Use those for all. All those GTX 1060 3GB cards love them. I tested them! I have the same cards!
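The threads/blocks rule of thumb from this comment can be sketched as follows. This is only an illustration under the comment's own assumptions (blocks = SMX x 4, at most 32 threads per block, 2 MiB or 4 MiB per thread); `suggest_threads_blocks` is a hypothetical helper, not a function of xmr-stak or xmrig, and the result is a starting point for hand tuning rather than a guaranteed optimum.

```python
# Memory per thread in MiB, as stated in the comment (CN = 2, CNH = 4).
SCRATCHPAD_MIB = {"cryptonight": 2, "cryptonight-heavy": 4}

def suggest_threads_blocks(smx, free_mem_mib, algo, max_threads=32):
    """Return (threads, blocks) following the 'blocks = SMX x 4' rule of thumb.

    smx and free_mem_mib are the values the miner prints at start,
    e.g. 'smx: 9' and 'memory: 2492/3072 MiB'.
    """
    blocks = smx * 4
    # How many threads per block fit into the free memory.
    fit = free_mem_mib // (blocks * SCRATCHPAD_MIB[algo])
    # Cap at 32 threads per block, i.e. total threads = number of cores.
    threads = min(max_threads, fit)
    return threads, blocks

# GTX 1060 3GB values from the config posted earlier in the thread.
print(suggest_threads_blocks(9, 2492, "cryptonight"))        # (32, 36)
print(suggest_threads_blocks(9, 2492, "cryptonight-heavy"))  # (17, 36)
```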

psychocrypt commented 6 years ago

We are not saying you should use 3 x SMX; this is the default setting, which runs well on most systems. For optimal performance you need to hand-tune the miner.

stna1981 commented 6 years ago

Especially because it is not true... I tried with 32 threads, but got better results with 8 (for CN) and 4 (for CNH) on my GTX 1060s...

GrannyCryptomaster commented 6 years ago

480 H/s with Hynix, mem at 4303 and core at 2050 in CNH. Does that sound wrong?

stna1981 commented 6 years ago

For my GTX 1060 3GB, with 36 blocks (9 SMX x 4, as you suggest), I can use a max. of 17 threads. With your settings of 4303 / 2050 (Samsung) I get about 487 H/s. With 4 threads and 144 blocks, I get 492 H/s :-)

GrannyCryptomaster commented 6 years ago

Not such a big diff, but it's something. Damn, that Samsung mem is working well. Asus cards, I guess? My EVGAs manage between 470-480. Anyway, thank you for contradicting me. Finally I get some feedback after many weeks. I suggested the "best" settings from calculations only. I just have GTX 1060s. And it seems that they are near the top, without spending days testing every possible combination. It seems that very few give their best configs in this gold rush.

stna1981 commented 6 years ago

I have 16 cards, mainly Palit, Gainward and KFA². All Samsung. They do between +590 and +920 on mem (P2 state), quite a big range. So Samsung is no guarantee of a high OC either, but they seem to have better timings. For Hynix, though, your CNH value is really good; on normal CN the difference was bigger in the past (~5-10%).

GrannyCryptomaster commented 6 years ago

I didn't touch the P states. I tried once with eth mining, and the miner crashed. Never tried it again. In CN I got close to 490 h/s.

stna1981 commented 6 years ago

P2 is the nVidia default for CUDA. You can force P0 for higher clocks, but then you need to lower the OC, so I think it indeed makes no sense to mess around with it. For CN, I get about 530-550 H/s, depending on clocks. 490 is a typical result for Hynix. I've fixed all my cards to 800mV as this gets the best Hash/Watt ratio.

GrannyCryptomaster commented 6 years ago

OK. I don't want to make this post any longer, but I must report my findings. I used XMRig, Windows 10 x64 Pro, CUDA 9.1, Geforce 388.71, Afterburner. 6 GTX 1060 3GB Hynix. OC: power 75, temp 83, core at 2025 in full load, mem at 4303 (+500). P2 state. Algo CN-Heavy. bfactor x bsleep = 8 x 100.

1. Threads x Blocks = 4 x 144; total hashrate 2835 after 5 min, individual hr 46x - 47x.

2. Threads x Blocks = 8 x 72; total hashrate 2844 after 5 min, individual hr 46x - 47x.

3. Threads x Blocks = 16 x 36; total hashrate 2853 after 5 min, individual hr 47x - 48x. It has now increased to 2860.

I will test xmr-stak and report back in this post. I don't want to compare the 2 miners, just to show the best Threads and Blocks for my rig. Thanks again for replying.

Update: xmr-stak gives the same hashrates, so, for my setup, TxB = 16x36 and bxb = 8x100 give the best hashrates.
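For anyone comparing the three runs above, a small sketch of the arithmetic may help. It uses only the 5-minute totals reported in this comment; the per-card figure is a plain average over the 6 cards, and the 4 MiB per thread is the CNH value quoted earlier in the thread. The helper name `summarize` is made up for illustration.

```python
CARDS = 6            # six GTX 1060 3GB cards in the rig
SCRATCHPAD_MIB = 4   # cryptonight-heavy memory per thread, per the thread

# (threads, blocks, total rig hashrate in H/s after 5 min)
RUNS = [(4, 144, 2835), (8, 72, 2844), (16, 36, 2853)]

def summarize(runs, cards=CARDS):
    """Compute total threads, scratchpad memory and relative gain per run."""
    baseline = runs[0][2]
    rows = []
    for threads, blocks, total in runs:
        rows.append({
            "config": f"{threads}x{blocks}",
            "total_threads": threads * blocks,
            "scratchpad_mib": threads * blocks * SCRATCHPAD_MIB,
            "per_card": total / cards,
            "gain_pct": round(100 * (total - baseline) / baseline, 2),
        })
    return rows

for row in summarize(RUNS):
    print(row)
```

All three configurations come to 576 total threads per card (2304 MiB of scratchpads), so they differ only in how the same work is split, and the reported totals differ by well under 1%.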