ethereum-mining / ethminer

Ethereum miner with OpenCL, CUDA and stratum support
GNU General Public License v3.0

OpenCL optimisation: VGPR and LDS usage #462

Closed androidi closed 6 years ago

androidi commented 6 years ago

Using CodeXL to profile GPU performance, I noticed that the kernel uses 118 of 256 VGPRs on a Vega 64. CodeXL flagged the number of vector registers in use as the factor limiting speed. LDS usage was also next to nothing.

After fiddling around with the OCL-kernel for a while, I decided to change the order of two inline functions, namely SHA3_512 and keccak_f1600: https://github.com/ethereum-mining/ethminer/blob/master/libethash-cl/CLMiner_kernel.cl#L212 https://github.com/ethereum-mining/ethminer/blob/master/libethash-cl/CLMiner_kernel.cl#L220

I changed those to static functions and #pragma unroll'd the loops. VGPR usage went from 118 to around 80, and LDS usage grew from 1 KB to around 4 KB (sorry, I ran so many tests that I can't remember the exact figures). CodeXL also reported that kernel usage rose from 20% to 30% and that the number of active wavefronts rose to the next level (from 12 to 16 or so).
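To illustrate why fewer VGPRs can raise the number of wavefronts in flight, here is a rough sketch of the usual register-limited occupancy estimate. The constants (256 VGPRs per SIMD lane, allocation in steps of 4 registers, at most 10 wavefronts per SIMD) are assumptions based on AMD's public GCN documentation and are not taken from the CodeXL output in this thread. LDS and SGPR limits are ignored, so the result will not necessarily match the wavefront counts CodeXL displays.

```python
import math

# Assumed GCN-style limits (not confirmed by this thread):
VGPRS_PER_SIMD_LANE = 256   # vector registers available per lane on one SIMD
VGPR_GRANULE = 4            # registers are allocated in steps of 4
MAX_WAVES_PER_SIMD = 10     # hardware cap on resident wavefronts per SIMD


def waves_per_simd(vgprs_used: int) -> int:
    """Estimate how many wavefronts fit on one SIMD, limited by VGPRs only."""
    # Round the kernel's register count up to the allocation granule.
    allocated = math.ceil(vgprs_used / VGPR_GRANULE) * VGPR_GRANULE
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // allocated)


for vgprs in (118, 80):
    print(f"{vgprs} VGPRs -> {waves_per_simd(vgprs)} wavefronts per SIMD")
```

Under these assumptions, dropping from 118 to about 80 registers moves the kernel from two to three resident wavefronts per SIMD, which gives the scheduler more work to switch to while other wavefronts wait on memory.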

I only have a Vega 64 atm, but I can confirm a rate of about 42+ MH/s from nanopool (mined for about 7 days, until 0.05 ETH), with the GPU clock at 850 MHz and the memory clock at 1050 MHz. GPU clock is totally irrelevant here; a lower clock just helps keep an air-cooled Vega cool (~57 °C GPU, ~77 °C HBM2). According to CodeXL it's the memory bus that stalls, like BIG time.
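A back-of-the-envelope check of the memory-bound claim: the sketch below estimates the hash rate ceiling that memory bandwidth alone would allow. It assumes Ethash's 64 DAG reads of 128 bytes per hash and a 2048-bit, double-data-rate HBM2 bus on the Vega 64. Neither figure comes from this thread, and real kernels will not reach the theoretical peak.

```python
ACCESSES_PER_HASH = 64    # Ethash DAG reads per hash (assumed from the Ethash spec)
BYTES_PER_ACCESS = 128    # bytes fetched per DAG read (assumed from the Ethash spec)
BUS_WIDTH_BITS = 2048     # assumed Vega 64 HBM2 bus width
TRANSFERS_PER_CLOCK = 2   # double data rate


def peak_bandwidth_bytes(mem_clock_mhz: float) -> float:
    """Theoretical peak memory bandwidth in bytes per second."""
    return mem_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK * BUS_WIDTH_BITS / 8


def hashrate_ceiling_mhs(mem_clock_mhz: float) -> float:
    """Upper bound on hash rate (MH/s) if DAG reads were the only cost."""
    bytes_per_hash = ACCESSES_PER_HASH * BYTES_PER_ACCESS
    return peak_bandwidth_bytes(mem_clock_mhz) / bytes_per_hash / 1e6


ceiling = hashrate_ceiling_mhs(1050)
print(f"ceiling at 1050 MHz: {ceiling:.1f} MH/s")
print(f"42 MH/s is {42 / ceiling:.0%} of that ceiling")
```

With these assumptions the ceiling scales linearly with memory clock and not at all with GPU clock, which fits the observation that the kernel spends most of its time waiting on memory.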

These optimisations however help to achieve near Claymore's speeds - without the dev fee!

Whether you use the Capsaicin or the Adrenalin driver package makes very little difference here.

I'll have a GTX 1080 Ti at hand in a week or so, so I can check what happens with NVIDIA cards. For CUDA, I need to learn some CUDA first, so it'll take a couple more days.

"It's more fun to compute."

TaiPhamD commented 6 years ago

Any update on this? Why was it closed? Did you patch this into the main line? I have a Vega64 and would like the performance improvement as well. Thank you.

androidi commented 6 years ago

I didn't get any response. Also, the speed gains are not yet consistent. They are apparent in CodeXL, yes, but this would need more testing. I've just been so busy lately...

It's also interesting that the OpenCL code in this project is pretty much the same as the OpenCL code in every other Ethereum miner app. This is obviously another thing that needs some meditation.

For now, the best params for Vega64 are probably something like: --cl-local-work 256 --cl-global-work 32768 --cl-parallel-hash 2 (try "--cl-parallel-hash 4" too, instead of "2")

Those will get you really close (within 2-3%) to the top hash rates I've been able to achieve. Also remember to clock the GPU down and push HBM2 to the max. I can get a stable 1075 MHz HBM2 out of my Vega64.

ghost commented 6 years ago

Thanks @androidi. Setting --cl-parallel-hash to any value other than the default (8) did not work for my Vega64. I was able to increase my rate to about 42 MH/s with the other two options you suggested.

Another question, about GPU clock and memory clock settings. I just switched from mining cryptonight to ETH, and all my settings in OverdriveNTool are tweaked for cryptonight, so the GPU clock is at 1408 MHz and the memory clock is at 1100 MHz. If I read your first post correctly, you set these values to 850 MHz for the GPU and 1050 MHz for the memory. Can you share your GPU and memory clock settings for all power states (MHz and mV), please?

androidi commented 6 years ago

I've set my GPU clock as low as possible in WattMan, because ETH mining does not use all the available computing power. OTOH the memory bus is the bottleneck all the time, which is why ETH mining is called "memory hard". I've forced my Vega GPU to power STATE 0 and the memory to STATE 3. I'm currently using only 124 W while achieving a hash rate of 42.5 MH/s.

ghost commented 6 years ago

Thanks. Did you change any power state values (frequency, voltage) or keep the defaults? Also, did you change the soft PowerPlay tables in your registry?

androidi commented 6 years ago

Actually, yeah. I've set my memory voltage to 875 mV. I've also lowered the target temperature to 65 °C and increased the max fan speed to 2600 RPM. All within WattMan. I haven't played with any registry settings, as I'm trying to keep this card in working condition for other work-related OpenCL/GPGPU projects I need it for.

ghost commented 6 years ago

Did some testing with HBM locked at 1100 MHz/905 mV and different GPU frequency/voltage settings. I use a watt-meter, so power consumption is for the entire system. Frequency and voltage are set using OverdriveNTool, and the values are checked using HWiNFO. I've blocked all power states except P7 (GPU) and P3 (HBM) in OverdriveNTool.

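| GPU clock (MHz) | GPU voltage (mV) | System power (W) | Hash rate (MH/s) |
|---|---|---|---|
| 1408 | 925 | 260 | 41 |
| 1100 | 905 | 230 | 39 |
| 1000 | 905 | 225 | 38 |
| 900 | 905 | 210 | 36 |
| 852 | 905 | 190 | 30 |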
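To compare these settings by efficiency and not only by raw hash rate, the measurements can be turned into MH/s per watt. This is only a small sketch: the numbers are copied from the measurements above, the power figures are for the whole system (not the card alone), and the function and variable names are made up for illustration.

```python
# (GPU clock MHz, GPU voltage mV, whole-system power W, hash rate MH/s)
measurements = [
    (1408, 925, 260, 41),
    (1100, 905, 230, 39),
    (1000, 905, 225, 38),
    (900, 905, 210, 36),
    (852, 905, 190, 30),
]


def efficiency(watts: float, mhs: float) -> float:
    """Hash rate delivered per watt of whole-system power."""
    return mhs / watts


for clock, millivolts, watts, mhs in measurements:
    print(f"{clock:>4} MHz @ {millivolts} mV: {efficiency(watts, mhs):.4f} MH/s per W")

# Pick the setting with the highest MH/s per watt.
best = max(measurements, key=lambda row: efficiency(row[2], row[3]))
print(f"most efficient GPU clock in this run: {best[0]} MHz")
```

Because system power includes the CPU, drives and PSU losses, the absolute values understate the card's own efficiency. Only the relative ranking between rows is meaningful.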

I am currently mining at 1100 MHz/905 mV (GPU and HBM), and the GPU is at 56 °C and the HBM at 75 °C (air-cooled Vega64). Fan speed is between 2000 and 2100 RPM. Hash rate is around 40 MH/s. Interesting that you are able to get 42.5 MH/s with a lower GPU frequency and voltage. Not sure what I am doing wrong, but in my tests a lower GPU frequency does reduce the hash rate.

androidi commented 6 years ago

Now that is interesting... Okay, I do have my own special build of ethminer in use. I used AMD's CodeXL to check what's going on with the OpenCL kernel, and things were non-optimal. But still! Most of the time was spent on waiting for data from memory.

Parts of my machine are legacy by today's specs: i7-3770K, 32 GB DDR3, Win 10 Pro, 1080 Ti. The Ti is handling the monitor (Nanopool says I'm getting ~36 MH/s from the GTX, using Claymore's). So those parts don't explain the "high" hashrate. While I had this 4K monitor plugged into the Vega, I did not notice any difference in the hashrate. With the monitor attached, the Ti's hashrate drops (again, using Claymore there), but only by about 0.1 MH/s.

HWiNFO64 (v 5.72) says: [screenshot not included]

Maybe I do have to publish a version with my changes after all. I just f'cked up the project setup, and I omitted the CUDA part, because I only had the Vega when I decided to check out what this ethminer thingy is... So before I publish, I'll have to clone the current ethminer version, make the changes, and so on.

For the next few days, however, I'm too busy. I'm trying to convince a science centre to use Atmel Studio and debug MCU chips directly, instead of trying to build solutions on Arduino and then thinking those will last for 5 years in visitor use (without crashes, corrupt data and frequent resets). :D