fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

Hit Or Miss Inconsistent Hash Rates CPU Mining on Windows 10 #1788

Open AlexVPerl opened 6 years ago

AlexVPerl commented 6 years ago

Having a problem with very inconsistent hash rates with this miner. I'm able to sometimes (~3 times out of 10) hit my full hash rate of ~1015 H/s.

After 3 days of trying every BIOS, software, and hardware configuration I could think of, the problem remains and is not affected by any of the changes. It seems completely random.

Sometimes when miner is started I will get ~200H/s, then just by closing the miner and reopening it I'll get ~600 and so on.
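To put a number on "inconsistent", the per-launch readings can be tallied. The following is only a sketch: the readings in `example` are hypothetical and are not measurements from this issue, and `summarize_runs` and its 5% tolerance are illustrative choices, not part of xmr-stak.

```python
from statistics import mean, pstdev

def summarize_runs(rates_hs, full_rate_hs, tolerance=0.05):
    """Summarize hash rates (H/s) observed across separate miner launches."""
    if not rates_hs:
        raise ValueError("need at least one reading")
    # A launch counts as "full rate" if it lands within `tolerance` of the target.
    threshold = full_rate_hs * (1 - tolerance)
    hits = [r for r in rates_hs if r >= threshold]
    return {
        "runs": len(rates_hs),
        "full_rate_runs": len(hits),
        "full_rate_fraction": len(hits) / len(rates_hs),
        "mean_hs": mean(rates_hs),
        "stdev_hs": pstdev(rates_hs),
    }

# Hypothetical readings, one per launch (assumption, not data from the thread).
example = [200, 600, 1015, 800, 600, 1010, 200, 650, 1015, 780]
print(summarize_runs(example, full_rate_hs=1015))
```

Logging one reading per launch this way makes it easier to tell whether a configuration change actually moved the hit rate or whether the difference is just luck.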

There are many similar Issue Reports of this by other users, however the team dismisses them as "Hardware configuration is out of scope of GitHub forum, bla bla bla.."

This is NOT a hardware config issue, as ~3 times out of 10 I actually get my full hash rate. So I know that my CPU thread configuration (cpu.txt) is valid.

This seems like a software-related issue. Can the team please take responsibility, investigate, and address it?


Spudz76 commented 6 years ago

AMD CPUs are definitely unoptimized, especially in the Release binaries (built for generic Intel)

Build directly on a rig with an Opteron and see if it has the same variances, the compiler will do things based on the local CPU model and enable AMD-friendly stuff in addition to just SSE2 and AES. Once compiled on an Opteron, you can copy that binary around to the others (no need for MSVC environment everywhere, compiling on each, etc if they are more or less the same family CPU).

Also low-power-mode is worth testing, stacking 5 work threads and turning off no_prefetch (so that it does prefetch) may be beneficial on AMD as well (usually hurts Intel except some core families gain 7%)
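As a rough illustration of that suggestion, the sketch below renders thread entries with five stacked hashes and prefetch enabled. The field names `low_power_mode`, `no_prefetch`, and `affine_to_cpu` are assumed from the cpu.txt layout xmr-stak generates and should be checked against your own file; `thread_entry` and `render_cpu_threads_conf` are hypothetical helpers, not part of the miner.

```python
import json

def thread_entry(affinity, low_power_mode=False, no_prefetch=True):
    """Build one thread entry; field names are assumed from xmr-stak's cpu.txt."""
    return {
        "low_power_mode": low_power_mode,  # False/True, or an integer hash count
        "no_prefetch": no_prefetch,        # False means prefetch IS used
        "affine_to_cpu": affinity,
    }

def render_cpu_threads_conf(entries):
    """Render entries in the brace-less, trailing-comma style of cpu.txt."""
    lines = ['"cpu_threads_conf" :', "["]
    for entry in entries:
        lines.append("    " + json.dumps(entry) + ",")
    lines.append("],")
    return "\n".join(lines)

# Two threads, each stacking 5 hashes, with prefetch turned on.
entries = [thread_entry(cpu, low_power_mode=5, no_prefetch=False) for cpu in (0, 1)]
print(render_cpu_threads_conf(entries))
```

Keep in mind that stacking hashes multiplies the cache each thread needs, so the thread count usually has to come down when `low_power_mode` goes up.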

It may come down to how the cache ends up working, whether there is 2MB direct to each core or if it's a big shared pool, or if some cores share a cache region they will slow each other down (similar to Intel and running on the "fake" HT cores, the detriment comes from sharing the cache region not so much from the CPU core being virtual). And then some Intels have "SmartCache" or AMD have huge "L4 cache" but neither of those are guaranteed regions and depend on a larger work stack to predict where to reserve cache (not swap out and garbage collect constantly, getting in the way).

You are correct with system RAM making no difference, all action is in the cache, that is all that matters.
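The cache budget can be sketched as simple arithmetic, assuming the 2 MB per-thread scratchpad mentioned above and assuming the whole L3 is actually usable by the miner (which shared regions or an iGPU may prevent). `max_threads` is a hypothetical helper and the 16 MB figure is only an example.

```python
SCRATCHPAD_MB = 2  # assumed CryptoNight scratchpad size per hash

def max_threads(l3_cache_mb, hashes_per_thread=1, scratchpad_mb=SCRATCHPAD_MB):
    """Upper bound on threads whose scratchpads fit entirely in L3."""
    if hashes_per_thread < 1:
        raise ValueError("hashes_per_thread must be at least 1")
    # Each thread needs one scratchpad per hash it works on at once.
    per_thread_mb = scratchpad_mb * hashes_per_thread
    return l3_cache_mb // per_thread_mb

print(max_threads(16))                       # single hash per thread
print(max_threads(16, hashes_per_thread=2))  # two stacked hashes per thread
```

This is only an upper bound: it says nothing about which cores share which cache slice, which is where the variance discussed here would come from.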

Further complicating things are the internal GPUs which can fight for cache regions, disable them if at all possible. Many of the newer CPUs from both sides are using (wasting?) cache as fast VRAM.

Spudz76 commented 6 years ago

That Opteron should run 8 threads and the interleaving of the affinity indices might end up strange, they share FPUs and caches between pairs of cores.
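One way to try the interleaving is to pin one thread to each pair of cores. This sketch assumes that logical CPUs 2k and 2k+1 form a pair sharing an FPU and cache, which is a guess about the index layout and should be verified with a topology tool; `one_core_per_pair` is a hypothetical helper.

```python
def one_core_per_pair(total_cores, threads):
    """Pick affinity indices so no two threads land in the same core pair.

    Assumes cores (0,1), (2,3), ... are the pairs that share resources.
    """
    candidates = list(range(0, total_cores, 2))
    if threads > len(candidates):
        raise ValueError("more threads requested than core pairs available")
    return candidates[:threads]

# 16 cores, 8 threads: one thread on the first core of every pair.
print(one_core_per_pair(16, 8))
```

If the hash rate is still erratic with these indices, trying the odd-numbered cores or adjacent pairs is a cheap way to test whether the pairing assumption holds on a given board.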

Check for this HT-Assist thing and disable it. What does CPU-Z say on yours? If that is enabled it wastes 4MB of cache per CPU (so then you can only run 3 threads, and it might still get in the way, and would cause variances).

AlexVPerl commented 6 years ago

Thanks for the reply @Spudz76. I looked into the "HT Assist" feature and it is indeed present on the Opteron 6200 series. And yes, Windows reports only 12MB L3 cache & 16MB L2. However, there is no option in my BIOS to disable it. Is this something that can be done in Windows?

AlexVPerl commented 6 years ago

Update:

So to summarize, problem still remains.

Next I will try to compile the codebase on AMD. But the team should consider publishing both Intel & AMD compiled binaries since ~ half of us are using AMD.

Tried to provide as much detail as possible. Any additional input / advice from the team would be very helpful.

Thanks.

AlexVPerl commented 6 years ago

Update:

Just spent some time trying to compile the source code. Was able to get a successful build, but no cigar.. Same inconsistent results as before, was able to get 1000H/s only 1 out of 5 runs.

At this point I have tried all suggestions.

Requesting that this issue be escalated to the development team, as it seems to be widespread among users.

Spudz76 commented 6 years ago

Core Parking is another AMD-only problem likely getting in your way, I recall it making choppy performance in games (when my friend used to keep trying AMD before they gave up and went Intel Forever).

AMD have always been a special case and pain in the butt, too many goofy tricks to get better magazine benchmarks. Then add hacks so watts aren't ridiculous when not benchmarking (core park, do everything based on power saving and not performance, etc).

There is also a special AMD C Compiler based on Clang that works, I tried it on an old tricore Hammer but it didn't do much (it's for Opterons etc). Also might be Linux only. Also might be snakeoil and/or have no gains in this workload.

From hanging out here for quite a while now, almost everyone is on Intels, it definitely is not half, then add MacOS users who literally can't be on AMDs. Also I don't think any devs have anything but Intel, so "compiling for AMD" would be a guess at best, probably make it run worse. Furthermore the CPU code all comes from one place and that was designed for Intel - nobody ever wrote an AMD optimal algorithm, so there isn't one to use. The results should be identical with all CN-CPU miners, but if you can find one that works good maybe we can dig at their CPU code and see how it differs.

The other thing is mining leans toward hash per watt, so you will run into more optimization on the low TDP high cache type CPUs, none of which are really AMD.

AlexVPerl commented 6 years ago

Thanks again for your reply.

I have been trying to solve this for the last 3 days, and I found just as many Intel users reporting the same problem, where they have to start the miner 3-4 times plus multiple reboots to get the top hash rate.

Everything that I tried yields the same inconsistent results. Even after successfully hitting 1000H/s, closing the miner and starting it again right after will often result in lower hash rates of 600-800. Then reboot and try your luck at the lottery.

If the miner is able to hit 1000H/s even once then run stably for many hours holding the speed, then this is not an AMD architecture problem, nor hardware / BIOS settings.

From what I see this has to do more with the miner itself not getting access to or failing to reserve parts of CPU cache. So everything points to a code related problem.

Spudz76 commented 6 years ago

Also forgot, the Windows build doesn't necessarily enable all local CPU features. I have customized my CMakeLists.txt to add:

    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /Ox /GL /EHsc")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /Ox /GL /EHsc")
    set(CMAKE_STATIC_LINKER_FLAGS_RELEASE "${CMAKE_STATIC_LINKER_FLAGS_RELEASE} /LTCG")
    set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "${CMAKE_SHARED_LINKER_FLAGS_RELEASE} /LTCG /INCREMENTAL:NO /OPT:REF")
    set(CMAKE_EXE_LINKER_FLAGS_RELEASE "${CMAKE_EXE_LINKER_FLAGS_RELEASE} /LTCG /INCREMENTAL:NO /OPT:REF")

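I spend most of my time in Linux, which does compile for the local CPU. /Ox is the nearest thing here to -march=native, but it's still not as good: it only raises the optimization level, and MSVC picks the instruction set separately with /arch.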