fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

i7-8700k inconsistent results #1649

Open KAOS191 opened 6 years ago

KAOS191 commented 6 years ago

Hi, I am struggling with inconsistent hashrates. With NiceHash I always get ~450 H/s on cryptonight_v7. When I run xmr-stak 2.4.4 myself to mine Monero or Electroneum, I get different values every time: usually ~80 H/s on some cores and ~40 H/s on the rest. Huge pages are enabled and active in the system. A few times I started the miner on an already fully loaded CPU and it ran at ~480 H/s (after I switched off the heavy app), so I know it IS possible.

i7-8700k - 4.4 GHz (stock)
8GB RAM 2400MHz
6x 1050Ti GPU (those are working great btw)
Win 10

"cpu_threads_conf" :
[
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 0 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 2 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 4 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 6 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 8 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 10 },
],

[images: hs, cfg, hs2]

Spudz76 commented 6 years ago

I get the same skew in per-core performance. I assumed it was just something with how Intel is doing their Neat Tricks (tm) inside various CPU cores.

Turbo and SmartCache and whatever other buzztech going on behind the scenes automagically.

However, it is nice and even on an E5-2620 2GHz (15MB SmartCache so 7 threads):

HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   36.5 |   36.5 |   36.6 |  1 |   42.0 |   42.1 |   42.1 |
|  2 |   40.6 |   40.6 |   40.7 |  3 |   40.7 |   40.8 |   40.8 |
|  4 |   42.2 |   42.2 |   42.3 |  5 |   42.0 |   42.0 |   42.1 |
|  6 |   36.5 |   36.5 |   36.6 |
Totals (CPU):   280.5  280.8  281.0 H/s

But that's comparing an E5 Sandy Bridge (old tricks/less tricks/no tricks?) with your i7 Coffee Lake (new tricks?)
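The "15MB SmartCache so 7 threads" sizing follows from CryptoNight keeping a 2 MB scratchpad per hash, so a rough thread count is the shared L3 size divided by 2 MB. A minimal Python sketch of that arithmetic follows. The function name is made up for illustration, the 12 MB L3 figure for the i7-8700K is not stated in this thread, and treating "low_power_mode" : true as two hashes per thread is an assumption based on how the xmr-stak config comments describe it.

```python
def max_cryptonight_threads(l3_cache_mb, scratchpad_mb=2, hashes_per_thread=1):
    """Rough upper bound on threads whose scratchpads fit in the shared L3 cache.

    hashes_per_thread models low_power_mode (assumed: 1 for false, 2 for true).
    This is a rule of thumb, not how xmr-stak itself picks a thread count.
    """
    return int(l3_cache_mb // (scratchpad_mb * hashes_per_thread))


print(max_cryptonight_threads(15))   # E5-2620, 15 MB L3 -> 7
print(max_cryptonight_threads(12))   # i7-8700K, assumed 12 MB L3 -> 6
```

This only says how many scratchpads fit in cache. It says nothing about which cores the cache ends up serving well, which is the uneven part being discussed here.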

baldpope commented 6 years ago

That's interesting, guys; I'm seeing similarly inconsistent results. One host was getting a (reported) 950H/s; after a power outage, it now only gets about 800H/s.

Dual CPU Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz

cat cpu.txt

"cpu_threads_conf" :
[
    { "low_power_mode" : true, "no_prefetch" : true, "affine_to_cpu" : 0 },
    { "low_power_mode" : true, "no_prefetch" : true, "affine_to_cpu" : 1 },
    { "low_power_mode" : true, "no_prefetch" : true, "affine_to_cpu" : 2 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 3 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 4 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 5 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 6 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 7 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 8 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 9 },
    { "low_power_mode" : true, "no_prefetch" : true, "affine_to_cpu" : 10 },
    { "low_power_mode" : true, "no_prefetch" : true, "affine_to_cpu" : 11 },
    { "low_power_mode" : true, "no_prefetch" : true, "affine_to_cpu" : 12 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 13 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 14 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 15 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 16 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 17 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 18 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 19 },
],

HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   51.1 |   51.1 |   50.9 |  1 |   54.5 |   54.5 |   54.2 |
|  2 |   49.2 |   49.2 |   49.0 |  3 |   33.9 |   33.9 |   33.8 |
|  4 |   33.2 |   33.2 |   33.1 |  5 |   33.3 |   33.3 |   33.0 |
|  6 |   33.1 |   33.1 |   33.0 |  7 |   34.5 |   34.5 |   34.4 |
|  8 |   28.8 |   28.8 |   28.8 |  9 |   37.1 |   37.1 |   37.0 |
| 10 |   52.3 |   52.3 |   52.2 | 11 |   57.6 |   57.6 |   57.5 |
| 12 |   51.5 |   51.5 |   51.4 | 13 |   33.8 |   33.8 |   33.7 |
| 14 |   35.9 |   35.9 |   35.9 | 15 |   35.0 |   35.0 |   35.1 |
| 16 |   35.5 |   35.5 |   35.5 | 17 |   35.1 |   35.1 |   35.0 |
| 18 |   37.4 |   37.4 |   37.4 | 19 |   37.4 |   37.4 |   37.4 |
Totals (CPU):   800.0  800.1  798.4 H/s
Totals (ALL):   800.0  800.1  798.4 H/s
Highest:   800.3 H/s

I disabled hyper-threading in BIOS. Full detail of /proc/cpuinfo https://pastebin.com/ZAWMXdkA

If CPU0 can get 51, shouldn't they all be able to get 51H/s ?

With HT disabled, my understanding is that the CPUs listed in /proc/cpuinfo are the real cores available, so numbers 0 through 9 are real cores on physical CPU 1 and 10 through 19 are real cores on physical CPU 2.

Spudz76 commented 6 years ago

The ones getting ~52H/s are the ones you've set to "low_power_mode":true, whilst the others are false. How fast is it with all set to true? You may get better total results with low_power_mode:5 and no_prefetch:true.
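To make that split concrete, here is a small Python sketch that groups the 10s hashrates from baldpope's report by the low_power_mode flag in his cpu.txt. The values are copied from the config and report above; the helper name is made up for illustration.

```python
from statistics import mean

# low_power_mode per thread ID, copied from the cpu.txt above (IDs 0-19)
low_power = [True] * 3 + [False] * 7 + [True] * 3 + [False] * 7

# 10s hashrate per thread ID, copied from the HASHRATE REPORT above
rate_10s = [51.1, 54.5, 49.2, 33.9, 33.2, 33.3, 33.1, 34.5, 28.8, 37.1,
            52.3, 57.6, 51.5, 33.8, 35.9, 35.0, 35.5, 35.1, 37.4, 37.4]


def mean_by_mode(flags, rates):
    """Average per-thread hashrate for low_power_mode true vs false."""
    groups = {True: [], False: []}
    for flag, rate in zip(flags, rates):
        groups[flag].append(rate)
    return {flag: mean(values) for flag, values in groups.items()}


averages = mean_by_mode(low_power, rate_10s)
print(f"low_power_mode true:  {averages[True]:.1f} H/s per thread")
print(f"low_power_mode false: {averages[False]:.1f} H/s per thread")
```

On these numbers the six true threads average about 52.7 H/s and the fourteen false threads about 34.6 H/s, so the gap lines up with the flag rather than with particular cores.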

There is a PR, #1604, that allows low_power_mode:100, which works well for Broadwells. Something about the SmartCache goofs with things internally... it isn't necessarily a direct 2MB -> core mapping. Per Intel:

Intel® Smart Cache refers to the architecture that allows all cores to dynamically share access to the last level cache.

So that says whatever the CPU wants to do given current workload versus cache, it will do, magically. So some cores have to await cache usage sometimes, slowing them down, or whatever. They don't say exactly how it works / trade secret or whatever. But anyway providing it a huge stack of 100 tasks per core forces it to optimize the workload and cores to cache better (it can't get a good idea of what to optimize with such a small stack as 1-5 threads per core).

The PR was not accepted and does not apply cleanly anymore. I use it on a few SmartCache CPUs (not Broadwell, but it has been tested on Broadwell...) where it gave a +30H/s total boost, so I am very interested in updating it.

uvtzxpm commented 6 years ago

I have the same problem. On two different systems actually, both with an i7-8700k.

I have an i7-8700k (overclocked to 4.9 GHz). I'm using xmr-stak 2.4.3 26a5d65f, but I don't think the version matters since this has been happening for a long time and with many different versions. I'm only doing CPU mining with xmr-stak.

Whenever I restart my computer, I have to "mess around" with xmr-stak.

That involves restarting xmr-stak a lot, or restarting my computer again, or switching my cpu.txt config around (changing the affine_to_cpu numbers, it feels like it helps)... until I get the "fast hash rate".
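For anyone repeating the affine_to_cpu shuffling, a short Python sketch that renders the cpu_threads_conf block for a given list of logical CPUs saves some hand editing. The function name and its defaults are made up for illustration; the output just mirrors the layout of the configs pasted in this thread and handles boolean low_power_mode only.

```python
def cpu_threads_conf(cpu_ids, low_power_mode=False, no_prefetch=True):
    """Render a cpu_threads_conf block laid out like the configs in this thread."""
    def flag(value):
        return "true" if value else "false"   # JSON-style booleans

    lines = ['"cpu_threads_conf" :', "["]
    for cpu in cpu_ids:
        lines.append(
            f'    {{ "low_power_mode" : {flag(low_power_mode)}, '
            f'"no_prefetch" : {flag(no_prefetch)}, "affine_to_cpu" : {cpu} }},'
        )
    lines.append("],")
    return "\n".join(lines)


# Six threads on 12 logical CPUs: even IDs (0, 2, ...) or odd IDs (1, 3, ...)
even_layout = cpu_threads_conf(range(0, 12, 2))
odd_layout = cpu_threads_conf(range(1, 12, 2))
print(odd_layout)
```

Pasting the printed block into cpu.txt in place of the existing one gives a quick way to try one layout after another.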

Right now, I have it running with this config:

"cpu_threads_conf" :
[
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 1 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 3 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 5 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 7 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 9 },
    { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 11 },
],

Which is giving me the "fast hash rate" (even with a bunch of junk running in the background):

-----------------------------------------------------------------
HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   71.7 |   72.4 |   (na) |  1 |   75.8 |   76.5 |   (na) |
|  2 |   79.3 |   79.9 |   (na) |  3 |   79.9 |   80.5 |   (na) |
|  4 |   79.3 |   79.9 |   (na) |  5 |   74.0 |   74.7 |   (na) |
Totals (CPU):   460.1  463.9    0.0 H/s
-----------------------------------------------------------------
Totals (ALL):    460.1  463.9    0.0 H/s
Highest:   472.7 H/s
-----------------------------------------------------------------

It takes ages to get it in "fast mode", sometimes I just give up and let it go in "slow mode" which is about 350 H/s, similar to OP.
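Telling the two modes apart by eye gets tedious, so here is a rough Python sketch that parses a pasted HASHRATE REPORT and labels the run. The regex is written against the report layout shown in this thread, and the 400 H/s cutoff is a hypothetical value picked between the ~350 H/s "slow" and ~460 H/s "fast" totals mentioned here.

```python
import re

# One "| id | 10s | 60s | 15m" group; "(na)" appears until enough samples exist
NUM = r"([\d.]+|\(na\))"
CELL = re.compile(r"\|\s*(\d+)\s*\|\s*" + NUM + r"\s*\|\s*" + NUM + r"\s*\|\s*" + NUM)

SLOW_MODE_BELOW = 400.0  # hypothetical cutoff, not something xmr-stak defines


def parse_report(text):
    """Return {thread_id: 10s hashrate} from a pasted HASHRATE REPORT."""
    rates = {}
    for match in CELL.finditer(text):
        thread_id, ten_s = int(match.group(1)), match.group(2)
        if ten_s != "(na)":
            rates[thread_id] = float(ten_s)
    return rates


report = """
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   71.7 |   72.4 |   (na) |  1 |   75.8 |   76.5 |   (na) |
|  2 |   79.3 |   79.9 |   (na) |  3 |   79.9 |   80.5 |   (na) |
|  4 |   79.3 |   79.9 |   (na) |  5 |   74.0 |   74.7 |   (na) |
"""

rates = parse_report(report)
total = sum(rates.values())
mode = "fast" if total >= SLOW_MODE_BELOW else "slow"
print(f"{len(rates)} threads, {total:.1f} H/s total -> {mode} mode")
```

Summing the per-thread 10s column can differ from the miner's own Totals line by a tenth or so because the per-thread values are already rounded.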

xmrig has exactly the same problem, here's a long issue about it: https://github.com/xmrig/xmrig/issues/207

Here's a comment by the author of xmrig: https://github.com/xmrig/xmrig/issues/207#issuecomment-373213020

@Spudz76 Any ideas?

baldpope commented 6 years ago

If I set them all to true, it actually slows all cores down. I also tried the suggested low_power_mode:5 with no_prefetch:true, but it did not have any positive effect.

Interestingly similar, I also see 'fast' and 'slow' modes between runs on the same architecture (same host even). On one such host, I was once able to get about 950H/s but subsequent runs only reach about 800H/s (as mentioned above). I'm compiling from source, running that config as shared above, but getting continuing inconsistent results.

Across the four hosts I'm attempting to run, missing 150H/s per host is the equivalent of (at least) an RX 480, which is very unfortunate to be missing out on.

I'm happy to pull updated source, compile and run if there is interest in testing and finding a solution.

KAOS191 commented 6 years ago

I messed around with NiceHash today, and somehow NiceHash (with the same config as the one at the very beginning of this post) always gets 420+ H/s in total. There must be something the mining application can do to force the CPU to full load...