JayDDee / cpuminer-opt

Optimized multi algo CPU miner

Extend Windows binaries list #259

Closed YetAnotherRussian closed 4 years ago

YetAnotherRussian commented 4 years ago

lyra2z330 falls back to SSE4.2 when there is no AVX2.
Some algos may use SHA even if there is no AVX/AVX2.
yescrypt/yespower uses XOP, as far as I can see (_mm_roti_epi32 in the salsa20 section).
AES-NI does not depend on AVX/AVX2 on the hardware side, but these CPUs suffer a lot because of software AES.

1) SHA without AVX/AVX2: Jxxxx-series Celerons/Pentiums. I suggest -march=goldmont (goldmont-plus brings nothing new for mining).

2) SSE4.2 without AES and without AVX/AVX2: older Pentium/Celeron CPUs like Gxx. I suggest -march=nehalem (not native, but it suits well).

3) SSE4.2 with AES but without AVX/AVX2: Pentium/Celeron CPUs, including new ones like the G5600. I suggest -march=westmere (not native, but it suits well).

4) XOP with AVX (no AVX2): all AMD FX CPUs and AMD Axx-xxxx APUs. I suggest -march=bdver2 (bdver1 is a very rare case and should be slow as heck, so the SSE2 build should fit it).
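The four cases above amount to a small decision table. A minimal sketch of it follows, assuming the feature names use the Linux /proc/cpuinfo spelling (`sha_ni`, `sse4_2`, `aes`, `avx`, `avx2`, `xop`). The function name `suggest_march` is hypothetical and is not part of cpuminer-opt.

```python
from typing import Optional, Set


def suggest_march(flags: Set[str]) -> Optional[str]:
    """Map CPU feature flags to one of the four -march values proposed above.

    Flag names follow the Linux /proc/cpuinfo spelling (an assumption of
    this sketch). Returns None when the CPU is outside the four cases.
    """
    has_avx = "avx" in flags
    has_avx2 = "avx2" in flags

    # Case 4: XOP with AVX but no AVX2 (AMD FX CPUs and A-series APUs).
    if "xop" in flags and has_avx and not has_avx2:
        return "bdver2"

    # The remaining three cases all require that AVX and AVX2 are absent.
    if has_avx or has_avx2:
        return None

    # Case 1: SHA without AVX/AVX2. Checked first because such a CPU
    # also has SSE4.2 and AES and would otherwise match case 3.
    if "sha_ni" in flags:
        return "goldmont"

    # Cases 3 and 2: SSE4.2 with or without AES.
    if "sse4_2" in flags:
        return "westmere" if "aes" in flags else "nehalem"

    return None


if __name__ == "__main__":
    print(suggest_march({"sse4_2", "aes"}))  # westmere
```

The order of the checks matters: the SHA test runs before the AES test so that a Goldmont-class CPU is not classified as Westmere.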

I can cover all 4 build types with testing (I have the CPUs) if you proceed. I use build types 1 and 4 every day, and sometimes 2 and 3.

1) Slow CPUs, may have passive cooling - bad for mining, but people may try

2) Good per-core performance, but no HT and only 2 cores (may be useless)

3) Good CPUs

4) Good CPUs

Feel free to skip some build types. The most useful are 3 and 4, I think.

JayDDee commented 4 years ago

Before I address any technical issues please consider the following points:

  1. "algo features" is an artificial specification, set manually by me for each algo only when there is specific coding for the algo.

  2. I'm not interested in rare or niche cases like Goldmont or XOP, or budget CPUs or APUs.

  3. Lyra2z330 reporting SSE4.2 support is an error. It should be SSE2 and will be corrected.

  4. There is very little SSE4.2 code anywhere and no measurable performance difference. That's why I stopped reporting it as an algo feature and building binaries for Nehalem.

  5. There is no AVX "algo feature" because there is no AVX targeted code anywhere. AVX is just a compiler optimization that doubles the number of vector registers. As long as the CPU and build support it, the additional registers will be used regardless of the algo.

  6. All architectures are still available for anyone to compile themselves.

  7. I already build a Windows binary for Westmere (cpuminer-aes-sse42.exe).

  8. When Intel finally releases Icelake I will be looking to remove an existing build to keep the number manageable. Any suggestions? :)
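Several of the points above hinge on which features the CPU itself reports. As a rough sketch of how to check that side on x86 Linux, the snippet below reads the `flags` line of /proc/cpuinfo. The function names are hypothetical, and this is not how cpuminer-opt detects features internally.

```python
from typing import Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text.

    Assumes the x86 Linux layout, where each line is 'key : value' and the
    feature list is stored under the key 'flags'.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Read and parse the flags of the local CPU (Linux only)."""
    with open(path, encoding="utf-8") as f:
        return parse_cpu_flags(f.read())


if __name__ == "__main__":
    flags = read_cpu_flags()
    for name in ("sse4_2", "aes", "avx", "avx2", "sha_ni", "xop"):
        print(name, "yes" if name in flags else "no")
```

Keeping the parsing separate from the file read makes it possible to test the logic on a saved cpuinfo dump from another machine.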

I suggest you review the relevant section of the console logs wiki page as well as the code in simd-utils for a better understanding of how and when features are used.

https://github.com/JayDDee/cpuminer-opt/wiki/Console-Logs

https://github.com/JayDDee/cpuminer-opt/tree/master/simd-utils

JayDDee commented 4 years ago

With the variety of CPUs you have, you should set up your own compile environment. Any serious miner should either use Linux or be able to compile their own Windows binaries. You could then test various compile options to compare performance.
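One way to organize such a comparison is to generate one set of compiler flags per target architecture and build each in turn. The sketch below only assembles the flag strings; the helper name `cflags_for_targets` is hypothetical, and the `./configure` invocation in the usage part assumes a conventional autotools-style build rather than the project's documented procedure.

```python
from typing import Dict, Iterable


def cflags_for_targets(targets: Iterable[str], opt_level: str = "-O3") -> Dict[str, str]:
    """Build a CFLAGS string for each GCC -march target name."""
    return {target: f"{opt_level} -march={target}" for target in targets}


if __name__ == "__main__":
    # Target names taken from the suggestions earlier in this thread.
    for target, cflags in cflags_for_targets(
        ["goldmont", "nehalem", "westmere", "bdver2"]
    ).items():
        # Assumed autotools-style invocation; adjust to the actual build steps.
        print(f'CFLAGS="{cflags}" ./configure')
```

Each resulting binary can then be benchmarked on the same CPU and algo to see whether the extra target makes a measurable difference.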

As far as XOP goes, AMD didn't bother to include it in Ryzen. I assume it wasn't a very efficient implementation and provided little improvement.

I really want to keep the number of binaries to a minimum. In addition to Icelake, AMD will likely improve AVX2 and/or include AVX512 at some point, complicating things even more. I may end up dropping the SSE2 binary.

JayDDee commented 4 years ago

I guess there's nothing more to add, closing.