fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0
4.05k stars 1.79k forks source link

CN_GPU algo does not compile with: Clang-3.5.0; ICC 18+; #2272

Open Spudz76 opened 5 years ago

Spudz76 commented 5 years ago

Plz test CN_GPU in CPU-only mode with compilers that used to work great (Clang-3.5, Intel ICC 18 and 19) so that the code compiles again. Ideally the code wouldn't even be glanced at by anything when there are no GPU backends enabled (it's a GPU algo only after all?)

Compiling on ancient Gentoo without gcc5 available, and also without newer than clang-3.5 - can/will not update anything too much cross-dependency and stuff.

It compiled and ran great until CN_GPU came in, now Clang-3.5 hates everything about it (pragmas, the code in general, etc), but it's ironic since I have disabled everything except CPU - it tries to include it for validations it seems even though there is no way it would ever need to validate that algo, ever, ever (unless at least one GPU backend is enabled @ CMake)

Also it broke near completely on Intel ICC at the same time, when it does work it takes about a datacenter of RAM to complete without failing (and like an hour? wtf). It complains and ignores pragmas, too.

Clang 3.7 works again and so does anything newer, along with any GCC 5,6,7,8

I had commented out everything CN_GPU which worked on 2.8.x but 2.9.0 gets super angry, although I was pretty brute force about negating things and probably mismatched the algo indexes, along with deleting the CN_GPU code completely so the globbing (bad mojo) doesn't pick it up. It fails with an algo startup test now but it doesn't bother saying which one. Probably CN-R since it is above the index I ripped out.

CN-R did compile fine on Clang-3.5 no warnings even.

Spudz76 commented 5 years ago

Considered some more, since GPU backends won't even compile and work with any of these alternate compilers, there is no point in fixing the code, but instead it should be easily disabled at compile time especially automatically when there are no GPU backends compiled.

psychocrypt commented 5 years ago

clan 3.5 is nearly 100 years old. Please test clang 4.X. I am not sure if clang 3.5 already support full C++11, this is a requirement.

Considered some more, since GPU backends won't even compile and work with any of these alternate

I do net get your point. Why should the GPU backend not compile?

Spudz76 commented 5 years ago

CN_GPU is mined on GPU only CPU-Only compilation via CMake options diasbling all GPU support, still includes CN_GPU code that it will literally never use

With the useless code commented out it compiles and runs awesome still, but it should remove it when CPU-only as there can not be any CN_GPU mining in CPU-only compilation. You don't have to check results from these nonexistent GPUs, right? So why's the code there then other than to block me from using Clang-3.5

CN-R compiled fine but until I rip CN_GPU out more properly I don't know if it actually works valid results yet, or speeds.

psychocrypt commented 5 years ago

It could still be that someone like to miner cn-gpu on cpu or xeon phi. I can add a check if gpu backends are enabled to the auto suggestion for cpus. because by default i comment out the cpu config. but in the case of where gpu is disabled I should keep the config in.

Spudz76 commented 5 years ago

I will work on a patch to disable CN_GPU completely via CMake option.

At least then it will compile again on old and crap CPU's and OS's, and where also I don't feel like awaiting a compile of unreachable code (probably deleted by the linker too, but compiler doesn't get that far)

psychocrypt commented 5 years ago

please do not invest time into it. It will be a nightmare to maintain and introduce only not readable if defines. It is more important to understand why it is not compiling instead of disabling it. please open an issue with the error message.

Spudz76 commented 5 years ago

Yes it is very very very mixed into things, I tried, and gave up.

/usr/src/xmr-stak/xmrstak/backend/cpu/crypto/cn_gpu_avx.cpp:93:23: error: use of undeclared identifier '_mm256_bslli_epi128'
                r = _mm256_or_si256(_mm256_bslli_epi128(r, 16 - rot), _mm256_bsrli_epi128(r, rot));
                                    ^
/usr/src/xmr-stak/xmrstak/backend/cpu/crypto/cn_gpu_avx.cpp:93:57: error: use of undeclared identifier '_mm256_bsrli_epi128'
                r = _mm256_or_si256(_mm256_bslli_epi128(r, 16 - rot), _mm256_bsrli_epi128(r, rot));
                                                                      ^
2 errors generated.
Spudz76 commented 5 years ago

Haha that was really stupid, found where LLVM crew fixed it and it's only an alias

Add this to xmrstak/backend/cpu/crypto/cn_gpu_avx.cpp wherever you prefer (I put it right under the #pragma GCC target ("avx2") at top) and then it works fine

#ifndef _mm256_bslli_epi128
#define _mm256_bslli_epi128(a, count) _mm256_slli_si256((a), (count))
#endif
#ifndef _mm256_bsrli_epi128
#define _mm256_bsrli_epi128(a, count) _mm256_srli_si256((a), (count))
#endif
Spudz76 commented 5 years ago

EDIT: nevermind I still had Debug build enabled, haha - works perfect with the above

However, lost a ton of speed, but wouldn't cryptonight_v8 be unchanged?

2.8.3 binary still runs at 68H/s (17 per thread) while same configs on 2.9.0 runs 20H/s (5 per thread). glad I didn't overwrite the old executable and the fork isn't happening yet.

Spudz76 commented 5 years ago

Whipped up #2274 including the other patch I have been using forever on these old things (and doesn't hurt anyone that does have HUGETLB). This avoids having to use full slow memory mode even without the proper kernel options (and in some cases unable to change / or there is 942 days uptime and no way I'm rebooting it until it crashes).