JayDDee / cpuminer-opt

Optimized multi algo CPU miner

Cannot compile with O0(debug mode) #389

Closed liorwavebl closed 1 year ago

liorwavebl commented 1 year ago

I'm trying to compile the latest releases, v3.21.1 and v3.21, and I get the same error with both:

mv -f algo/fugue/.deps/cpuminer-sph_fugue.Tpo algo/fugue/.deps/cpuminer-sph_fugue.Po
gcc -DHAVE_CONFIG_H -I.  -Iyes/include -fno-strict-aliasing -I./compat/jansson -I. -Iyes/include -Wno-pointer-sign -Wno-pointer-to-int-cast   -O0 -march=native -mtune=native  -Iyes/include -MT algo/hamsi/cpuminer-sph_hamsi.o -MD -MP -MF algo/hamsi/.deps/cpuminer-sph_hamsi.Tpo -c -o algo/hamsi/cpuminer-sph_hamsi.o `test -f 'algo/hamsi/sph_hamsi.c' || echo './'`algo/hamsi/sph_hamsi.c
In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:39,
                 from /usr/lib/gcc/x86_64-linux-gnu/11/include/x86intrin.h:32,
                 from algo/fugue/fugue-aesni.c:20:
./simd-utils/simd-128.h: In function ‘mm128_mask_32’:
./simd-utils/simd-128.h:167:22: error: the last argument must be an 8-bit immediate
  167 |    _mm_castps_si128( _mm_insert_ps( _mm_castsi128_ps( v1 ), \
      |                      ^~~~~~~~~~~~~
./simd-utils/simd-128.h:183:12: note: in expansion of macro ‘mm128_xim_32’
  183 | {   return mm128_xim_32( v, v, m ); }
      |            ^~~~~~~~~~~~
make[2]: *** [Makefile:3796: algo/fugue/cpuminer-fugue-aesni.o] Error 1
make[2]: *** Waiting for unfinished jobs...

The makefile was created by: CFLAGS="-O0 -march=native -mtune=native" CXXFLAGS="$CFLAGS -std=c++17 -Wno-ignored-attributes -mavx2 -msse4.2" ./configure --with-curl

OS: Ubuntu 22.04.1 GCC version: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

JayDDee commented 1 year ago

This error happens because the code was called in a loop that's usually unrolled by the compiler's optimizer. The solution is to use -O3, as recommended, or manually add the option to unroll loops. Unrolling the loop changes the variable to an immediate.

Your configure line is strange with different flags for c & c++. There's hardly any c++ code in cpuminer-opt and what does exist doesn't need different flags.

liorwavebl commented 1 year ago

I re-ran the configure script as written in the documentation and it compiles successfully. When I build in debug mode, I still get the same error, even after adding the option -fno-unroll-loops to the configure line:

CFLAGS=" -O0 -fno-unroll-loops -msse4" ./configure --with-curl

The reason I'm building in debug mode is that I'm getting the crash shown below.

Stacktrace output from GDB:

Thread 2 "cpuminer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6883600 (LWP 125114)]
0x000055555555e129 in gbt_work_decode ()
(gdb) bt
#0  0x000055555555e129 in gbt_work_decode ()
#1  0x000055555556363a in workio_thread ()
#2  0x00007ffff7577b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#3  0x00007ffff7609a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)

JayDDee commented 1 year ago

That's not a proper build. That's probably why it crashed. Use a proper architecture.

liorwavebl commented 1 year ago

I'm sorry, but even when I run build.sh the error is still the same. This is the original line from the script: CFLAGS="-O3 -march=native -Wall" ./configure --with-curl

This is my new line: CFLAGS="-O0 -march=native -fno-unroll-loops -Wall" ./configure --with-curl

JayDDee commented 1 year ago

If the segfault occurs when built with -march=native -O3, there is a problem, so a lot more info is required. I'll need the full story: CPU, OS, coin, command line, debug (-D) and protocol (-P) data, etc. I'm not able to test gbt properly, so I'll need a lot of help with testing. Is this an existing coin, or are you testing something new?

Edit: Take a look at #379, looks very similar. What's your CPU?

Edit: here's the important part of that issue: https://github.com/JayDDee/cpuminer-opt/issues/379#issuecomment-1229361014

liorwavebl commented 1 year ago

The comment solved my problem. AFAIU I'm missing the AVX2 optimizations? My processor is an Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz and I'm running Ubuntu 22.04.1.

Edit: The crash is on the same line as the one mentioned in this comment: https://github.com/JayDDee/cpuminer-opt/issues/379#issuecomment-1229353813.

If I add the applog line inside the loop, it works as expected:

   for ( i = 0; i < 8; i++ ) {
       applog(LOG_ERR, "inside loop");
       work->target[7 - i] = be32dec(target + i);
   }

Edit (2): I read the whole #379 thread and it's the same on my computer.
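For readers following along, here is a minimal, self-contained sketch of what the loop above does: it decodes eight big-endian 32-bit words and stores them in reverse word order. The names `be32dec_sketch` and `decode_target` are hypothetical. The helper is assumed to behave like the conventional `be32dec` (decode a big-endian 32-bit value from memory), and the input is a plain byte array rather than the `uint32_t` array used in the thread.

```c
#include <stdint.h>

/* Assumed behaviour of be32dec: read 4 bytes, most significant first. */
static uint32_t be32dec_sketch(const void *pp)
{
    const uint8_t *p = (const uint8_t *)pp;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical stand-in for the loop in the comment above:
   word i of the input becomes word 7 - i of the output. */
void decode_target(const uint8_t raw[32], uint32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[7 - i] = be32dec_sketch(raw + 4 * i);
}
```

A loop of this shape, with a fixed trip count and independent iterations, is the kind of code an optimizer may choose to vectorize on its own.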

JayDDee commented 1 year ago

That's very interesting, it's the very same problem. Are you using v3.21.1? It should be fixed there. I forced target to be aligned:

uint32_t target[8] __attribute__ ((aligned (32)));

It's ultimately an alignment issue that shouldn't happen. The optimizer auto-vectorizes the loop with AVX2, which requires 32 byte data alignment. Preventing auto-vectorization is a workaround that will prevent the fault, as adding an applog call did in your test. This seems like a compiler bug: if it's going to auto-vectorize non-vector source code, it should ensure the data is properly aligned.
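A quick way to confirm that the attribute takes effect is to check the address of the array at run time. The following is only a sketch: `is_aligned_32` and `target_is_aligned` are hypothetical helper names, not part of cpuminer-opt, and the `aligned` attribute syntax assumes GCC or a compatible compiler such as Clang.

```c
#include <stdint.h>

/* Returns 1 if the address is a multiple of 32 bytes, 0 otherwise. */
int is_aligned_32(const void *p)
{
    return ((uintptr_t)p % 32) == 0;
}

/* Declares a local array the same way as the fix quoted above and
   reports whether the compiler placed it on a 32-byte boundary. */
int target_is_aligned(void)
{
    uint32_t target[8] __attribute__ ((aligned (32)));
    target[0] = 0;  /* touch the array so it is actually used */
    return is_aligned_32(target);
}
```

Logging the result of a check like this just before the faulting loop would show whether the data really is misaligned when the crash occurs.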

However, if it faults with v3.21.1, with enforced 32 byte alignment, I've got a bigger problem.

Regarding the compile problem with code that is dependent on compiler loop unrolling, I'll take a look but won't promise anything. This is an optimized miner so compiler optimizing can be considered as a requirement.

Edit: Both issues occurred when compiling with GCC-11.2. You could try a different compiler to see if it makes a difference; GCC-12 is available for Ubuntu 22.04.

JayDDee commented 1 year ago

I investigated the compile error with -O0 and it was not related to loop unrolling but compile time constants. The use of the offending argument was the result of an expression of 2 immediate values that should have been evaluated at compile time. I'm surprised evaluating constant expressions at compile time would be considered an optimization. Builds using -O0 will not be supported.
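To illustrate the immediate-operand requirement, here is a hedged sketch rather than the project's code. `_mm_insert_ps` needs its last argument to be a compile-time constant. When the value is pasted in by a macro as a literal, it is an immediate at any optimization level. When it arrives as a function parameter, as in the `mm128_mask_32` wrapper from the build log, it only becomes a constant if the optimizer inlines and propagates it. The macro `ZERO_LANES_32` and the function `zero_lanes_0_and_2` are hypothetical names. The `target` attribute assumes GCC or Clang on x86, and a scalar fallback is included for other platforms.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Macro form: the mask is substituted textually, so the intrinsic sees
   a literal and the build does not depend on the optimizer. */
#define ZERO_LANES_32(v, imm) \
    _mm_castps_si128(_mm_insert_ps(_mm_castsi128_ps(v), \
                                   _mm_castsi128_ps(v), (imm)))

/* Zeroes 32-bit lanes 0 and 2 (mask bits 0 and 2 set) and keeps the rest. */
__attribute__((target("sse4.1")))
void zero_lanes_0_and_2(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = ZERO_LANES_32(v, 0x05);
    _mm_storeu_si128((__m128i *)out, v);
}
#else
/* Scalar fallback with the same result, for non-x86 builds. */
void zero_lanes_0_and_2(const uint32_t in[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = ((0x05 >> i) & 1) ? 0 : in[i];
}
#endif
```

Passing the mask through a `static inline` function parameter instead would be expected to reproduce the "last argument must be an 8-bit immediate" error at -O0 with GCC.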

JayDDee commented 1 year ago

Are there any more test results forthcoming? If not, there's no point keeping this open.

JayDDee commented 1 year ago

The issue seems to have been abandoned. Compiling with -O0 is a non-issue; the segfault in GBT using AVX2 will remain a mystery for now.