liorwavebl closed this 1 year ago.
This error happens because the code is called in a loop that's normally unrolled by the compiler's optimizer. The solution is to use -O3, as recommended, or to manually add the option to unroll loops. Unrolling the loop turns the loop variable into an immediate.
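To illustrate why unrolling matters, here is a minimal sketch. It assumes an x86-64 target with GCC and uses the SSE2 intrinsic `_mm_extract_epi16`, whose lane index must be an immediate. The function name `sum_u16_lanes` and the `ADD_LANE` macro are illustrative only and do not come from cpuminer-opt. The idea is that if the loop is expanded by hand, every call receives a literal index, so the result no longer depends on the optimizer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 (assumption: x86-64 target) */
#include <stdint.h>

/* Sum the eight unsigned 16-bit lanes of v.
 *
 * The natural loop
 *     for (int i = 0; i < 8; i++) sum += _mm_extract_epi16(v, i);
 * is typically rejected by GCC at -O0: the lane index must be an immediate,
 * and i only becomes a constant once the optimizer unrolls the loop.
 * Expanding the loop by hand gives every call a literal index, so this
 * version builds at any optimization level. */
static uint32_t sum_u16_lanes(__m128i v)
{
    uint32_t sum = 0;
#define ADD_LANE(n) (sum += (uint32_t)_mm_extract_epi16(v, (n)))
    ADD_LANE(0); ADD_LANE(1); ADD_LANE(2); ADD_LANE(3);
    ADD_LANE(4); ADD_LANE(5); ADD_LANE(6); ADD_LANE(7);
#undef ADD_LANE
    return sum;
}
```

For example, `sum_u16_lanes(_mm_set_epi16(8, 7, 6, 5, 4, 3, 2, 1))` adds the lanes 1 through 8.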
Your configure line is strange, with different flags for C and C++. There's hardly any C++ code in cpuminer-opt, and what does exist doesn't need different flags.
I re-ran the configure script as written in the documentation and it compiles successfully. When I run it in debug mode, I still get the same error after adding the -fno-unroll-loops option to the configure script:
CFLAGS=" -O0 -fno-unroll-loops -msse4" ./configure --with-curl
The reason I'm working in debug mode is that I'm receiving this error.
Stacktrace output from GDB:
Thread 2 "cpuminer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6883600 (LWP 125114)]
0x000055555555e129 in gbt_work_decode ()
(gdb) bt
#0 0x000055555555e129 in gbt_work_decode ()
#1 0x000055555556363a in workio_thread ()
#2 0x00007ffff7577b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#3 0x00007ffff7609a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb)
That's not a proper build, and that's probably why it crashed. Use a proper architecture (e.g. -march=native).
I'm sorry, but even when I run build.sh the error is still the same.
This is the original line from the script:
CFLAGS="-O3 -march=native -Wall" ./configure --with-curl
This is my new line:
CFLAGS="-O0 -march=native -fno-unroll-loops -Wall" ./configure --with-curl
If the segfault occurs when built with -march=native -O3, there is a real problem, so a lot more info is required. I'll need the full story: CPU, OS, coin, command line, debug (-D) and protocol (-P) data, etc. I'm not able to test GBT properly, so I'll need a lot of help with testing. Is this an existing coin, or are you testing something new?
Edit: Take a look at #379, it looks very similar. What's your CPU?
Edit: here's the important part of that issue: https://github.com/JayDDee/cpuminer-opt/issues/379#issuecomment-1229361014
The comment solved my problem. AFAIU I'm missing the AVX2 optimizations? My processor is an Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz and I'm running Ubuntu 22.04.1.
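A build-time AVX2 flag and the CPU's runtime AVX2 support are two separate questions, and a quick way to check both is shown below. This is a minimal sketch that assumes GCC (or Clang) on x86. It relies on the predefined `__AVX2__` macro, which the compiler sets when AVX2 code generation is enabled (e.g. by `-mavx2`, or by `-march=native` on an AVX2-capable CPU), and on GCC's x86 built-in `__builtin_cpu_supports`. The helper names are illustrative and not part of cpuminer-opt.

```c
#include <stdio.h>

/* 1 if this translation unit was compiled with AVX2 code generation enabled. */
static int built_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the CPU running this binary reports AVX2 support (GCC/Clang, x86 only). */
static int cpu_has_avx2(void)
{
    __builtin_cpu_init();  /* harmless here; required only in constructors/ifunc resolvers */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
}

static void report_avx2(void)
{
    printf("compiled with AVX2: %d, CPU supports AVX2: %d\n",
           built_with_avx2(), cpu_has_avx2());
}
```

If the CPU supports AVX2 but the build reports 0, the compiler flags did not enable it.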
Edit: The crash is on the same line as mentioned in https://github.com/JayDDee/cpuminer-opt/issues/379#issuecomment-1229353813.
If I add this line inside the loop, it works as expected:
for ( i = 0; i < 8; i++ ) {
    applog(LOG_ERR, "inside loop");
    work->target[7 - i] = be32dec(target + i);
}
Edit (2): I read all of the #379 thread and it's the same on my computer.
That's very interesting, it's the very same problem. Are you using v3.21.1? It should be fixed there. I forced target to be aligned:
uint32_t target[8] __attribute__ ((aligned (32)));
It's ultimately an alignment issue that shouldn't happen. The optimizer auto-vectorizes the loop with AVX2, which requires 32-byte data alignment for aligned loads and stores. Preventing auto-vectorization is a workaround that prevents the fault, as adding an applog call did in your test. This seems like a compiler bug: if it's going to auto-vectorize non-vector source code, it should ensure the data is properly aligned.
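Below is a minimal sketch of requesting and verifying 32-byte alignment. It assumes GCC or Clang with C11. The attribute form matches the fix quoted above, `alignas` is the standard C11 spelling, and `aligned_alloc` covers heap buffers. The helper names `is_aligned32` and `alignment_demo` are illustrative only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* 1 if p lies on a 32-byte boundary. */
static int is_aligned32(const void *p)
{
    return ((uintptr_t)p % 32u) == 0;
}

/* Requests 32-byte alignment for an 8-word buffer in three ways and
 * checks each one. Returns 1 if all three are aligned. */
static int alignment_demo(void)
{
    /* GCC/Clang attribute, the same form as the fix quoted above. */
    uint32_t target_attr[8] __attribute__ ((aligned (32)));
    /* Standard C11 spelling of the same request. */
    alignas(32) uint32_t target_c11[8];
    /* Heap buffer: aligned_alloc size must be a multiple of the alignment. */
    uint32_t *target_heap = aligned_alloc(32, 8 * sizeof(uint32_t));
    if (target_heap == NULL)
        return 0;

    int ok = is_aligned32(target_attr) && is_aligned32(target_c11) &&
             is_aligned32(target_heap);
    assert(ok);  /* fail fast in debug builds if a request was not honored */
    free(target_heap);
    return ok;
}
```

The same `is_aligned32` check could be dropped into a debug build next to the suspect loop to confirm whether the buffer is actually aligned at the point of the fault.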
However, if it faults with v3.21.1, with enforced 32-byte alignment, I've got a bigger problem.
Regarding the compile problem with code that depends on compiler loop unrolling, I'll take a look but won't promise anything. This is an optimized miner, so compiler optimization can be considered a requirement.
Edit: Both issues occurred when compiling with GCC 11.2. You could try a different compiler to see if it makes a difference; GCC 12 is available for Ubuntu 22.04.
I investigated the compile error with -O0, and it was not related to loop unrolling but to compile-time constants.
The offending argument was the result of an expression of two immediate values that should have been evaluated at compile time. I'm surprised that evaluating constant expressions at compile time would be considered an optimization.
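The thread does not say which expression or intrinsic was involved, so the following is only a hypothetical illustration, assuming x86-64 and GCC. If an immediate is built from enumerators or macros, it is an integer constant expression, and the compiler must fold it even at -O0. If the same value reaches the intrinsic through an ordinary variable or a function parameter, it typically becomes a constant only after optimization such as inlining. `LANE_BASE`, `LANE_OFFSET`, `LANE` and `extract_lane` are all illustrative names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 (assumption: x86-64 target) */

/* Hypothetical lane index built from two constants. Both operands are
 * enumerators, so LANE is an integer constant expression and is a valid
 * immediate at any optimization level, including -O0. */
enum { LANE_BASE = 2, LANE_OFFSET = 1 };
#define LANE (LANE_BASE + LANE_OFFSET)

static int extract_lane(__m128i v)
{
    return _mm_extract_epi16(v, LANE);  /* lane 3 */
}
```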
Builds using -O0 will not be supported.
Are there any more test results forthcoming? If not, there's no point keeping this open.
The issue seems to have been abandoned. Compiling with -O0 is a non-issue; the segfault in GBT using AVX2 will remain a mystery for now.
I'm trying to compile the latest releases, v3.21.1 and v3.21, and I get the same error.
The makefile was created by:
CFLAGS="-O0 -march=native -mtune=native" CXXFLAGS="$CFLAGS -std=c++17 -Wno-ignored-attributes -mavx2 -msse4.2" ./configure --with-curl
OS: Ubuntu 22.04.1
GCC version: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0