JayDDee / cpuminer-opt

Optimized multi algo CPU miner

strange behavior of hodl on linux/avx2 #267

Closed rplant8 closed 4 years ago

rplant8 commented 4 years ago

I'm trying to statically build a miner on Linux. On the hodl algorithm the miner stops with an error:

Thread 9 "cpuminer_avx2.l" received signal SIGBUS, Bus error.
[Switching to LWP 23562]
0x00000000008ad724 in __memmove_avx_unaligned_erms ()
(gdb) bt

#0 0x00000000008ad724 in __memmove_avx_unaligned_erms ()
#1 0x00000000004bb0b7 in memcpy (len=4096, src=0x7ffff7ffa010, __dest=) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
#2 scanhash_hodl_wolf (work=0x7fffa7dddac0, max_nonce=, hashes_done=0x7fffa7ddda80, mythr=0xd795d8) at algo/hodl/hodl-wolf.c:92
#3 0x00000000004076a0 in miner_thread (userdata=) at cpu-miner.c:2321
#4 0x0000000000832349 in start_thread (arg=) at pthread_create.c:477
#5 0x00000000008bfd63 in clone ()

The error may vary, but it is always related to unaligned access. It appears only on non-virtual Linux/AVX2 boxes, with aes, avx, and avx2 static builds. It works on a VM with avx2, and on machines with aes/avx/avx512. It works on Windows/macOS. The build with dynamic libraries works. I tried gcc 6/8/9, -O1/2/3, openssl 1.1.0l, 1.1.1t, 1.1.1u, and miner versions 3.8.8.1, 3.12.3, 3.14.2.

JayDDee commented 4 years ago

Interesting that this hasn't come up before. The fault is in a memcpy, and neither the source nor the destination has any alignment specification. This should be simple to fix.

hodl-wolf.c:73 CacheEntry Cache[AES_PARALLEL_N] __attribute__ ((aligned (64)));

hodl-gate.c:178 hodl_scratchbuf = (unsigned char*)_mm_malloc( 1 << 30, 64 );
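
For illustration, here is a minimal sketch of the two kinds of alignment suggested above: an aligned local object and an aligned heap buffer. It is not the miner's code. The type cache_entry_t, the helper names, and the 1 MiB size are hypothetical, and C11 aligned_alloc stands in for _mm_malloc only to keep the sketch portable.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT 64
#define DEMO_BUF_SIZE (1u << 20)   /* hypothetical size, a multiple of ALIGNMENT */

/* Hypothetical stand-in for the miner's CacheEntry type. */
typedef struct { unsigned char bytes[4096]; } cache_entry_t;

/* Returns 1 if p sits on an ALIGNMENT-byte boundary. */
static int is_aligned(const void *p)
{
    return ((uintptr_t)p % ALIGNMENT) == 0;
}

/* Copies one entry from an aligned heap buffer into an aligned local object.
   Returns 1 if both ends were aligned and the copy succeeded. */
static int aligned_copy_demo(void)
{
    /* Aligned local object, same GCC/Clang attribute as in the suggested fix. */
    cache_entry_t entry __attribute__ ((aligned (ALIGNMENT)));

    /* Aligned heap buffer; aligned_alloc is used here in place of _mm_malloc. */
    unsigned char *buf = aligned_alloc(ALIGNMENT, DEMO_BUF_SIZE);
    if (buf == NULL)
        return 0;
    memset(buf, 0xAB, DEMO_BUF_SIZE);

    int ok = is_aligned(buf) && is_aligned(&entry);
    memcpy(&entry, buf, sizeof entry);
    ok = ok && entry.bytes[0] == 0xAB;

    free(buf);
    return ok;
}
```

Aligning both ends only guarantees alignment at the start of each object. Offsets computed into the buffer afterwards still have to be multiples of the alignment.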

JayDDee commented 4 years ago

Does it work?

rplant8 commented 4 years ago

Partially. Next error:

Thread 7 "cpuminer" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 23191]
0x00000000004bc31b in sha512Compute32b_parallel (data=data@entry=0x7ffff5dd38a0, digest=digest@entry=0x7ffff5dd38e0) at /usr/lib/gcc/x86_64-linux-gnu/9/include/smmintrin.h:456
456 /usr/lib/gcc/x86_64-linux-gnu/9/include/smmintrin.h: No such file or directory.
(gdb) bt

#0 0x00000000004bc31b in sha512Compute32b_parallel (data=data@entry=0x7ffff5dd38a0, digest=digest@entry=0x7ffff5dd38e0) at /usr/lib/gcc/x86_64-linux-gnu/9/include/smmintrin.h:456
#1 0x00000000004b7f88 in GenerateGarbageCore (Garbage=Garbage@entry=0x0, ThreadID=ThreadID@entry=3, ThreadCount=, MidHash=MidHash@entry=0x7ffff5dd3960) at algo/hodl/hodl-wolf.c:77
#2 0x00000000004b864f in GenRandomGarbage (Garbage=0x0, pdata=pdata@entry=0x7ffff5dd3b40, thr_id=3) at algo/hodl/hodl-wolf.c:260
#3 0x00000000004b7ab8 in hodl_scanhash (work=0x7ffff5dd3b40, max_nonce=2863311496, hashes_done=0x7ffff5dd3ac0, mythr=0xd534a8) at algo/hodl/hodl-gate.c:159
#4 0x00000000004074c4 in miner_thread (userdata=0xd534a8) at cpu-miner.c:2321
#5 0x000000000082d0e7 in start_thread (arg=) at pthread_create.c:486
#6 0x00000000008a815f in clone ()

JayDDee commented 4 years ago

It's not a simple misalignment issue. Misalignment is somewhat random and we often get lucky when alignment is not guaranteed either by default or by coding.

Segfaults are a different matter. Since you previously mentioned there were other errors it's also a bigger issue.

I have one wild guess: the number of threads. I've always been concerned about hodl and thread count because the algo shares a single 2GB buffer among all threads. The buffer is divided by the number of threads and each thread works only on its own part. Some divisions may result in odd-sized blocks, which could leave some block boundaries misaligned. This misalignment can also cause cross-block corruption, which could result in segfaults.

If you're running with an odd thread count, don't. Keep it to a power of 2.
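
To make the concern concrete, here is a rough sketch of how an even split behaves for different thread counts. It assumes a naive split where each thread gets buffer size / thread count bytes, which may not match what hodl-wolf.c actually does. BUF_SIZE is taken from the 1 << 30 in the _mm_malloc call quoted earlier, and the helper names are hypothetical.

```c
#include <stdint.h>

#define BUF_SIZE  ((uint64_t)1 << 30)  /* assumed, from the _mm_malloc call above */
#define ALIGNMENT 64u

/* Start offset of thread id's slice under a naive even split (an assumption). */
static uint64_t slice_start(uint32_t id, uint32_t threads)
{
    return (uint64_t)id * (BUF_SIZE / threads);
}

/* Returns 1 if the split leaves no leftover bytes and every slice
   starts on an ALIGNMENT-byte boundary, 0 otherwise. */
static int split_is_clean(uint32_t threads)
{
    if (threads == 0)
        return 0;
    if (BUF_SIZE % threads != 0)
        return 0;                      /* slices do not cover the buffer exactly */
    for (uint32_t id = 0; id < threads; id++)
        if (slice_start(id, threads) % ALIGNMENT != 0)
            return 0;                  /* this slice starts misaligned */
    return 1;
}
```

Under these assumptions split_is_clean returns 1 for 4 or 8 threads and 0 for 3, 6, or 7, because 1 << 30 is not a multiple of those counts.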

Do you test on the same machine where it was built? Static builds still rely a lot on the host system. Perhaps there is an incompatibility.

If you really want to dig into this I suggest building with no compiler optimization. It makes the backtrace easier to understand. Then run several tests to get different symptoms, determine the variable(s) with the bad pointer, look for a pattern, and trace the variable.

I can help with the analysis but I expect you to take the lead.

rplant8 commented 4 years ago

Absolutely correct: it does not work with -t 3/6/7, the rest work. I don't think it is worth fixing, let's consider it a property of the algorithm :) I just happen to have 3 six-core computers, all with avx2 and running Linux.

JayDDee commented 4 years ago

I'm glad the problem was found. I was worried it might be more complicated and I wasn't sure it was worth the effort considering the lack of support for the algo.

There isn't really anything to "fix". As you said it's a property of the algo. I can put a warning but testing for a power of 2 isn't trivial.

The forced alignment suggested above will be included in the next release. Even though it hasn't yet been the cause of any problems, it's the right thing to do.

JayDDee commented 4 years ago

Intel's tendency to build CPUs with non-binary core counts is likely to be a problem for Hodl. Even the default thread count is invalid for Hodl on such CPUs; it would cause data corruption.

The thread count must be a power of 2 even if it exceeds the number of CPU cores. The number of threads does not affect resource usage for Hodl so it may be a suitable workaround to overload the CPU cores by increasing the number of threads instead of reducing them.

I found a simple test that doesn't require checking for a power of 2 directly. The buffer size is itself a power of 2, so it is evenly divisible only by another power of 2. The following warning will be included in the next release.

if ( GARBAGE_SIZE % opt_n_threads )
   applog( LOG_WARNING, "WARNING: Thread count must be power of 2. Miner may crash or produce invalid hash!" );
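
As a quick sanity check on that test, the sketch below compares the modulo check with the usual bit-trick power-of-2 check. GARBAGE_SIZE_ASSUMED is an assumption, taken from the 1 << 30 allocation quoted earlier in this thread, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed value; the real GARBAGE_SIZE is defined in the miner source. */
#define GARBAGE_SIZE_ASSUMED ((uint32_t)1 << 30)

/* The modulo test from the warning above, inverted: 1 means "thread count OK". */
static int passes_modulo_test(uint32_t n)
{
    return n != 0 && (GARBAGE_SIZE_ASSUMED % n) == 0;
}

/* Conventional bit trick: a power of 2 has exactly one bit set. */
static int is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

The two checks agree for every thread count from 1 up to the assumed buffer size, since a power of 2 has only powers of 2 as divisors. They differ only for a count larger than the buffer itself, which is not a realistic thread count.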

JayDDee commented 4 years ago

cpuminer-opt-3.14.3 is released with the changes described above.