fairy-stockfish / Fairy-Stockfish

chess variant engine supporting Xiangqi, Shogi, Janggi, Makruk, S-Chess, Crazyhouse, Bughouse, and many more
https://fairy-stockfish.github.io/
GNU General Public License v3.0
607 stars 189 forks source link

Release build without defining NDEBUG fails asserts during initialization #464

Closed lakinwecker closed 2 months ago

lakinwecker commented 2 years ago

This may be a complete non-issue, or it may be indicative of some sort of minor bug. But if you comment out line 513 of the Makefile and run: make profile-build

It will crash during the bench with:

Step 2/4. Running benchmark for pgo-build ...
./stockfish bench > /dev/null
stockfish: bitboard.h:460: Stockfish::RiderType Stockfish::pop_rider(Stockfish::RiderType*): Assertion `*r' failed.
make: *** [Makefile:799: profile-build] Error 134

It occurs on this line in the Bitboard initialization: https://github.com/ianfab/Fairy-Stockfish/blob/master/src/bitboard.cpp#L352

I only ran across it due to a bug in my compilation of the rust wrapper library. Feel free to just close it if you think this is expected behaviour and compiling like this is unsupported.

ianfab commented 2 years ago

Thanks. With which architecture and compiler does this happen?

Since this happens already during the bench command of the PGO build I would assume that this should also be reproducible with a non-pgo build. However, running bench on debug=yes builds also happens during CI where I haven't seen the issue yet. Also make ARCH=x86-64-bmi2 profile-build debug=yes seems to work fine for me locally.

lakinwecker commented 2 years ago

It happens for me with the same command you pasted above:

g++ (GCC) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It's on a first-gen threadripper (1950x).

When you say "which architecture?", I assume you mean which CPU architecture or do you mean which ARCH setting passed to the compiler. So far, It has crashed in the same way for me with: make ARCH=x86-64-bmi2 profile-build debug=yes make ARCH=x86-64-avx2 profile-build debug=yes make profile-build debug=yes

This is the config output for the last one:


debug: 'yes'
sanitize: 'none'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'Linux'
os: 'GNU/Linux'
prefetch: 'yes'
popcnt: 'yes'
pext: 'no'
sse: 'yes'
mmx: 'no'
sse2: 'yes'
ssse3: 'yes'
sse41: 'yes'
avx2: 'no'
avx512: 'no'
vnni256: 'no'
vnni512: 'no'
neon: 'no'

Fairy-Stockfish specific:
largeboards: 'no'
all: 'no'
precomputedmagics: 'yes'
nnue: 'no'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17  -Wno-profile-instr-out-of-date -DNNUE_EMBEDDING_OFF -Wextra -Wshadow -pedantic -m64 -DUSE_PTHREADS -g -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT -DUSE_SSE41 -msse4.1 -DUSE_SSSE3 -mssse3 -DUSE_SSE2 -msse2
LDFLAGS:  -m64 -Wl,--no-as-needed -lpthread```
lakinwecker commented 2 years ago

And for each of those commands it succeeds when I don't include the debug=yes as an option.

lakinwecker commented 2 years ago

And as far as I can tell, no issue with the clang I have:

clang version 13.0.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
ianfab commented 2 years ago

By architecture I was referring to the ARCH setting, but the actual CPU arch can also be relevant, although an unsupported CPU instruction should normally result in a more specific error message. And although I think I remember threadripper had some issues with bmi2/pext (although only performance related as far as I remember), that can be excluded anyways since it also happens with AVX2.

Since the exact same command had worked fine for me, I can't seem to reproduce it. Could you please try to run a normal build instead of a profile-build (but again with debug=yes) and then run ./stockfish bench manually? In principle this should result in the same assertion failure. And you are using current master I assume, right?

lakinwecker commented 2 years ago

Yes, I'm on current master:

➜ git log -1
commit f38948accd6eb340febcfc0db3b7e619146a7f9b (HEAD -> master, origin/master, origin/HEAD)
Author: Fabian Fichter <ianfab@users.noreply.github.com>
Date:   Sun Apr 17 12:13:39 2022 +0200

    Update README.md

trying the non-profile build as well

lakinwecker commented 2 years ago

Regular build also crashes

-> ./stockfish bench
Fairy-Stockfish 190422 by Fabian Fichter
stockfish: bitboard.h:460: Stockfish::RiderType Stockfish::pop_rider(Stockfish::RiderType*): Assertion `*r' failed.
fish: Job 1, './stockfish bench' terminated by signal SIGABRT (Abort)
lakinwecker commented 2 years ago

It's not CPU specific, I see the same issue on my laptop, which is an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

I'm on a (reasonably) up to date arch linux, so I suspect it's a g++ specific bug.

The strange thing is that I saw this on an CI build as well, and I'm unclear as to how, because the CI would have been on a reasonably recent ubuntu as well, but maybe it uses the same g++ version?

lakinwecker commented 2 years ago

using make ARCH=x86-64-avx2 build optimize=no debug=yes doesn't crash ... which is quite interesting.

ianfab commented 1 year ago

Just to have all discussion in one thread, I gonna quote my comment from #588 which reported the same issue.

Given that compiler optimizations shouldn't change functionality there can only be two possible explanations as far as I can judge:

  1. It is a compiler bug.
  2. The code violates the assumptions of the optimization, i.e., has undefined behavior.

Either one unfortunately is very hard to track down, but if anyone has input, that is very welcome.

ianfab commented 2 months ago

Should be resolved by #810, if not feel free to reopen.