Closed lakinwecker closed 2 months ago
Thanks. With which architecture and compiler does this happen?
Since this happens already during the bench command of the PGO build I would assume that this should also be reproducible with a non-pgo build. However, running bench on debug=yes builds also happens during CI where I haven't seen the issue yet. Also make ARCH=x86-64-bmi2 profile-build debug=yes
seems to work fine for me locally.
It happens for me with the same command you pasted above:
g++ (GCC) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
It's on a first-gen threadripper (1950x).
When you say "which architecture?", I assume you mean which CPU architecture or do you mean which ARCH
setting passed to the compiler. So far, It has crashed in the same way for me with:
make ARCH=x86-64-bmi2 profile-build debug=yes
make ARCH=x86-64-avx2 profile-build debug=yes
make profile-build debug=yes
This is the config output for the last one:
debug: 'yes'
sanitize: 'none'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'Linux'
os: 'GNU/Linux'
prefetch: 'yes'
popcnt: 'yes'
pext: 'no'
sse: 'yes'
mmx: 'no'
sse2: 'yes'
ssse3: 'yes'
sse41: 'yes'
avx2: 'no'
avx512: 'no'
vnni256: 'no'
vnni512: 'no'
neon: 'no'
Fairy-Stockfish specific:
largeboards: 'no'
all: 'no'
precomputedmagics: 'yes'
nnue: 'no'
Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -std=c++17 -Wno-profile-instr-out-of-date -DNNUE_EMBEDDING_OFF -Wextra -Wshadow -pedantic -m64 -DUSE_PTHREADS -g -O3 -DIS_64BIT -msse -msse3 -mpopcnt -DUSE_POPCNT -DUSE_SSE41 -msse4.1 -DUSE_SSSE3 -mssse3 -DUSE_SSE2 -msse2
LDFLAGS: -m64 -Wl,--no-as-needed -lpthread```
And for each of those commands it succeeds when I don't include the debug=yes
as an option.
And as far as I can tell, no issue with the clang I have:
clang version 13.0.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
By architecture I was referring to the ARCH
setting, but the actual CPU arch can also be relevant, although an unsupported CPU instruction should normally result in a more specific error message. And although I think I remember threadripper had some issues with bmi2/pext (although only performance related as far as I remember), that can be excluded anyways since it also happens with AVX2.
Since the exact same command had worked fine for me, I can't seem to reproduce it. Could you please try to run a normal build
instead of a profile-build
(but again with debug=yes)
and then run ./stockfish bench
manually? In principle this should result in the same assertion failure. And you are using current master I assume, right?
Yes, I'm on current master:
➜ git log -1
commit f38948accd6eb340febcfc0db3b7e619146a7f9b (HEAD -> master, origin/master, origin/HEAD)
Author: Fabian Fichter <ianfab@users.noreply.github.com>
Date: Sun Apr 17 12:13:39 2022 +0200
Update README.md
trying the non-profile build as well
Regular build also crashes
-> ./stockfish bench
Fairy-Stockfish 190422 by Fabian Fichter
stockfish: bitboard.h:460: Stockfish::RiderType Stockfish::pop_rider(Stockfish::RiderType*): Assertion `*r' failed.
fish: Job 1, './stockfish bench' terminated by signal SIGABRT (Abort)
It's not CPU specific, I see the same issue on my laptop, which is an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
I'm on a (reasonably) up to date arch linux, so I suspect it's a g++ specific bug.
The strange thing is that I saw this on an CI build as well, and I'm unclear as to how, because the CI would have been on a reasonably recent ubuntu as well, but maybe it uses the same g++ version?
using
make ARCH=x86-64-avx2 build optimize=no debug=yes
doesn't crash ... which is quite interesting.
Just to have all discussion in one thread, I gonna quote my comment from #588 which reported the same issue.
Given that compiler optimizations shouldn't change functionality there can only be two possible explanations as far as I can judge:
- It is a compiler bug.
- The code violates the assumptions of the optimization, i.e., has undefined behavior.
Either one unfortunately is very hard to track down, but if anyone has input, that is very welcome.
Should be resolved by #810, if not feel free to reopen.
This may be a complete non-issue, or it may be indicative of some sort of minor bug. But if you comment out line 513 of the Makefile and run:
make profile-build
It will crash during the bench with:
It occurs on this line in the Bitboard initialization: https://github.com/ianfab/Fairy-Stockfish/blob/master/src/bitboard.cpp#L352
I only ran across it due to a bug in my compilation of the rust wrapper library. Feel free to just close it if you think this is expected behaviour and compiling like this is unsupported.