can not compile on Linux

tissatussa commented 6 months ago

i tried to compile your v2.0 code on Linux .. it seems a valid binary is created, but the program crashes : Segmentation fault (core dumped). I did some research and tests, changing some of the relevant code lines, but i failed .. however, the program runs without problems when i use a debugger !? I will explain below.

at first i encountered a basic error :

src/tuning/fenGen.cpp:1:10: fatal error: tuning/fengen.hpp: No such file or directory
    1 | #include <tuning/fengen.hpp>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [makefile:62: build/src/tuning/fenGen.o] Error 1
make: *** Waiting for unfinished jobs....
src/uci.cpp:10:10: fatal error: tuning/fengen.hpp: No such file or directory
   10 | #include <tuning/fengen.hpp>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [makefile:62: build/src/uci.o] Error 1

it seems the file fengen.hpp is expected, but it's called fenGen.hpp (note the capital letter G) .. so i renamed that file to all lowercase letters and now compilation succeeds : on Windows, filenames are case-insensitive (aren't they?), so you won't notice this simple mistake .. i suggest always using lowercase letters for ALL filenames.
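
A minimal C++ sketch of how such a case mismatch could be spotted before building, assuming you already have the include name and a list of the file names on disk. The helper names `toLowerAscii` and `findCaseMismatch` are hypothetical and not part of Arcanum.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lowercases an ASCII string so two names can be compared without regard to case.
std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the on-disk name that differs from `included` only by letter case.
// Returns an empty string when there is an exact match or no match at all.
std::string findCaseMismatch(const std::string& included,
                             const std::vector<std::string>& filesOnDisk) {
    assert(!included.empty());  // an empty include name would be a caller bug
    std::string candidate;
    for (const std::string& name : filesOnDisk) {
        if (name == included) return "";  // exact match: nothing to report
        if (toLowerAscii(name) == toLowerAscii(included)) candidate = name;
    }
    return candidate;
}
```

Called with `"fengen.hpp"` and a directory that contains `fenGen.hpp`, it returns `fenGen.hpp`, which is the rename (or include fix) that a case-sensitive file system needs.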

when running in CuteChess GUI the program crashes ! executing in terminal gives this output :

$ ./Arcanum.exe 
uci
id name Arcanum dev_build
id author Lars Murud Aurud
option name Hash type spin default 32 min 0 max 8196
option name ClearHash type button
option name SyzygyPath type string default <empty>
option name NNUEPath type string default arcanum1.fnnue
uciok
ucinewgame
position startpos
go wtime 423000 btime 423000 winc 3000 binc 3000
info depth 1 nodes 22 score cp 136 pv d2d4
info depth 2 nodes 104 score cp 78 pv d2d4 e7e6
info depth 3 time 1 nodes 747 score cp 154 nps 747000 pv d2d4 d7d5 g1f3
Segmentation fault (core dumped)

however, when i use a debugger (gdb or lldb) this does NOT happen !? :

$ gdb ./Arcanum.exe 
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./Arcanum.exe...
(No debugging symbols found in ./Arcanum.exe)
(gdb) run
Starting program: /home/roelof/Compiled/Arcanum-2.0/build/Arcanum.exe 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
uci
id name Arcanum dev_build
id author Lars Murud Aurud
option name Hash type spin default 32 min 0 max 8196
option name ClearHash type button
option name SyzygyPath type string default <empty>
option name NNUEPath type string default arcanum1.fnnue
uciok
ucinewgame
position startpos
go wtime 423000 btime 423000 winc 3000 binc 3000
[New Thread 0x7ffff5f35640 (LWP 9626)]
info depth 1 nodes 22 score cp 136 pv d2d4
info depth 2 nodes 104 score cp 78 pv d2d4 e7e6
info depth 3 time 2 nodes 747 score cp 154 nps 373500 pv d2d4 d7d5 g1f3
info depth 4 time 8 nodes 2111 score cp 71 nps 263875 pv d2d4 d7d5 b1c3 c7c6
info depth 5 time 32 nodes 7821 score cp 139 hashfull 1 nps 244406 pv b1c3 d7d5 e2e4 d5d4 c3d5
info depth 6 time 66 nodes 41473 score cp 80 hashfull 9 nps 628378 pv e2e4 d7d5 e4e5 a7a5 d2d4 b8c6
info depth 7 time 113 nodes 100084 score cp 129 hashfull 22 nps 885699 pv e2e4 d7d5 e4e5 c8f5 b1c3 d5d4 c3b5
info depth 8 time 170 nodes 168267 score cp 109 hashfull 39 nps 989805 pv e2e4 d7d5 e4e5 c8f5 d2d4 c7c6 f1d3 g8h6
info depth 9 time 338 nodes 365260 score cp 110 hashfull 83 nps 1080650 pv e2e4 d7d5 e4e5 c8f5 d2d4 b8c6 b1c3 c6b4 f1d3
info depth 10 time 845 nodes 937603 score cp 100 hashfull 217 nps 1109589 pv e2e4 d7d5 e4d5 g8f6 d2d4 e7e6 d5e6 c8e6 b1c3 f8b4
info depth 11 time 1612 nodes 1827542 score cp 100 hashfull 382 nps 1133710 pv e2e4 d7d5 e4d5 d8d5 g1f3 a7a6 b1c3 d5e6 d1e2 b8c6 e2e6 c8e6
info depth 12 time 3915 nodes 4362625 score cp 116 hashfull 712 nps 1114335 pv e2e4 d7d5 e4d5 c7c6 d2d4 d8d5 b1c3 d5d6 g1f3 g8f6 f3e5 a7a5
info depth 12 time 12272 nodes 13577894 score cp 93 hashfull 983 nps 1106412 pv e2e4 e7e5 g1f3 b8c6 b1c3 g8f6 f1c4 f6e4 c4f7 e8f7 c3e4 d7d5 e4g5 f7e8 d2d4
bestmove e2e4
[Thread 0x7ffff5f35640 (LWP 9626) exited]

this is weird, we would expect the opposite .. once i had a similar Issue with the Monza chess engine, see https://github.com/mourabitiziyad/Monza-Chess/issues/1 .. there i describe in detail what i found .. it can be a 'race condition' : the debugger makes the execution a bit slower, and THEREFORE there's no crash .. regarding the Monza engine i found a solution by adjusting the TT size, but tinkering with the relevant code didn't help .. i also adjusted the makefile in many ways, just to see the difference, but to no avail.

compiling v1.12 went without such errors, i hope you can solve this Issue and/or reproduce it .. i guess my explanation could help. NOTE: i have only 8 GB RAM, normally i set max 256 MB Hash for any chess engine and max 2 threads.

[ i'm on Xubuntu 22.04 ]

LarsAur commented 6 months ago

Hi,

At some point I stopped focusing on making it work on Linux, because I did not perceive any interest from anyone to run it on Linux. Thus there might be some OS dependent implementations which are missing.

However as you have shown some interest, I am eager to try to make it work again. Thank you for looking into this :)

LarsAur commented 6 months ago

I have tried to recreate the issue you have described, however I am not able to recreate the segmentation fault on ubuntu 20.04. Not sure if there is any critical difference between ubuntu and xubuntu.

As you are not getting any logging in your terminal, I assume that you are compiling with -DPRINT_TO_FILE. Would you be able to check if there is any output in Arcanum.exe.log? It should be created in the same location as the executable file. If not, you could try to compile without the -DDISABLE_* flags to log more info to the terminal.

The only segmentation fault I was able to create was when __linux__ was not defined which made it run some outdated code. Could this be the issue?

I have created a branch with some changes: https://github.com/LarsAur/Arcanum/tree/7-can-not-compile-on-linux. Would you be able to try to compile the program with these changes? (make clean should probably be run before make)

tissatussa commented 6 months ago

However as you have shown some interest, I am eager to try to make it work again.

and i'm eager to help .. solve this Issue and learn ! later this day i will look into this, using your info.

btw. you should start using Linux ! for years i used Window$, i knew a lot about it, but this OS kept bothering me : all kinds of compatibility problems, issues with the Registry (a bad concept anyhow), using a virus-scanner and firewall (on Linux both are not needed), etc. .. since i switched to Linux, which is an 'adult' OS for many years now -in many flavours-, i feel like there's only one disadvantage : you never want another OS !

tissatussa commented 6 months ago

Not sure if there is any critical difference between ubuntu and xubuntu.

i think no critical differences exist between all Ubuntu distros / flavours in "the way they work" .. they mainly differ in their Window Manager, thus their 'complete GUI' .. when searching for Linux answers i mostly use the StackOverflow forum (we all do?!) and sometimes i encounter people giving answers with other paths, although the folder and file names always seem to match, so i often manage to solve my problems. Much info is related to Linux in general and Debian is widespread.

btw. all Ubuntu flavours are Debian based .. i didn't try many other Linux distros, only Mint (=Debian based) and the very minimal 'Bodhi' because it still supplies a 32-bit version (for my old laptop), although no longer maintained - read its intro : https://www.bodhilinux.com .. it has a very minimal GUI, but it works for me : using the HDMI output of that free laptop for a beamer (to present chess lessons at my club) .. while it's Ubuntu based i can also use Synaptic, the famous repository manager, and see / search applications to (de)install - there are no setup.exe to download in the wild (virus?), all is managed by Synaptic, like a registry but more like a stable archive having all programs which will never conflict : compatibility is key. Moreover, all folders and files are accessible - file/folder (names) are 'hidden' when starting with a dot, they belong to the system. So Bodhi gives me insight while exploring Ubuntu that way ..

As you are not getting any logging in your terminal, I assume that you are compiling with -DPRINT_TO_FILE.

that's right. At first i compiled Arcanum v2.0 with terminal logging but CuteChess GUI shows this output in its Engine Debug pane, looking 'nasty', obscuring all UCI 'info strings'.

Would you be able to check if there is any output in Arcanum.exe.log? It should be created in the same location as the executable file. If not, you could try to compile without the -DDISABLE_* flags to log more info to the terminal.

see log.zip

i didn't change the -DDISABLE_* flags .. could more info be logged ? Maybe you can even supply a dev version (for me now) ? .. or just create some (GUI) test UCI checkbox setoption, like "Log debug info Y/N" .. i guess this Issue is best solved by you using Linux :)

The only segmentation fault I was able to create was when __linux__ was not defined which made it run some outdated code. Could this be the issue?

we'll see ..

I have created a branch with some changes: https://github.com/LarsAur/Arcanum/tree/7-can-not-compile-on-linux

very nice .. you're really eager to solve this thing and learn .. love it!

Would you be able to try to compile the program with these changes?

i'll manage

(make clean should probably be ran before make)

i didn't know that, i just find all .o files in your folder tree and remove them before compiling again, testing changes in the makefile and code .. btw. you could mention those make functions in README - or you don't and auto-remove such files in makefile .. like auto-create a(n existing) subfolder .. nevermind .. but these are things for the "general user" when compiling .. otherwise supply Linux assets, that's most convenient : such version might probably never be faster than a custom compiled binary - i will test those on Xubuntu 22.04 or even Bodhi :)

FYI

recently i created my first real UCI engine, it works in CuteChess .. it's very simple, based on some "TSCP clone", just to explore and learn many things .. it can only reach depth 5 in most middlegame positions with hardly any pruning ..

it should be easy to beat !? well, often it makes a 'mistake' and gives a piece without compensation .. but i had fun games with it, see https://lichess.org/LrsgFvwj/black .. here i play all 'decent' moves, but i 'tuned' this game because ARBE always plays the same and i can adjust my parameters :)

hope you can follow the complications, you play chess ?

tissatussa commented 6 months ago

i'm a logical artist, using intuition and basics .. the chess animation inspired me to make this video, just landed :

https://github.com/LarsAur/Arcanum/assets/1109281/0dee1a8d-f9a1-4820-82f3-38ada4c5626a

LarsAur commented 6 months ago

I do have an Ubuntu 20.04 running in WSL, so I am able to run linux 😉

Thank you for supplying the log file. From what I can gather, __linux__ is well defined in your compiler, so we can exclude that from being the problem.

I guess this log is not from the run where the segmentation fault occurred? I can see that it exits due to a UCI 'quit' command, so it seems like it ran all the way to the end without crashing. Would you be able to provide a log where Arcanum 2.0 crashes?

Are there any other details which might hint at how/why it crashes or what is different in our setups? Some ideas:

Did you have success running the engine with the changes in the new branch? Does the crash still occur?
Does the crash happen every time or is it random? (Can indicate if it is a race condition or not)
Which compiler are you using, and which version? On Linux I am currently using g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
You could try to run a version I built on Ubuntu: Arcanum-2.0-debug.zip
Lastly, you could try to compile it with Address Sanitizer: https://www.osc.edu/resources/getting_started/howto/howto_use_address_sanitizer
Basically you can change line 17 in the makefile to be:
LDFLAGS = -lpthread -lm -lstdc++ -fsanitize=address -static-libasan -g
Running it will then give some feedback regarding possible memory issues. When I am running it, I am not getting any warnings or errors from address sanitizer, but maybe you will.

Nice to hear that you have created your own engine, I think every engine has been at a similar level when it was first created. Slowly over time it improves as you add features and fix bugs. I think the most important part is to have some fun while doing it so you can stay motivated😃 I do play some chess, but I am not so good. Probably a better programmer than chess player 🤓

tissatussa commented 6 months ago

very good detailed info. i will look into all of it soon.

tissatussa commented 6 months ago

my gut feeling is that some max memory is exceeded .. TT ?

LarsAur commented 6 months ago

Might be, but it is still curious that it crashes in the middle of the run. If the TT was too big you would expect it to crash when the size is increased, or at least get some warnings, as there are checks to verify memory allocation for the TT.

https://github.com/LarsAur/Arcanum/blob/024c62caff09c95ec531ba36864b72a3087907f9/src/transpositionTable.cpp#L46-L52

It could of course be a combination of the TT being too big and the growing stack size while searching. If it works for you with a smaller TT size, you are probably correct. Let me know if you find something.
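
A minimal sketch of the kind of allocation check meant here, under stated assumptions: the `Entry` and `Table` types and the functions `resizeTable` and `slotFor` are hypothetical and only illustrate the idea, they are not the code at the link above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical entry layout, for illustration only.
struct Entry {
    std::uint64_t hash;
    std::uint64_t data;
};

// Hypothetical table wrapper, for illustration only.
struct Table {
    Entry* entries = nullptr;
    std::size_t count = 0;
};

// Replaces the table with one of `bytes` bytes.
// Returns false (leaving the table empty) when the allocation fails.
bool resizeTable(Table& table, std::size_t bytes) {
    delete[] table.entries;
    table.entries = nullptr;
    table.count = 0;

    const std::size_t count = bytes / sizeof(Entry);
    if (count == 0) return true;  // nothing to allocate

    // new (std::nothrow) returns nullptr on failure instead of throwing.
    Entry* fresh = new (std::nothrow) Entry[count]();
    if (fresh == nullptr) return false;

    table.entries = fresh;
    table.count = count;
    return true;
}

// Maps a hash to a slot; only valid once the table holds at least one entry.
Entry& slotFor(Table& table, std::uint64_t hash) {
    assert(table.count > 0);
    return table.entries[hash % table.count];
}
```

With a check like this, a table that is too big would be reported at resize time (a `false` return) rather than as a crash in the middle of a search.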

tissatussa commented 6 months ago

..or at least get some warnings..

i wanted those to appear by using gdb or lldb, but some race condition won't happen then, so we have no output. The OS just shuts down the process, Linux rules when reaching some system limit ? A segmentation fault is a severe core error.

..If it works for you with a smaller TT size..

yesterday evening i tried changing several things in the code, just by using my logic, experience and intuition, and keeping things basic and simple .. i think the solution is just one little thing we don't consider .. see my Monza Issue ..

i wanted to test a smaller TT size first .. i see some 1024 * 1024 value, i tried to make it 128 * 128 but still a crash .. how to set mbSize ?

tissatussa commented 6 months ago

btw. does TT size mean the Hash size in Mb, UCI setoption ?

LarsAur commented 6 months ago

Correct, when you set the hash size in UCI: setoption name hash value N, N is the size of the TT in megabytes. This is why it multiplies the value by 1024 * 1024 to calculate the number of bytes.
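
As a small worked example of that conversion (a sketch, not the Arcanum code): the names `hashBytes` and `entryCount` and the entry size passed in are assumptions for illustration. 64-bit arithmetic is used so that the largest value shown in the UCI output (8196) does not overflow a 32-bit integer.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBytesPerMb = 1024ULL * 1024ULL;

// Converts the UCI Hash value (megabytes) to bytes using 64-bit arithmetic.
constexpr std::uint64_t hashBytes(std::uint64_t mb) {
    return mb * kBytesPerMb;
}

// The default of 32 MB corresponds to 2^25 bytes.
static_assert(hashBytes(32) == 33554432ULL, "32 MB is 2^25 bytes");

// Number of entries that fit for a given (hypothetical) entry size in bytes.
inline std::uint64_t entryCount(std::uint64_t mb, std::uint64_t entrySize) {
    assert(entrySize > 0);
    return hashBytes(mb) / entrySize;
}
```

For instance, with an assumed 16-byte entry, the default of 32 MB would hold 2097152 entries.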

I can take a closer look at the Monza issue

tissatussa commented 6 months ago

..it multiplies the value by 1024 * 1024 to calculate the number of bytes..

so it has a max TT / memory value ? v1.12 has default 32 Mb. But are any other "increasing memory issues" possible in your code ? The Monza Issue may not be related at all ..

tissatussa commented 6 months ago

btw. i still didn't test any new code yet - other things ..

LarsAur commented 6 months ago

so it has a max TT / memory value ?

No, there is no limit set by the engine on how much memory you can allocate for the Transposition Table. The only real limit would be the amount of memory available on your computer, and of course the value you set through UCI.

..are any other "increasing memory issues" possible in your code?

The memory usage is increased somewhat by using NNUE which has to be loaded into memory. However, this is only ~1MB.

LarsAur commented 6 months ago

Btw, it would be nice to get some feedback regarding the questions above. It is very hard to debug the issue when I am not able to recreate it.

Did you have success running the engine with the changes in the new branch? Does the crash still occur?

Does the crash happen every time or is it random? (Can indicate if it is a race condition or not)

Which compiler are you using, and which version? On Linux I am currently using g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

You could try to run a version I built on Ubuntu: Arcanum-2.0-debug.zip

tissatussa commented 6 months ago

clear .. it may be a nice sunday afternoon Issue :)

LarsAur commented 6 months ago

Small update: I have managed to compile the code with g++ 11.4.0 and I now get the same segfault as you have described.

LarsAur commented 6 months ago

I believe I have found and fixed the issue. The changes are pushed to https://github.com/LarsAur/Arcanum/tree/7-can-not-compile-on-linux. Could you verify that this fixes your issue?

FYI

I managed to replicate the issue by using g++ 11.4.0, and identified the cause using address sanitizer.

tissatussa commented 6 months ago

I believe I have found and fixed the issue. The changes are pushed to https://github.com/LarsAur/Arcanum/tree/7-can-not-compile-on-linux . Could you verify that this fixes your issue?

Hurray! compiling this went well (without changing any of this code) and the created binary runs fine in CuteChess - i just used the makefile with the -j option.

so, this Issue is solved .. what are your final thoughts ? how to explain the error ?

thanks for Address Sanitizer, i wasn't aware this tool exists - but i didn't try it yet. same for gdb and lldb : those tools are valuable and mentioned to me by some other programmers.

btw. your README.md gives a flawed layout in Okular, my best GUI viewer for .md files : lines are displayed without linebreaks when omitting empty lines before and after a backtick character .. i adjusted your text to suit Okular : README.zip - use this, other .md viewers might have no problem with my changes.

[Event "engine vs engine"]
[Site "Holland"]
[Date "2024.05.05"]
[Round "?"]
[White "Eccat v0.1.2"]
[Black "Arcanum v2.0 NNUE"]
[Result "0-1"]
[ECO "C00"]
[Opening "French defense"]
[PlyCount "114"]
[TimeControl "420+3"]

1. e4 {+0.12/14 23s} e6 {-1.03/12 12s} 2. Nc3 {+0.25/12 23s} h6 {-1.09/12 12s}
3. Nf3 {+0.47/12 14s} a6 {-1.26/12 12s} 4. d4 {+0.74/12 21s} c6 {-1.30/12 12s}
5. Bd3 {+0.88/11 21s} Bb4 {-1.56/11 12s} 6. e5 {+0.78/12 20s} d5 {-1.38/12 12s}
7. O-O {+0.73/13 14s} c5 {-1.15/12 12s} 8. Bd2 {+0.68/12 13s} c4 {-0.77/13 12s}
9. Be2 {+0.58/14 13s} Nc6 {-0.78/13 12s} 10. b3 {+0.58/13 11s} b5 {-0.81/12 12s}
11. bxc4 {+0.53/12 14s} Bxc3 {-0.99/13 12s} 12. Bxc3 {+0.50/12 11s} bxc4 {-1.05/14 12s} 13. Re1 {+0.47/12 11s} Nge7 {-0.67/13 12s}
14. a4 {+0.44/11 16s} Rb8 {-0.67/11 12s} 15. Qd2 {+0.38/13 13s} O-O {-0.58/12 12s} 16. Rab1 {+0.37/13 14s} Qc7 {-0.61/13 12s}
17. Rxb8 {+0.39/13 9.1s} Qxb8 {-0.64/14 12s} 18. a5 {+0.39/13 14s} Bd7 {-0.58/13 12s} 19. Bf1 {+0.46/13 8.5s} Nc8 {-0.40/14 12s}
20. h3 {+0.51/11 13s} N8a7 {0.00/15 12s} 21. Be2 {+0.08/13 8.6s} Nb5 {+0.06/15 12s} 22. Bb2 {0.00/13 12s} Rc8 {+0.14/13 12s}
23. Rd1 {-0.06/13 11s} Qa7 {+0.06/13 12s} 24. Qe1 {-0.05/13 7.9s} c3 {+0.40/14 12s} 25. Bxc3 {-0.05/14 11s} Nxe5 {+0.51/15 12s}
26. Nxe5 {0.00/14 6.5s} Nxc3 {+0.34/15 12s} 27. Nxd7 {-0.04/14 8.7s} Nxd1 {+1.00/16 12s} 28. Nb6 {+0.29/14 8.3s} Rxc2 {+0.52/16 12s}
29. Qxd1 {+0.10/14 9.8s} Qc7 {+1.05/15 12s} 30. Bf1 {-0.23/14 9.4s} Ra2 {+1.57/16 12s} 31. Qb1 {0.00/13 9.1s} Rxa5 {+1.26/15 12s}
32. h4 {-0.17/11 8.8s} Ra3 {+1.49/15 12s} 33. g3 {-0.73/12 8.5s} Qc6 {+1.49/15 12s} 34. Kh2 {-0.91/12 5.2s} a5 {+1.85/13 12s}
35. Be2 {-1.47/12 6.1s} Qc3 {+1.75/14 12s} 36. Nd7 {-1.64/13 8.0s} Rb3 {+2.92/15 12s} 37. Qd1 {-1.59/14 7.7s} Rb4 {+2.65/15 12s}
38. Ne5 {-1.58/13 7.5s} Rxd4 {+3.37/14 12s} 39. Bd3 {-1.14/13 4.5s} Rb4 {+2.94/15 12s} 40. Qh5 {-1.14/12 7.2s} Rb7 {+3.01/14 11s}
41. Ba6 {-1.27/13 7.0s} Rc7 {+3.37/14 9.7s} 42. Bf1 {0.00/12 6.8s} Qe1 {+4.23/13 8.9s} 43. Kg1 {-2.21/14 5.0s} g6 {+5.54/15 8.1s}
44. Qe2 {-3.39/14 6.5s} Qxe2 {+8.52/17 7.5s} 45. Bxe2 {-3.76/15 4.8s} a4 {+9.02/16 6.9s} 46. Nd3 {-3.83/14 6.2s} a3 {+11.04/16 6.4s}
47. Nb4 {-3.86/14 6.1s} Rb7 {+11.63/16 6.0s} 48. Na2 {-6.31/13 5.9s} Rb2 {+14.94/15 5.6s} 49. Nc1 {-7.80/15 5.8s} Rb1 {+21.97/14 5.3s}
50. Kh2 {-13.42/15 4.4s} Rxc1 {+27.36/14 5.0s} 51. h5 {-13.65/16 4.1s} g5 {+40.74/13 4.8s} 52. Bd1 {-13.65/14 5.5s} Rxd1 {+M11/12 4.5s}
53. Kg2 {-M12/13 5.4s} g4 {+M9/12 4.4s} 54. f3 {-M8/16 3.3s} Rd2+ {+M7/11 4.2s}
55. Kf1 {-M6/16 5.2s} a2 {+M5/11 4.0s} 56. Ke1 {-M4/15 5.1s} Rb2 {+M3/11 3.9s}
57. fxg4 {-M2/15 5.0s} a1=Q# {+M1/11 3.8s, Black mates} 0-1

LarsAur commented 6 months ago

Thank you!

The issue was in the code for updating the Zobrist hash, specifically when the hash is updated for a null move. In Arcanum a null move is defined by Move(0,0), which is the move a1a1 with moveinfo = 0. moveinfo is a bitfield describing the metadata of the move, e.g. which piece is moved, what is captured etc. To calculate a table index in the Zobrist code, the moved piece type is used. The moved piece is extracted from the moveinfo by _tzcnt_u64(moveInfo & MOVE_INFO_MOVE_MASK). However, when moveinfo is 0, this returns 64, which is not a legal piece. 64 was then used as an index into the Zobrist table, which caused an out-of-bounds read.
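
To illustrate the index problem, here is a minimal sketch under stated assumptions. `trailingZeros` is a portable stand-in for `_tzcnt_u64` (which also returns 64 for an input of 0); the mask value, the piece count and the function names are hypothetical and do not come from the Arcanum source.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for _tzcnt_u64: counts trailing zero bits and,
// like the intrinsic, returns 64 when the input is 0.
inline unsigned trailingZeros(std::uint64_t value) {
    if (value == 0) return 64;
    unsigned count = 0;
    while ((value & 1ULL) == 0) {
        value >>= 1;
        ++count;
    }
    return count;
}

// Assumed for illustration: one bit per piece type in the low six bits.
constexpr std::uint64_t kMovedPieceMask = 0x3FULL;
constexpr unsigned kPieceTypes = 6;

// Writes the piece index and returns true only when the move moves a piece.
// A null move (moveInfo == 0) yields 64 and is rejected.
inline bool movedPieceIndex(std::uint64_t moveInfo, unsigned& index) {
    const unsigned tz = trailingZeros(moveInfo & kMovedPieceMask);
    if (tz >= kPieceTypes) return false;
    index = tz;
    return true;
}

// Looks up a per-piece key; the assert states the bound that index 64 violates.
inline std::uint64_t pieceKey(const std::uint64_t (&keys)[kPieceTypes], unsigned index) {
    assert(index < kPieceTypes);
    return keys[index];
}
```

The point of the sketch is only that the result of the bit scan has to be range-checked (or the null move handled separately) before it is used as a table index.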

As long as a segfault does not happen, this issue does not affect the result, due to how the value from the table lookup is used. Simplified: table[64] ^ table[64] = 0. That is why I had not discovered it previously.
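
The cancellation can be shown with a small sketch. It assumes a simplified table with one key per square and a hypothetical function `applyMoveKeys`; the real table layout is not shown in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified model: XOR the key of the source square out of the hash
// and the key of the destination square into it.
inline std::uint64_t applyMoveKeys(std::uint64_t hash,
                                   const std::array<std::uint64_t, 64>& keys,
                                   unsigned from, unsigned to) {
    assert(from < 64 && to < 64);
    hash ^= keys[from];
    hash ^= keys[to];
    return hash;
}
```

When `from` and `to` are the same square, as in a1a1, the same key is XOR-ed twice and the hash comes back unchanged, whatever value that key happens to hold.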

I believe the different compilers used different memory layouts, so the out-of-bounds read only resulted in a segfault for some compiler versions.

Solution was to make a specialized function for updating the hash for null moves. For details see: f48ad4abb6d1dd0b5d9f40b1c71aac5084b841ce
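
As a rough idea of what a dedicated null-move update can look like (a generic sketch of common Zobrist practice, not the code from the commit above): the `ZobristKeys` struct and `hashAfterNullMove` are hypothetical names, and which keys a null move has to touch is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Zobrist keys; a real engine fills these with random numbers.
struct ZobristKeys {
    std::uint64_t sideToMove;
    std::uint64_t enPassantFile[8];
};

// Updates the hash for a null move without touching any piece table:
// the side to move flips and a pending en passant square is cleared.
// enPassantFile is 0..7, or -1 when there is no en passant square.
inline std::uint64_t hashAfterNullMove(std::uint64_t hash, const ZobristKeys& keys,
                                       int enPassantFile) {
    assert(enPassantFile >= -1 && enPassantFile < 8);
    hash ^= keys.sideToMove;
    if (enPassantFile >= 0) {
        hash ^= keys.enPassantFile[enPassantFile];
    }
    return hash;
}
```

Because no piece index is computed at all, there is nothing that can end up out of range.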

It's hard to describe in text, but I hope it suffices 🤓.

Thanks again for submitting the issue!

tissatussa commented 6 months ago

so, it was indeed due to some TT / Hash / memory issue, as i suspected .. glad you solved it !

LarsAur / Arcanum

can not compile on Linux #7
