Closed: tissatussa closed this issue 6 months ago
Hi,
At some point I stopped focusing on making it work on Linux, because I did not perceive any interest from anyone to run it on Linux. Thus there might be some OS dependent implementations which are missing.
However as you have shown some interest, I am eager to try to make it work again. Thank you for looking into this :)
I have tried to recreate the issue you have described, however I am not able to recreate the segmentation fault on Ubuntu 20.04. Not sure if there is any critical difference between Ubuntu and Xubuntu.
As you are not getting any logging in your terminal, I assume that you are compiling with -DPRINT_TO_FILE. Would you be able to check if there is any output in Arcanum.exe.log? It should be created in the same location as the executable file. If not, you could try to compile without the -DDISABLE_* flags to log more info to the terminal.
The only segmentation fault I was able to create was when __linux__ was not defined, which made it run some outdated code. Could this be the issue?
I have created a branch with some changes: https://github.com/LarsAur/Arcanum/tree/7-can-not-compile-on-linux. Would you be able to try to compile the program with these changes? (make clean should probably be run before make)
However as you have shown some interest, I am eager to try to make it work again.
and i'm eager to help .. solve this Issue and learn ! later this day i will look into this, using your info.
btw. you should start using Linux ! for years i used Window$, i knew a lot about it, but this OS kept bothering me : all kinds of compatibility problems, issues with the Registry (a bad concept anyhow), using a virus-scanner and firewall (on Linux both are not needed), etc. .. since i switched to Linux, which is an 'adult' OS for many years now -in many flavours-, i feel like there's only one disadvantage : you never want another OS !
Not sure if there is any critical difference between Ubuntu and Xubuntu.
i think no critical differences exist between all Ubuntu distros / flavours in "the way they work" .. they mainly differ in their Window Manager, thus their 'complete GUI' .. when searching for Linux answers i mostly use the StackOverflow forum (we all do?!) and sometimes i encounter people giving answers with other paths, although the folder and file names always seem to match, so i often manage to solve my problems. Much info is related to Linux in general and Debian is widespread.
btw. all Ubuntu flavours are Debian based .. i didn't try many other Linux distros, only Mint (=Debian based) and the very minimal 'Bodhi' because it still supplies a 32-bit version (for my old laptop), although no longer maintained - read its intro : https://www.bodhilinux.com .. it has a very minimal GUI, but it works for me : using the HDMI output of that free laptop for a beamer (to present chess lessons at my club) .. since it's Ubuntu based i can also use Synaptic, the famous repository manager, and see / search applications to (de)install - there are no setup.exe files to download in the wild (virus?), all is managed by Synaptic, like a registry but more like a stable archive having all programs which will never conflict : compatibility is key. Moreover, all folders and files are accessible - file/folder (names) are 'hidden' when starting with a dot, they belong to the system. So Bodhi gives me insight while exploring Ubuntu that way ..
As you are not getting any logging in your terminal, I assume that you are compiling with -DPRINT_TO_FILE.
that's right. At first i compiled Arcanum v2.0 with terminal logging but CuteChess GUI shows this output in its Engine Debug pane, looking 'nasty', obscuring all UCI 'info strings'.
Would you be able to check if there is any output in Arcanum.exe.log? It should be created in the same location as the executable file. If not, you could try to compile without the -DDISABLE_* flags to log more info to the terminal.
see log.zip
i didn't change the -DDISABLE_* flags
could more info be logged ? Maybe you can even supply a dev version (for me now) ? .. or just create some (GUI) test UCI checkbox setoption, like "Log debug info Y/N" .. i guess this Issue is best solved by you using Linux :)
The only segmentation fault I was able to create was when __linux__ was not defined which made it run some outdated code. Could this be the issue?
we'll see ..
I have created a branch with some changes: https://github.com/LarsAur/Arcanum/tree/7-can-not-compile-on-linux
very nice .. you're really eager to solve this thing and learn .. love it!
Would you be able to try to compile the program with these changes?
i'll manage
(make clean should probably be run before make)
i didn't know that, i just find all .o files in your folder tree and remove them before compiling again, testing changes in the makefile and code .. btw. you could mention those make targets in README - or you don't and auto-remove such files in makefile .. like auto-create a(n existing) subfolder .. nevermind .. but these are things for the "general user" when compiling .. otherwise supply Linux assets, that's most convenient : such version might probably never be faster than a custom compiled binary - i will test those on Xubuntu 22.04 or even Bodhi :)
FYI
recently i created my first real UCI engine, it works in CuteChess .. it's very simple, based on some "TSCP clone", just to explore and learn many things .. it can only reach depth 5 in most middlegame positions with hardly any pruning ..
it should be easy to beat !? well, often it makes a 'mistake' and gives a piece without compensation .. but i had fun games with it, see https://lichess.org/LrsgFvwj/black .. here i play all 'decent' moves, but i 'tuned' this game because ARBE always plays the same and i can adjust my parameters :)
i'm a logical artist, using intuition and basics .. the chess animation inspired me to make this video, just landed :
https://github.com/LarsAur/Arcanum/assets/1109281/0dee1a8d-f9a1-4820-82f3-38ada4c5626a
I do have an Ubuntu 20.04 running in WSL, so I am able to run linux 😉
Thank you for supplying the log file. From what I can gather, __linux__ is well defined in your compiler, so we can exclude that from being the problem.
I guess this log is not from the run where the segmentation fault occurred? I can see that it exits due to a UCI 'quit' command, so it seems like it ran all the way to the end without crashing. Would you be able to provide a log where Arcanum 2.0 crashes?
Are there any other details which might hint at how/why it crashes or what is different in our setups? Some ideas:
- Did you have success running the engine with the changes in the new branch? Does the crash still occur?
- Does the crash happen every time or is it random? (Can indicate if it is a race condition or not)
- Which compiler are you using, and which version? On Linux I am currently using g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
- You could try to run a version I built on Ubuntu: Arcanum-2.0-debug.zip
Lastly, you could try to compile it with Address Sanitizer: https://www.osc.edu/resources/getting_started/howto/howto_use_address_sanitizer
Basically you can change line 17 in the makefile to be
LDFLAGS = -lpthread -lm -lstdc++ -fsanitize=address -static-libasan -g
Running it will then give some feedback regarding possible memory issues.
When I am running it, I am not getting any warnings or errors from address sanitizer, but maybe you will.
Nice to hear that you have created your own engine, I think every engine has been at a similar level when it was created. Slowly over time it improves as you add features and fix bugs. I think the most important part is to have some fun while doing it so you can stay motivated 😃 I do play some chess, but I am not so good. Probably a better programmer than chess player 🤓
very good detailed info. i will look into all of it soon.
my gut feeling is some max memory exceeded .. TT ?
Might be, but it is still curious that it crashes in the middle of the run. If the TT was too big you would expect it to crash when the size is increased, or at least get some warnings, as there are checks to verify memory allocation for the TT.
It could of course be a combination of the TT being too big and the growing stack size while searching. If it works for you with a smaller TT size, you are probably correct. Let me know if you find something.
..or at least get some warnings..
i wanted those to appear by using gdb or lldb, but some race condition won't happen then, so we have no output. The OS just shuts down the process, Linux rules when reaching some system limit ? A segmentation fault is a severe core error.
..If it works for you with a smaller TT size..
yesterday evening i tried changing several things in the code, just by using my logic, experience and intuition, and keeping things basic and simple .. i think the solution is just one little thing we don't consider .. see my Monza Issue ..
i wanted to test a smaller TT size first .. i see some 1024 * 1024 value, i tried to make it 128 * 128 but still a crash .. how to set mbSize ?
btw. does TT size mean the Hash size in Mb, UCI setoption ?
Correct, when you set the hash size in UCI: setoption name hash value N, N is the size of the TT in megabytes. This is why it multiplies the value by 1024 * 1024 to calculate the number of bytes.
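To make the arithmetic concrete, here is a minimal C++ sketch of the megabyte-to-byte conversion. The names TTEntry, mbToBytes and ttEntryCount, and the 16-byte entry layout, are assumptions made for illustration only; they are not taken from the Arcanum source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical transposition table entry, 16 bytes. Arcanum's real entry differs.
struct TTEntry {
    std::uint64_t hash;
    std::uint64_t data;
};
static_assert(sizeof(TTEntry) == 16, "the sketch assumes a 16-byte entry");

constexpr std::size_t kBytesPerMb = 1024 * 1024;

// Convert the UCI hash value N (in megabytes) to a size in bytes.
inline std::size_t mbToBytes(std::size_t mb) {
    // Guard against overflow of the multiplication for absurdly large N.
    assert(mb <= std::numeric_limits<std::size_t>::max() / kBytesPerMb);
    return mb * kBytesPerMb;
}

// Number of entries that fit into a table of the requested size.
inline std::size_t ttEntryCount(std::size_t mb) {
    return mbToBytes(mb) / sizeof(TTEntry);
}
```

With these assumptions, the v1.12 default of 32 Mb corresponds to 33554432 bytes, or 2097152 entries of 16 bytes each.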
I can take a closer look at the Monza issue
..it multiplies the value by 1024 * 1024 to calculate the number of bytes..
so it has a max TT / memory value ? v1.12 has default 32 Mb. But are any other "increasing memory issues" possible in your code ? The Monza Issue may not be related at all ..
btw. i still didn't test any new code yet - other things ..
so it has a max TT / memory value ?
No, there is no limit set by the engine on how much memory you can allocate for the Transposition Table. The only real limit would be the amount of memory available on your computer, and of course the value you set through UCI.
..are any other "increasing memory issues" possible in your code?
The memory usage is increased somewhat by using NNUE which has to be loaded into memory. However, this is only ~1MB.
Btw, it would be nice to get some feedback regarding the questions above. It is very hard to debug the issue when I am not able to recreate it.
- Did you have success running the engine with the changes in the new branch? Does the crash still occur?
- Does the crash happen every time or is it random? (Can indicate if it is a race condition or not)
- Which compiler are you using, and which version? On Linux I am currently using g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
- You could try to run a version I built on Ubuntu: Arcanum-2.0-debug.zip
clear .. it may be a nice sunday afternoon Issue :)
Small update: I have managed to compile the code with g++ 11.4.0 and I now get the same segfault as you have described.
I believe I have found and fixed the issue. The changes are pushed to https://github.com/LarsAur/Arcanum/tree/7-can-not-compile-on-linux. Could you verify that this fixes your issue?
I managed to replicate the issue by using g++ 11.4.0, and identify the cause using address sanitizer.
Hurray!
compiling this went well (without changing any of this code) and the created binary runs fine in CuteChess - i just used the makefile with the -j option.
so, this Issue is solved .. what are your final thoughts ? how to explain the error ?
thanks for Address Sanitizer, i wasn't aware this tool exists - but i didn't try it yet.
same for gdb and lldb : those tools are valuable and mentioned to me by some other programmers.
btw. your README.md gives a flawed layout in Okular, my best GUI viewer for .md files : lines are displayed without linebreaks when omitting empty lines before and after a backtick character .. i adjusted your text to suit Okular : README.zip - use this, other .md viewers might have no problem with my changes.
[Event "engine vs engine"]
[Site "Holland"]
[Date "2024.05.05"]
[Round "?"]
[White "Eccat v0.1.2"]
[Black "Arcanum v2.0 NNUE"]
[Result "0-1"]
[ECO "C00"]
[Opening "French defense"]
[PlyCount "114"]
[TimeControl "420+3"]
Thank you!
The issue was in the code for updating the Zobrist hash, specifically when updating the hash for a null move. In Arcanum a null move is defined by Move(0,0), which is the move a1a1 with moveinfo = 0. The moveinfo is a bitfield describing the metadata of the move, e.g. which piece is moved, what is captured etc. To calculate a table index in the Zobrist code, the moved piece type is used. The moved piece is extracted from the moveinfo by _tzcnt_u64(moveInfo & MOVE_INFO_MOVE_MASK). However, when moveinfo is 0, _tzcnt_u64 returns 64, which is not a legal piece. 64 was then used as an index in the Zobrist table, which caused an out-of-bounds read.
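The index problem can be reproduced in isolation. The sketch below is not Arcanum's code: trailingZeros64 is a portable stand-in for _tzcnt_u64 (which likewise yields 64 for an input of 0), and the mask value and the piece count of 6 are assumptions chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for _tzcnt_u64: counts trailing zero bits and,
// like the intrinsic, returns 64 when the input is 0.
inline unsigned trailingZeros64(std::uint64_t x) {
    if (x == 0) return 64;
    unsigned n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

// Assumed layout: the low 6 bits of the move info flag the moved piece type.
// The real MOVE_INFO_MOVE_MASK in Arcanum may look different.
constexpr std::uint64_t kMovedPieceMask = 0x3F;
constexpr unsigned kNumPieceTypes = 6;

// Index of the moved piece type, or 64 if no piece bit is set (null move).
inline unsigned movedPieceIndex(std::uint64_t moveInfo) {
    return trailingZeros64(moveInfo & kMovedPieceMask);
}

inline bool isValidPieceIndex(unsigned idx) {
    return idx < kNumPieceTypes;
}

// Debug-checked variant that refuses to hand out an invalid table index.
inline unsigned checkedMovedPieceIndex(std::uint64_t moveInfo) {
    const unsigned idx = movedPieceIndex(moveInfo);
    assert(isValidPieceIndex(idx) && "move info has no moved-piece bit set");
    return idx;
}
```

Calling movedPieceIndex(0) gives 64 here, which is exactly the kind of value that must never reach a table lookup. An assertion like the one in checkedMovedPieceIndex would catch the null move case in a debug build.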
As long as a segfault does not happen, this issue does not affect the result due to how the result from the table lookup is used. Simplified: table[64] ^ table[64] = 0. That is why I had not discovered it previously.
I believe the different compilers used different memory layouts, which made the out-of-bounds read trigger a segfault only for some compiler versions.
The solution was to make a specialized function for updating the hash for null moves. For details see: f48ad4abb6d1dd0b5d9f40b1c71aac5084b841ce
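A rough sketch of the idea, under stated assumptions: this is not the content of the commit above. ZobristKeys, updateHash and updateHashNullMove are hypothetical names, and a real engine would also handle captures, castling and en passant keys.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Hash = std::uint64_t;

// Hypothetical Zobrist keys. A real engine fills these with random numbers.
struct ZobristKeys {
    std::array<std::array<Hash, 64>, 6> pieceSquare{};  // [pieceType][square]
    Hash sideToMove = 0x9E3779B97F4A7C15ULL;
};

// Regular move: XOR the piece out of 'from' and into 'to', then flip the turn.
inline Hash updateHash(Hash h, const ZobristKeys& keys,
                       unsigned pieceType, unsigned from, unsigned to) {
    assert(pieceType < keys.pieceSquare.size() && from < 64 && to < 64);
    h ^= keys.pieceSquare[pieceType][from];
    h ^= keys.pieceSquare[pieceType][to];
    return h ^ keys.sideToMove;
}

// Null move: no piece moves, so only the side-to-move key is toggled
// and no piece table is indexed at all.
inline Hash updateHashNullMove(Hash h, const ZobristKeys& keys) {
    return h ^ keys.sideToMove;
}
```

When from equals to, as in a1a1, the two piece-square terms cancel because x ^ x = 0, so updateHash gives the same result as updateHashNullMove. That matches the observation that the old code produced correct hashes whenever the stray read did not crash.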
It's hard to describe in text, but I hope it suffices 🤓.
Thanks again for submitting the issue!
so, it was indeed due to some TT / Hash / memory issue, as i suspected .. glad you solved it !
i tried to compile your v2.0 code on Linux .. it seems a valid binary is created, but the program crashes : Segmentation fault (core dumped). I did some research and tests, changing some of the relevant code lines, but i failed .. however, the program runs without problems when i use a debugger !? I will explain below.
at first i encountered a basic error :
it seems the file fengen.hpp is expected, but it's called fenGen.hpp (mark the capital letter G) .. so i renamed that file into all lowercase letters and now compilation succeeds : on Windows, filenames are case-insensitive (aren't they?), so you won't notice this simple mistake .. i'd suggest to always use lowercase letters for ALL filenames.
when running in CuteChess GUI the program crashes ! executing in terminal gives this output :
however, when i use a debugger (gdb or lldb) this does NOT happen !? :
this is weird, we should expect the opposite .. once i had a similar Issue with the Monza chess engine, see https://github.com/mourabitiziyad/Monza-Chess/issues/1 .. here i describe in detail what i found .. it can be a 'race condition' : the debugger causes the execution to be a bit slower, and THEREFORE there's no crash .. regarding the Monza engine i found a solution by adjusting the TT size, but tinkering with the relevant code didn't help .. i also adjusted the makefile in many ways, just to see the difference, but to no avail.
compiling v1.12 went without such error, i hope you can solve this Issue and/or reproduce it .. i guess my explanation could help. NOTE: i have only 8 Gb RAM, normally i set max 256 Mb Hash for any chess engine and max 2 threads.
[ i'm on Xubuntu 22.04 ]