basil00 / Fathom

Syzygy TB probe tool.
MIT License

occasional crashes calling probe_wdl multithreaded #15

Open jdart1 opened 7 years ago

jdart1 commented 7 years ago

I am seeing occasional crashes when calling probe_wdl (supposedly thread-safe) in a multithreaded engine.

Stack trace follows (generated by GCC 6.2 with -fsanitize=address and -fsanitize=bounds) (Note: this is with my pending pull request applied):

==23941==ERROR: AddressSanitizer: SEGV on unknown address 0x7f9b9c815a02 (pc 0x0000004ec1b9 bp 0x7f99f9ac0bf0 sp 0x7f99f9ac0b80 T3)

#0 0x4ec1b8 in decompress_pairs syzygy/tbcore.c:1500
#1 0x4f07f9 in probe_wdl_table syzygy/tbprobe.c:780
#2 0x4f689b in probe_ab syzygy/tbprobe.c:1402
#3 0x4f6b7b in probe_wdl syzygy/tbprobe.c:1420
#4 0x4f8d93 in tb_probe_wdl_impl syzygy/tbprobe.c:1852
#5 0x4dde56 in tb_probe_wdl syzygy/tbprobe.h:223
#6 0x4dec69 in SyzygyTb::probe_wdl(Board const&, int&, bool) /home/jdart/dev/arasan-chess/src/syzygy.cpp:119
#7 0x4a47d0 in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2428
#8 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#9 0x4a722a in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2848
#10 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#11 0x4a722a in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2848
#12 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#13 0x4a722a in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2848
#14 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#15 0x4a9dc1 in Search::searchSMP(ThreadInfo*) /home/jdart/dev/arasan-chess/src/search.cpp:3203
#16 0x4d059d in ThreadPool::idle_loop(ThreadInfo*, SplitPoint const*) /home/jdart/dev/arasan-chess/src/threadp.cpp:137
#17 0x4d0714 in parkingLot /home/jdart/dev/arasan-chess/src/threadp.cpp:162
#18 0x7f99fef6e6f9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76f9)
#19 0x7f99fe99bb5c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x106b5c)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV syzygy/tbcore.c:1500 in decompress_pairs
Thread T3 created by T0 here:
#0 0x7f99ff53d558 in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x31558)
#1 0x4d0a4e in ThreadInfo::ThreadInfo(ThreadPool*, int) /home/jdart/dev/arasan-chess/src/threadp.cpp:212
#2 0x4d0ccf in ThreadPool::ThreadPool(SearchController*, int) /home/jdart/dev/arasan-chess/src/threadp.cpp:252
#3 0x49244c in SearchController::SearchController() /home/jdart/dev/arasan-chess/src/search.cpp:179
#4 0x41c5be in main /home/jdart/dev/arasan-chess/src/arasanx.cpp:3694
#5 0x7f99fe8b582f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

basil00 commented 7 years ago

Does it still crash when the sanitizers are disabled?

jdart1 commented 7 years ago

Yes, but it is infrequent: about once every dozen or so long time-control games. With tablebases disabled I see no crashes. By the way, I notice that Ronald de Man's code at https://github.com/syzygy1/tb/tree/master/src has had some decompress fixes recently, but it is not immediately clear to me how to apply them to Fathom.

basil00 commented 7 years ago

The crash occurs in tbcore.c, which is pretty much unchanged from Ronald's version. I am also not at all familiar with this code.

> Once every dozen or so long time-control games.

That is quite frequent, so it is unusual that it has not been noticed before.

So I am really not sure since I can't reproduce the problem. I have tested 1000s of games with Fathom and Gull and did not observe any crashes.

jdart1 commented 7 years ago

This appears to be fixed by making the variable "ready" atomic (but this only works for C++). See

https://github.com/jdart1/Fathom/commit/64685b54da02f36676e4d6a4a503b95b42fc711c
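
For readers who do not want to follow the link, here is a minimal sketch of the general pattern, not the code from the commit. It assumes a table is initialized lazily on first probe under a mutex and then published through a "ready" flag. The names Table, table_mutex, and probe are hypothetical stand-ins, not Fathom's actual identifiers. With a plain (non-atomic) flag, another thread may see ready set before the table contents are visible to it. An atomic flag with release/acquire ordering rules that out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tablebase entry; not Fathom's actual struct.
struct Table {
    std::atomic<bool> ready{false};  // publication flag, atomic instead of a plain char/int
    std::vector<int> data;           // stands in for the mapped and decoded table
};

std::mutex table_mutex;  // hypothetical global lock guarding initialization

// Lazily initialize the table on first use, then read one entry from it.
int probe(Table& t, std::size_t idx) {
    // Fast path: the acquire load pairs with the release store below, so a
    // thread that sees ready == true also sees the fully initialized data.
    if (!t.ready.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(table_mutex);
        // Re-check under the lock: another thread may have finished first.
        if (!t.ready.load(std::memory_order_relaxed)) {
            t.data.assign(1024, 7);  // placeholder for the expensive table setup
            t.ready.store(true, std::memory_order_release);
        }
    }
    return t.data[idx];
}
```

Calling probe from several threads on the same Table should initialize the data exactly once and return consistent values. A C build would need a different mechanism, for example C11 <stdatomic.h> where the compiler supports it.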

basil00 commented 7 years ago

Perhaps this is worth reporting to Ronald?