Closed by Naviabheeman 6 months ago
I'll check the versions of the dependencies, especially Boost.
The issue happens often on aarch64 AWS hosts but not on local x86-64.
This is the session where the very first duplication in the stack can be seen:
(gdb) r
Starting program: /tmp/data/tapyrusd --datadir=/tmp/data --conf=/tmp/data/tapyrus.conf --reindex
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.
Breakpoint 3, AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:124
124 tapyrusd.cpp: No such file or directory.
(gdb) bt
#0 AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:124
#1 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff378) at tapyrusd.cpp:196
(gdb) continue
Continuing.
2024-02-19T18:07:11Z Tapyrus Core version v0.5.2.0-c72609f23 (release build)
2024-02-19T18:07:11Z InitParameterInteraction: parameter interaction: -whitelistforcerelay=1 -> setting -whitelistrelay=1
2024-02-19T18:07:11Z Validating signatures for all blocks.
2024-02-19T18:07:11Z Using the 'standard' SHA256 implementation
Catchpoint 1 (exception caught), 0x0000aaaaab04e8b8 in __cxa_begin_catch ()
(gdb) bt
#0 0x0000aaaaab04e8b8 in __cxa_begin_catch ()
#1 0x0000aaaaaae10694 in sanity_test_range_fmt () at compat/glibcxx_sanity.cpp:51
#2 0x0000aaaaaae108c8 in glibcxx_sanity_test () at compat/glibcxx_sanity.cpp:60
#3 0x0000aaaaaab0e06c in InitSanityCheck () at init.cpp:728
#4 AppInitSanityChecks () at init.cpp:1172
#5 0x0000aaaaaab01310 in AppInit (argc=-1421114336, argv=0xfffffffff518) at tapyrusd.cpp:136
#6 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff378) at tapyrusd.cpp:196
(gdb) continue
Continuing.
Breakpoint 4, AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
170 in tapyrusd.cpp
(gdb) bt
#0 AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#1 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) step
AppInitMain () at init.cpp:1197
1197 init.cpp: No such file or directory.
(gdb) bt
#0 AppInitMain () at init.cpp:1197
#1 0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1194 in init.cpp
(gdb) bt
#0 AppInitMain () at init.cpp:1194
#1 0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1197 in init.cpp
(gdb) bt
#0 AppInitMain () at init.cpp:1197
#1 0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2 0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1194 in init.cpp
(gdb) bt
#0 AppInitMain () at init.cpp:1194
#1 0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2 0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1197 in init.cpp
(gdb) bt
#0 AppInitMain () at init.cpp:1197
#1 0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2 0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1194 in init.cpp
(gdb) bt
#0 AppInitMain () at init.cpp:1194
#1 0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2 0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1197 in init.cpp
(gdb) bt
#0 AppInitMain () at init.cpp:1197
#1 0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2 0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3 0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1199 in init.cpp
(gdb) bt
../../gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n
This is a bug, please report it. For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
../../gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) y
Python Exception <type 'exceptions.KeyboardInterrupt'> Command aborted.:
#0 AppInitMain () at init.cpp:1199
(gdb) q
A debugging session is active.
Inferior 1 [process 23408] will be killed.
Quit anyway? (y or n) y
Here the line causing the error is init.cpp:1197, i.e. the call to 'CreatePidFile(GetPidFile(), getpid());':
void CreatePidFile(const fs::path &path, pid_t pid)
{
    FILE* file = fsbridge::fopen(path, "w");
    if (file)
    {
        fprintf(file, "%d\n", pid);
        fclose(file);
    }
}
That is, the very first interaction with the file system causes the stack duplication, and the arguments in the duplicated frame are wrong (argc=0 instead of 4). I think this duplication causes tapyrusd to crash with a stack overflow. When tapyrusd is started with "-reindex", file system usage is high, so the overflow happens early. Without "-reindex" it also stops because of a stack overflow, but only after a longer time. This would explain why we see v0.5.2 nodes stopping randomly.
In conclusion, this could be a bug in libstdc++ on aarch64. I'll try to do more research in this direction.
The call stack above is not correct. It might be caused by a mismatch between the symbol file version and the binary version.
I am able to recreate the crash on macOS using the tapyrus-api-mainnet data directory.
The call stack was:
std::__1::shared_ptr<CTransaction const>::get() const (/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/shared_ptr.h:580)
CBlock::GetHeight() const (/Users/navia/Documents/Projects/Tapyrus/tapyrus-core-nb/v0.5.2/src/primitives/block.cpp:84)
LoadExternalBlockFile(__sFILE*, CDiskBlockPos*, std::__1::vector<XFieldAggpubkey, std::__1::allocator<XFieldAggpubkey>>*) (/Users/navia/Documents/Projects/Tapyrus/tapyrus-core-nb/v0.5.2/src/validation.cpp:4290)
ThreadImport(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool) (/Users/navia/Documents/Projects/Tapyrus/tapyrus-core-nb/v0.5.2/src/init.cpp:658)
void boost::_bi::list2<boost::_bi::value<std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>>, boost::_bi::value<bool>>::operator()<void (*)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list0&, int) (/usr/local/include/boost/bind/bind.hpp:298)
boost::_bi::bind_t<void, void (*)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list2<boost::_bi::value<std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>>, boost::_bi::value<bool>>>::operator()() (/usr/local/include/boost/bind/bind.hpp:1273)
boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list2<boost::_bi::value<std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>>, boost::_bi::value<bool>>>>::run() (/usr/local/include/boost/thread/detail/thread.hpp:120)
boost::(anonymous namespace)::thread_proxy(void*) (@boost::(anonymous namespace)::thread_proxy(void*):45)
_pthread_start (@_pthread_start:32)
thread_start (@thread_start:8)
This matches the call stack in issue 254, so the fix in PR #195 should resolve it.
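The top frames show `shared_ptr<CTransaction const>::get()` being called from `CBlock::GetHeight()`. One way a stack like this can arise is reading the first transaction of a block whose transaction list is empty or holds a null pointer. That is an assumption: the thread does not state the root cause, and the sketch below is not the change made in PR #195. `Tx`, `Block` and `TryGetHeight` are hypothetical, simplified names used only to illustrate a guarded access.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the Tapyrus types.
struct Tx {
    uint32_t height;
};

struct Block {
    std::vector<std::shared_ptr<const Tx>> vtx;
};

// Writes the height to `out` and returns true only when the first
// transaction exists; returns false instead of dereferencing a
// missing or null entry.
bool TryGetHeight(const Block& block, uint32_t& out)
{
    if (block.vtx.empty() || !block.vtx.front()) {
        return false;
    }
    out = block.vtx.front()->height;
    return true;
}
```

A caller can then skip or reject a block when `TryGetHeight` returns false, rather than crashing inside the accessor.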
segmentation fault in tapyrusd v0.5.2 in testnet faucet
Upon loading the symbols the stack can be obtained:
But this stack is corrupt. Some of the pointers in the stack, like xFieldList, are null, although this should always be initialised because it is a stack variable.
Tracing the stack corruption back, it seems to start very early:
This makes me suspect a dependent library mismatch. It even causes gdb to fail: