chaintope / tapyrus-core

Tapyrus Core

tapyrusd crash in testnet faucet #294

Closed Naviabheeman closed 6 months ago

Naviabheeman commented 9 months ago

Segmentation fault in tapyrusd v0.5.2 on the testnet faucet:

2024-02-13T08:06:24Z [default wallet] nFileVersion = 50200
2024-02-13T08:06:24Z [default wallet] Keys: 2043 plaintext, 0 encrypted, 2043 w/ metadata, 2043 total. Unknown wallet records: 0
2024-02-13T08:06:24Z [default wallet] Wallet completed loading in              33ms
2024-02-13T08:06:24Z [default wallet] setKeyPool.size() = 2000
2024-02-13T08:06:24Z [default wallet] mapWallet.size() = 44
2024-02-13T08:06:24Z [default wallet] mapAddressBook.size() = 1
2024-02-13T08:06:24Z Reindexing block file blk00000.dat...
Segmentation fault

After loading the symbols, the stack can be obtained:

Thread 22 "bitcoin-loadblk" received signal SIGSEGV, Segmentation fault.
CBlockIndex::GetMedianTimePast (this=<optimized out>) at ./chain.h:322
322     ./chain.h: No such file or directory.
(gdb) bt
#0  CBlockIndex::GetMedianTimePast (this=<optimized out>) at ./chain.h:322
#1  CBlockIndex::GetMedianTimePast() const () at rpc/blockchain.cpp:73
#2  0x0000aaaaaac6d904 in ContextualCheckBlock (block=..., state=..., pindexPrev=<optimized out>) at validation.cpp:3187
#3  0x0000aaaaaac78cac in CChainState::AcceptBlock (this=0xaaaaab396e78 <g_chainstate>, pblock=..., state=..., ppindex=<optimized out>,
    fRequested=<optimized out>, dbp=0xaaaaaae53c00 <CSHA256::Finalize(unsigned char*)+112>, fNewBlock=0x0, aggPubkeys=0xffffd7ffded0) at validation.cpp:3370
#4  0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0x0, xFieldList=0x0) at validation.cpp:4250
#5  0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0x0, xFieldList=0x0) at validation.cpp:4250
#6  0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0x0, xFieldList=0x0) at validation.cpp:4250
#7  0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0x0, xFieldList=0x0) at validation.cpp:4250
#8  0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0x0, xFieldList=0x0) at validation.cpp:4250
#9  0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0xffffd80007d0, xFieldList=0xffffd80007e0) at validation.cpp:4250
#10 0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0x0, xFieldList=0xfffff7e8e238) at validation.cpp:4250
#11 0x0000aaaaaac874a8 in LoadExternalBlockFile (fileIn=<optimized out>, dbp=0x4f6d72379e2f0f15, xFieldList=0xf49f16b7bcce676b) at validation.cpp:4250
#1

But this stack is corrupt. Some of the pointers in the stack, like xFieldList, are null, but xFieldList should always be initialised because it is a stack variable.

Tracing the stack corruption back, it seems to start very early:

70     in tapyrusd.cpp
(gdb) bt
#0  AppInit (argc=5, argv=0xfffffffff508) at tapyrusd.cpp:170
#1  0x0000aaaaaaaf19d0 in main (argc=5, argv=0xfffffffff378) at tapyrusd.cpp:196
(gdb) next

Breakpoint 9, AppInitMain () at init.cpp:1211
1211    init.cpp: No such file or directory.
(gdb) bt
#0  AppInitMain () at init.cpp:1211
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

This makes me suspect a dependent library mismatch. It even causes gdb itself to fail:



Thread 1 "tapyrusd" hit Breakpoint 6, AppInitMain () at init.cpp:1250
1250    in init.cpp
(gdb) bt
../../gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.

../../gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n

Naviabheeman commented 9 months ago

I'll check the versions of the dependencies, especially Boost.

Naviabheeman commented 9 months ago

The issue happens often on aarch64 AWS hosts but not on local x86-64.

This is the session where the very first duplication in the stack can be seen:

(gdb) r
Starting program: /tmp/data/tapyrusd --datadir=/tmp/data --conf=/tmp/data/tapyrus.conf --reindex
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.

Breakpoint 3, AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:124
124     tapyrusd.cpp: No such file or directory.
(gdb) bt
#0  AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:124
#1  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff378) at tapyrusd.cpp:196
(gdb) continue
Continuing.
2024-02-19T18:07:11Z Tapyrus Core version v0.5.2.0-c72609f23 (release build)
2024-02-19T18:07:11Z InitParameterInteraction: parameter interaction: -whitelistforcerelay=1 -> setting -whitelistrelay=1
2024-02-19T18:07:11Z Validating signatures for all blocks.
2024-02-19T18:07:11Z Using the 'standard' SHA256 implementation

Catchpoint 1 (exception caught), 0x0000aaaaab04e8b8 in __cxa_begin_catch ()
(gdb) bt
#0  0x0000aaaaab04e8b8 in __cxa_begin_catch ()
#1  0x0000aaaaaae10694 in sanity_test_range_fmt () at compat/glibcxx_sanity.cpp:51
#2  0x0000aaaaaae108c8 in glibcxx_sanity_test () at compat/glibcxx_sanity.cpp:60
#3  0x0000aaaaaab0e06c in InitSanityCheck () at init.cpp:728
#4  AppInitSanityChecks () at init.cpp:1172
#5  0x0000aaaaaab01310 in AppInit (argc=-1421114336, argv=0xfffffffff518) at tapyrusd.cpp:136
#6  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff378) at tapyrusd.cpp:196
(gdb) continue
Continuing.

Breakpoint 4, AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
170     in tapyrusd.cpp
(gdb) bt
#0  AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#1  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) step
AppInitMain () at init.cpp:1197
1197    init.cpp: No such file or directory.
(gdb) bt
#0  AppInitMain () at init.cpp:1197
#1  0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1194    in init.cpp
(gdb) bt
#0  AppInitMain () at init.cpp:1194
#1  0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1197    in init.cpp
(gdb) bt
#0  AppInitMain () at init.cpp:1197
#1  0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2  0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1194    in init.cpp
(gdb) bt
#0  AppInitMain () at init.cpp:1194
#1  0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2  0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1197    in init.cpp
(gdb) bt
#0  AppInitMain () at init.cpp:1197
#1  0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2  0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1194    in init.cpp
(gdb) bt
#0  AppInitMain () at init.cpp:1194
#1  0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2  0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1197    in init.cpp
(gdb) bt
#0  AppInitMain () at init.cpp:1197
#1  0x0000aaaaaab01394 in AppInit (argc=4, argv=0xfffffffff518) at tapyrusd.cpp:170
#2  0x0000aaaaaab01394 in AppInit (argc=0, argv=0xfffffffff518) at tapyrusd.cpp:170
#3  0x0000aaaaaaaf19d0 in main (argc=4, argv=0xfffffffff388) at tapyrusd.cpp:196
(gdb) next
1199    in init.cpp
(gdb) bt
../../gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.

../../gdb/inline-frame.c:167: internal-error: void inline_frame_this_id(frame_info*, void**, frame_id*): Assertion `frame_id_p (*this_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) y
Python Exception <type 'exceptions.KeyboardInterrupt'> Command aborted.:
#0  AppInitMain () at init.cpp:1199
(gdb) q
A debugging session is active.

        Inferior 1 [process 23408] will be killed.

Quit anyway? (y or n) y

Here the line causing the error is init.cpp:1197, i.e. the call to 'CreatePidFile(GetPidFile(), getpid());':

void CreatePidFile(const fs::path &path, pid_t pid)
{
    FILE* file = fsbridge::fopen(path, "w");
    if (file)
    {
        fprintf(file, "%d\n", pid);
        fclose(file);
    }
}

That is, the very first interaction with the file system causes a duplicated frame in the stack, and the arguments in the duplicated frame are corrupted. I think this duplication causes tapyrusd to crash with a stack overflow. When tapyrusd is started with "-reindex", the heavy file system usage makes the overflow happen early. When it is started without reindex it also stops with a stack overflow, but after a longer time. This would explain why we see v0.5.2 nodes stopping randomly.

In conclusion, this could be a bug in libstdc++ on aarch64. I'll try to do more research in this direction.

Naviabheeman commented 9 months ago

The call stack above is not correct. This might be because of a mismatch between the symbol file version and the binary version.

I am able to recreate the crash on macOS using the tapyrus-api-mainnet data directory.

The call stack was:

std::__1::shared_ptr<CTransaction const>::get() const (/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/shared_ptr.h:580)
CBlock::GetHeight() const (/Users/navia/Documents/Projects/Tapyrus/tapyrus-core-nb/v0.5.2/src/primitives/block.cpp:84)
LoadExternalBlockFile(__sFILE*, CDiskBlockPos*, std::__1::vector<XFieldAggpubkey, std::__1::allocator<XFieldAggpubkey>>*) (/Users/navia/Documents/Projects/Tapyrus/tapyrus-core-nb/v0.5.2/src/validation.cpp:4290)
ThreadImport(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool) (/Users/navia/Documents/Projects/Tapyrus/tapyrus-core-nb/v0.5.2/src/init.cpp:658)
void boost::_bi::list2<boost::_bi::value<std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>>, boost::_bi::value<bool>>::operator()<void (*)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list0&, int) (/usr/local/include/boost/bind/bind.hpp:298)
boost::_bi::bind_t<void, void (*)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list2<boost::_bi::value<std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>>, boost::_bi::value<bool>>>::operator()() (/usr/local/include/boost/bind/bind.hpp:1273)
boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>, bool), boost::_bi::list2<boost::_bi::value<std::__1::vector<boost::filesystem::path, std::__1::allocator<boost::filesystem::path>>>, boost::_bi::value<bool>>>>::run() (/usr/local/include/boost/thread/detail/thread.hpp:120)
boost::(anonymous namespace)::thread_proxy(void*) (@boost::(anonymous namespace)::thread_proxy(void*):45)
_pthread_start (@_pthread_start:32)
thread_start (@thread_start:8)

This matches the call stack in issue 254. The fix in PR #195 should fix this.
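
The macOS stack ends in `shared_ptr<CTransaction const>::get()` called from `CBlock::GetHeight()`, which would be consistent with reading a transaction pointer from a block that has none loaded. That reading is not confirmed here, and this is not the actual fix. As a minimal sketch only, assuming the height is read from the first transaction of the block (`SketchTx`, `SketchBlock` and `TryGetHeight` are hypothetical names for illustration, not Tapyrus code), a guarded accessor could look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for CTransaction; not the Tapyrus type.
struct SketchTx {
    // Assumption: the block height can be recovered from the first transaction.
    uint32_t height;
};

// Hypothetical stand-in for CBlock; not the Tapyrus type.
struct SketchBlock {
    std::vector<std::shared_ptr<const SketchTx>> vtx;

    // Writes the height to `out` and returns true, or returns false when
    // there is no first transaction to read (empty vtx or a null pointer).
    bool TryGetHeight(uint32_t& out) const {
        if (vtx.empty() || !vtx.front()) {
            return false;
        }
        out = vtx.front()->height;
        return true;
    }
};
```

With a guard like this, a caller such as the block import loop could skip or reject a block without transactions instead of dereferencing an invalid pointer.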