cculianu / Fulcrum

A fast & nimble SPV Server for BCH, BTC, and LTC
Other
325 stars 73 forks source link

Fulcrum 1.9.5 terminates with uncaught exception #214

Closed Francisco-DAnconia closed 7 months ago

Francisco-DAnconia commented 8 months ago

I just updated to 1.9.5 and was running Fulcrum on BTC Testnet. After a "Failed to Find Previous Transaction" (same as issue #141??, which I still get a lot of on an older version of Fulcrum) an exception propagates to Qt which terminates Fulcrum.

Monterey 12.7, built with ZeroMQ, no jemalloc.

Terminal Output:

[2023-11-15 10:32:04.876] <Controller> 155 mempool txs involving 95 addresses
[2023-11-15 10:33:04.875] <Controller> 175 mempool txs involving 124 addresses
[2023-11-15 10:33:18.914] <SynchMempool> Synch mempool expected to drop 173, but in fact dropped 176 -- retrying getrawmempool
[2023-11-15 10:33:18.919] <SyncMempoolPreCache> threadFunc: Unable to find prevout 016fb7d6eb6884ba4c7e536ccbdd0b9558ebc01c32342b3e8eb49a561bc73b6f:1 in DB for tx bddb0bf8513cc6cfd454638b89460f9e701625ccee097bd957f33d834b5c0014 (possibly a block arrived while synching mempool, will retry)
[2023-11-15 10:33:18.921] <SynchMempool> Caught exception when processing mempool tx's: FAILED TO FIND PREVIOUS TX 269a0a7d5662a12c7ab14000df2b0ba5b978ab546bc3d8986791d0283f1c1184:1 IN EITHER MEMPOOL OR DB for TxHash: 59d9e9472792b2da078ac85584b43b14e9d4145ef9ad39d4c708e0849f5bc06c (input 0)
[2023-11-15 10:33:18.922] <Controller> [Qt Warning] Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt.
You must not let any exception whatsoever propagate through Qt code.
If that is not possible, in Qt 5 you must at least reimplement
QCoreApplication::notify() and catch all exceptions there.
 (:0, )
libc++abi: terminating with uncaught exception of type std::__1::system_error: thread::join failed: No such process
cculianu commented 8 months ago

Hmm. This should never happen, yet I believe you that it did. I will investigate. Thanks so much for the bug report.

Question: Do you think it was some 1-off or does it happen with any regularity on testnet? Do you have any more of the log preceding this ? Did a block indeed arrive at that time ?

Anyway this requires more investigation from me. Thanks for the report.

cculianu commented 8 months ago

Ok, I managed to reproduce the error. Still not sure why inputs were missing on your testnet but this exposed a bug in the synch code (a race condition). I have a fix. I will push it. Are you getting this error often? Would you be able to try my fix on master?

Francisco-DAnconia commented 8 months ago

Good news. I only recently updated to 1.9.5 (from 1.7.0) and was testing it out on Testnet before migrating it to Mainnet. It ran for about a 5 days before the error plus I didn't notice for a few days. Since I restarted (1 day) - no subsequent errors.

Would you be able to try my fix on master?

I'm sorry I don't understand what your asking, but am happy to help? What does master refer to?

cculianu commented 8 months ago

It ran for about a 5 days before the error plus I didn't notice for a few days. Since I restarted (1 day) - no subsequent errors.

Well it's good to know it's rare but it definitely was a bug and I'm super-glad you contacted me! This was def. a race condition. Darn. I may need to do a release very soon.

I'm sorry I don't understand what your asking, but am happy to help? What does master refer to?

Oh.. I meant like can you build from the current latest github code at this repo. master refers to the latest commit on this repo.

If you like I can test this here overnight on BTC mainnet (which is very active and rapes mempool synch code typically).. and do a release tomorrow.

Or.. if you know how to build (I presume you do since you are on mac??) -- maybe you can pull the latest code from this repo and build and test it too?

Francisco-DAnconia commented 8 months ago

IC, thanks for the explanation.

Sure, I can definitely do this on Testnet, and if I have time I'll see if I can back up the Fulcrum data on mainnet and run the "head/master" there as well.

Thanks for the quick investigation. Bugs that should never happen are always the most interesting...in the end.

cculianu commented 8 months ago

Sure, I can definitely do this on Testnet

Yeah this particular bug is more likely to happen on testnet -- it requires a "block only" txn to arrive in a block, and then a mempool txn to reference it, just at the precise time. On mainnet, "block-only" txns (txns that were never in mempool) are very rare...

At least I think that's what happened.

Anyway yeah testing on both would be appreciated if you have time!

I'm already testing here on mainnet... (although I never managed to see this error since it's extremely rare and unlikely to happen..). You got lucky!

cculianu commented 8 months ago

I think I will do a release ASAP tbh. It bugs me that this bug can happen, and the fixed commit is at least as good as or better than what is currently released. Closing this issue for now.

hMsats commented 8 months ago

While compiling 1.9.7 on Ubuntu 20.04.6 LTS as follows:

make clean
qmake LIBS+=-ljemalloc
make -j4

I get:

-DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o SrvMgr.o src/SrvMgr.cpp
g++ -c -pipe -std=c++1z -DNDEBUG -O3 -std=gnu++1z -isystem /usr/include/mit-krb5 -w -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o Storage.o src/Storage.cpp
g++ -c -pipe -std=c++1z -DNDEBUG -O3 -std=gnu++1z -isystem /usr/include/mit-krb5 -w -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o SSLCertMonitor.o src/SSLCertMonitor.cpp
src/Storage.cpp: In member function ‘std::optional<Storage::FirstUse> Storage::getFirstUse(const HashX&) const’:
src/Storage.cpp:3903:88: error: call of overloaded ‘arg(QByteArray, QByteArray)’ is ambiguous
 3903 |                                                      .arg(hashX.toHex(), optba->toHex()));
      |                                                                                        ^
In file included from /usr/include/x86_64-linux-gnu/qt5/QtCore/QString:1,
                 from src/ByteView.h:23,
                 from src/Util.h:21,
                 from src/Mixins.h:21,
                 from src/App.h:21,
                 from src/Storage.cpp:19:
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:275:31: note: candidate: ‘QString QString::arg(qlonglong, int, int, QChar) const’
  275 |     Q_REQUIRED_RESULT QString arg(qlonglong a, int fieldwidth=0, int base=10,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:277:31: note: candidate: ‘QString QString::arg(qulonglong, int, int, QChar) const’
  277 |     Q_REQUIRED_RESULT QString arg(qulonglong a, int fieldwidth=0, int base=10,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:979:16: note: candidate: ‘QString QString::arg(long int, int, int, QChar) const’
  979 | inline QString QString::arg(long a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:981:16: note: candidate: ‘QString QString::arg(ulong, int, int, QChar) const’
  981 | inline QString QString::arg(ulong a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:975:16: note: candidate: ‘QString QString::arg(int, int, int, QChar) const’
  975 | inline QString QString::arg(int a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:977:16: note: candidate: ‘QString QString::arg(uint, int, int, QChar) const’
  977 | inline QString QString::arg(uint a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:983:16: note: candidate: ‘QString QString::arg(short int, int, int, QChar) const’
  983 | inline QString QString::arg(short a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:985:16: note: candidate: ‘QString QString::arg(ushort, int, int, QChar) const’
  985 | inline QString QString::arg(ushort a, int fieldWidth, int base, QChar fillChar) const
      |                ^~~~~~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:291:31: note: candidate: ‘QString QString::arg(double, int, char, int, QChar) const’
  291 |     Q_REQUIRED_RESULT QString arg(double a, int fieldWidth = 0, char fmt = 'g', int prec = -1,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:293:31: note: candidate: ‘QString QString::arg(char, int, QChar) const’
  293 |     Q_REQUIRED_RESULT QString arg(char a, int fieldWidth = 0,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:298:31: note: candidate: ‘QString QString::arg(const QString&, int, QChar) const’
  298 |     Q_REQUIRED_RESULT QString arg(const QString &a, int fieldWidth = 0,
      |                               ^~~
/usr/include/x86_64-linux-gnu/qt5/QtCore/qstring.h:987:16: note: candidate: ‘QString QString::arg(const QString&, const QString&) const’
  987 | inline QString QString::arg(const QString &a1, const QString &a2) const
      |                ^~~~~~~
g++ -c -pipe -std=c++1z -DNDEBUG -O3 -std=gnu++1z -isystem /usr/include/mit-krb5 -w -D_REENTRANT -fPIC -DQT_DEPRECATED_WARNINGS -DUSE_QT_IN_BITCOIN -DHAVE_ENDIAN_H -DHAVE_DECL_HTOBE16 -DHAVE_DECL_HTOLE16 -DHAVE_DECL_BE16TOH -DHAVE_DECL_LE16TOH -DHAVE_DECL_HTOBE32 -DHAVE_DECL_HTOLE32 -DHAVE_DECL_BE32TOH -DHAVE_DECL_LE32TOH -DHAVE_DECL_HTOBE64 -DHAVE_DECL_HTOLE64 -DHAVE_DECL_BE64TOH -DHAVE_DECL_LE64TOH -DHAVE_DECL___BUILTIN_CLZL -DHAVE_DECL___BUILTIN_CLZLL -DGIT_COMMIT=\"f27fc28\" -DENABLE_ZMQ -DHAVE_JEMALLOC_HEADERS -DQT_NO_DEBUG -DQT_NETWORK_LIB -DQT_CORE_LIB -I. -Istaticlibs/rocksdb/include -Isrc -isystem /usr/include/pgm-5.2 -isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtNetwork -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -I. -I/usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++ -o SubsMgr.o src/SubsMgr.cpp
cculianu commented 8 months ago

Grr why do compilers hate me so. I pushed a commit to address this: https://github.com/cculianu/Fulcrum/commit/6d4026a4e578423017dd86a25c7635d6e722a29b

hMsats commented 8 months ago

That did it! Thanks a lot, it's running.

Francisco-DAnconia commented 8 months ago

Update on 1.9.6+. Testnet has been running for 2 days w/o issue - no asserts.

I have not seen the asserts metntioned in #141 now that I updated Mainnet. But I do see these variants of Tx dropped out of the mempool.

  1. Single assert (yellow):
    <Controller> Processed 1 new block with 4042 txs (6896 inputs, 8813 outputs, 7873 addresses), verified ok.
    <Controller> Block height 818273, up-to-date
    <SynchMempool> Tx dropped out of mempool (possibly due to RBF): 46fd88359f8a91441fb46a813d97e89c1c4e0196775e23af1adaf473f8d1e3c5 (error response: No such mempool or blockchain transaction. Use gettransaction for wallet transactions.), ignoring mempool tx ...
    <Controller> 103155 mempool txs involving 502600 addresses
    <Controller> 103731 mempool txs involving 504035 addresses
    <Controller> 104345 mempool txs involving 505852 addresses
  2. Triple assert (yellow/red):
    <Controller> 106195 mempool txs involving 511583 addresses
    <Controller> 106551 mempool txs involving 512882 addresses
    <SynchMempool> Synch mempool expected to drop 3217, but in fact dropped 3327 -- retrying getrawmempool
    <SyncMempoolPreCache> SynchMempoolTask::Precache::threadFunc: Unable to find prevout 832ac0b71d5fe00baeec6129298e0e55ee73382f327ca292187d3e8a9fa2e286:12 in DB for tx 5b73b9fac71356db34a89c26be19bd3d906c304f7625300dbb3d3b594aebdf90 (possibly a block arrived while synching mempool, will retry)
    <Controller> Failed to synch blocks and/or mempool
    <Controller> Block height 818274, downloading new blocks ...
    <Controller> Processed 1 new block with 3218 txs (7344 inputs, 9206 outputs, 11379 addresses), verified ok.
    <Controller> Block height 818274, up-to-date
  3. Quad assert (yellow/red):
    <Controller> Block height 818275, up-to-date
    <Controller> 105716 mempool txs involving 507956 addresses
    <Controller> 106129 mempool txs involving 509332 addresses
    <SynchMempool> Synch mempool expected to drop 3285, but in fact dropped 3320 -- retrying getrawmempool
    <SyncMempoolPreCache> SynchMempoolTask::Precache::threadFunc: Unable to find prevout 899b2e34b038746c50b01aaaa49a00152ffb3be4d80898fc01f71b5b47ab7340:0 in DB for tx 3c4768ac8131020b1964f43621cd6d59bc5ecef788efe2dfebf3946bd249d449 (possibly a block arrived while synching mempool, will retry)
    <SynchMempool> processResults: precache->thread errored out, aborting SynchMempoolTask
    <Controller> Failed to synch blocks and/or mempool
    <Controller> Block height 818276, downloading new blocks ...
    <Controller> Processed 1 new block with 3287 txs (6655 inputs, 7832 outputs, 9150 addresses), verified ok.
    <Controller> Block height 818276, up-to-date

Are these normal and expected given the state of the mempool, that is, benign. Or something else?

Thanks

cculianu commented 7 months ago

The first error (1) above is expected to happen occasionally. It's just because it's impossible to synch the mempool 100% correctly in an atomic fashion since it's fast changing on BTC when mempool is full. Sometimes txns drop out that were there a moment ago and sometimes live txns that you just were told were there, refer to other txns that are no longer there. So that error can happen when a txn gets expired or RBF'd and it has children you are busy examining. I see error (1) too on my setup here every once in a while and it's perfectly normal.

The other 2 errors are rarer .. i have never seen them happen .. the fact that they happened for you successively makes me curious as to WHY they happened.

In all cases the errors are recoverable and Fulcrum eventually settles. I will have to examine the other 2 errors though in more detail.. because something is fishy there.

I'm going to re-open this issue and re-assign it again. I hope to get time this weekend or next week to work on this.

cculianu commented 7 months ago

I'm actually going to close this issue and i opened #216 that references your comment.