cculianu / Fulcrum

A fast & nimble SPV Server for BCH, BTC, and LTC
Other
331 stars 76 forks source link

Dropped connection to bitcoind #160

Closed RequestPrivacy closed 1 year ago

RequestPrivacy commented 1 year ago

Note - investigation revealed this issue isn't related to fulcrum

Description

I run a bitcoind (Bitcoin Core v22.0.0) and fulcrum (1.7.0) on the same Pi 4 and use cookie authentication.

To keep permission to an absolute minimum I allow read access to the .cookie file to a dedicated btcCookie group in which the fulcrum user is a member:

pi@debian:  ls -la | grep .cookie
-rw-r----- 1 bitcoin btcCookie        75 Mar 15 16:31 .cookie

pi@debian: groups fulcrum
fulcrum : fulcrum btcCookie

To achieve the above group setting I use -startupnotify in my bitcoind.service file

[Service]
ExecStart=/usr/local/bin/bitcoind -daemon \
                            -pid=/home/bitcoin/.bitcoin/bitcoind.pid \
                            -conf=/home//bitcoin/.bitcoin/bitcoin.conf \
                            -datadir=/home/bitcoin/.bitcoin \
                            -startupnotify="chown bitcoin:btcCookie /home/bitcoin/.bitcoin/.cookie && chmod 640 /home/bitcoin/.bitcoin/.cookie"

During start-up fulcrum has no problem to connect to bitcoind.

Problem

Sometimes fulcrum looses connection to bitcoind. I'm not yet sure why.

pi@debian:/mnt/usb/bitcoin/core $ sudo journalctl -fu fulcrum --since="2023-03-15 16:00:00"
[sudo] password for pi: 
-- Journal begins at <date> --

# Fulcrum running smoothly
Mar 15 16:00:11 debian Fulcrum[1554690]: [2023-03-15 16:00:11.877] <Controller> 18869 mempool txs involving 207868 addresses
Mar 15 16:00:28 debian Fulcrum[1554690]: [2023-03-15 16:00:28.598] <Controller> Block height 780926, downloading new blocks ...
Mar 15 16:00:29 debian Fulcrum[1554690]: [2023-03-15 16:00:29.913] <Controller> Processed 1 new block with 3531 txs (6720 inputs, 10890 outputs, 13619 addresses), verified ok.
Mar 15 16:00:29 debian Fulcrum[1554690]: [2023-03-15 16:00:29.918] <Controller> Block height 780926, up-to-date
Mar 15 16:01:19 debian Fulcrum[1554690]: [2023-03-15 16:01:19.900] <Controller> 15729 mempool txs involving 196741 addresses
Mar 15 16:02:29 debian Fulcrum[1554690]: [2023-03-15 16:02:29.881] <Controller> 16081 mempool txs involving 198076 addresses
Mar 15 16:03:29 debian Fulcrum[1554690]: [2023-03-15 16:03:29.904] <Controller> 16399 mempool txs involving 199118 addresses
Mar 15 16:04:29 debian Fulcrum[1554690]: [2023-03-15 16:04:29.915] <Controller> 16693 mempool txs involving 199949 addresses
Mar 15 16:05:39 debian Fulcrum[1554690]: [2023-03-15 16:05:39.901] <Controller> 17031 mempool txs involving 201074 addresses
Mar 15 16:06:49 debian Fulcrum[1554690]: [2023-03-15 16:06:49.899] <Controller> 17361 mempool txs involving 203962 addresses
Mar 15 16:07:49 debian Fulcrum[1554690]: [2023-03-15 16:07:49.899] <Controller> 17623 mempool txs involving 204890 addresses
Mar 15 16:08:50 debian Fulcrum[1554690]: [2023-03-15 16:08:49.986] <Controller> 17817 mempool txs involving 205568 addresses
Mar 15 16:09:50 debian Fulcrum[1554690]: [2023-03-15 16:09:50.078] <Controller> 17904 mempool txs involving 205818 addresses
Mar 15 16:10:41 debian Fulcrum[1554690]: [2023-03-15 16:10:40.944] <SynchMempool> Tx dropped out of mempool (possibly due to RBF): 139e0b203cd9ae4c9edac13163c4498e6792978222889d666c04a23ebd33171a (error response: No such mempool or blockchain transaction. Use gettransaction for wallet transactions.), ignoring mempool tx ...
Mar 15 16:10:41 debian Fulcrum[1554690]: [2023-03-15 16:10:41.717] <SynchMempool> Tx dropped out of mempool (possibly due to RBF): 42456566c08084d1770777a747fe07e3eb0af474beab9fef2b65c972660655d6 (error response: No such mempool or blockchain transaction. Use gettransaction for wallet transactions.), ignoring mempool tx ...
Mar 15 16:10:43 debian Fulcrum[1554690]: [2023-03-15 16:10:43.154] <SynchMempool> Tx dropped out of mempool (possibly due to RBF): b1bf88c34f6bb2660d6b99a34d8fea13d3985bdd8772a306b2867e63cc7d68a4 (error response: No such mempool or blockchain transaction. Use gettransaction for wallet transactions.), ignoring mempool tx ...
Mar 15 16:10:44 debian Fulcrum[1554690]: [2023-03-15 16:10:44.289] <SynchMempool> Tx dropped out of mempool (possibly due to RBF): 58e97794ef1e2cc8d8f7923ea21eeb9a6f06d0b26bc207ad21d0ca04a11d09b5 (error response: No such mempool or blockchain transaction. Use gettransaction for wallet transactions.), ignoring mempool tx ...
Mar 15 16:10:59 debian Fulcrum[1554690]: [2023-03-15 16:10:59.882] <Controller> 18468 mempool txs involving 208255 addresses
Mar 15 16:11:59 debian Fulcrum[1554690]: [2023-03-15 16:11:59.883] <Controller> 18727 mempool txs involving 209259 addresses
Mar 15 16:13:00 debian Fulcrum[1554690]: [2023-03-15 16:12:59.959] <Controller> 18926 mempool txs involving 209990 addresses

# Fulcrum loosing the connection to bitcoind
Mar 15 16:13:21 debian Fulcrum[1554690]: [2023-03-15 16:13:20.948] <BitcoinD.1> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:13:21 debian Fulcrum[1554690]: [2023-03-15 16:13:20.948] <Task.GetChainInfo> 413045: FAIL: bitcoind connection lost
Mar 15 16:13:21 debian Fulcrum[1554690]: [2023-03-15 16:13:21.087] <Controller> Failed to synch blocks and/or mempool

# Somehow connected again?
Mar 15 16:13:28 debian Fulcrum[1554690]: [2023-03-15 16:13:27.464] <Controller> Block height 780927, downloading new blocks ...
Mar 15 16:13:31 debian Fulcrum[1554690]: [2023-03-15 16:13:31.139] <Controller> Processed 1 new block with 3344 txs (6646 inputs, 9964 outputs, 13459 addresses), verified ok.
Mar 15 16:13:31 debian Fulcrum[1554690]: [2023-03-15 16:13:31.666] <Controller> Block height 780927, up-to-date

# Look at the - in comparison - long time between between the next and previous timestamp. No output has been truncated! The ssh shell with which I was logged in (but afk for some time) was frozen, so I needed to quit the local terminal and ssh back into the pi. After some time the pi responded and I could log back in. So maybe the pi couldn't handle the load?
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:14:11.475] <Controller> 15814 mempool txs involving 198202 addresses

# Now the connection dropped again
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:14:21.303] <BitcoinD.1> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:14:21.356] <BitcoinD.3> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:14:22.120] <BitcoinD.2> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:15:33.747] <BitcoinD.1> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:15:35.230] <SynchMempool> 413418: FAIL: Unable to find a good BitcoinD connection
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:15:32.332] <Controller> Waiting for bitcoind...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:15:36.066] <BitcoinD.2> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:15:33.748] <BitcoinD.3> Lost connection to bitcoind, will retry every 5 seconds ...

# Potentially no access to the cookie-file might be the reason as indicated in the logs?
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:28:35.517] <BitcoinD.1> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:31:22.388] <BitcoinD.2> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:31:22.774] <BitcoinD.3> TCP BitcoinD.3 (id: 4) : error 0 (Connection refused)
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:30:50.254] <Controller> Waiting for bitcoind...
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:31:23.870] <BitcoinD.2> TCP BitcoinD.2 (id: 3) : error 0 (Connection refused)
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:31:22.775] <BitcoinD.1> TCP BitcoinD.1 (id: 2) : error 0 (Connection refused)
Mar 15 16:31:26 debian Fulcrum[1554690]: [2023-03-15 16:31:23.874] <Controller> Waiting for bitcoind...
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.641] <BitcoinD.2> Unable to open cookie file '/home/bitcoin/.bitcoin/.cookie': Permission denied
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.669] <BitcoinD.3> Unable to open cookie file '/home/bitcoin/.bitcoin/.cookie': Permission denied
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.881] <BitcoinD.1> Unable to open cookie file '/home/bitcoin/.bitcoin/.cookie': Permission denied
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> Got HTTP status 401 Unauthorized; will log the rest of this HTTP response
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> Authentication to bitcoind rpc failed. Please check the rpccookie is correct and restart!
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> 401 (header): WWW-Authenticate: Basic realm="jsonrpc"
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> 401 (header): Date: Wed, 15 Mar 2023 15:31:27 GMT
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> 401 (header): Content-Length: 0
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> 401 (header): Content-Type: text/html; charset=ISO-8859-1
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> Got unexpected content type: text/html; charset=ISO-8859-1; will log the rest of this HTTP response
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> 401 (header):
Mar 15 16:31:27 debian Fulcrum[1554690]: [2023-03-15 16:31:27.893] <BitcoinD.2> 401 (content):
# ...
# ...some more of the above error message until 
# ...

# fulcrum seems to re-connect:
Mar 15 16:34:08 debian Fulcrum[1554690]: [2023-03-15 16:34:08.061] <Controller> We have height 780927, but bitcoind reports height 780926. Possible reorg, will rewind back 1 block and try again ...
Mar 15 16:35:01 debian Fulcrum[1554690]: [2023-03-15 16:35:01.870] <Controller> Applied undo for block 780927 hash 000000000000000000004fc10fe0a646ebb03ed0066747a7fc4cf559250cd025, 3344 transactions involving 13459 scripthashes, in 53807.03 msec, new height now: 780926
Mar 15 16:35:19 debian Fulcrum[1554690]: [2023-03-15 16:35:19.568] <Controller> Block height 780928, downloading new blocks ...
Mar 15 16:35:24 debian Fulcrum[1554690]: [2023-03-15 16:35:24.978] <Controller> Processed 2 new blocks with 6251 txs (14261 inputs, 16636 outputs, 25640 addresses), verified ok.
Mar 15 16:35:43 debian Fulcrum[1554690]: [2023-03-15 16:35:43.048] <Controller> Block height 780929, downloading new blocks ...
Mar 15 16:35:47 debian Fulcrum[1554690]: [2023-03-15 16:35:47.731] <Controller> Processed 1 new block with 2831 txs (8011 inputs, 7026 outputs, 12517 addresses), verified ok.
Mar 15 16:36:04 debian Fulcrum[1554690]: [2023-03-15 16:36:04.197] <Controller> Block height 780930, downloading new blocks ...
Mar 15 16:36:07 debian Fulcrum[1554690]: [2023-03-15 16:36:07.890] <Controller> Processed 1 new block with 3476 txs (6692 inputs, 10393 outputs, 14242 addresses), verified ok.
Mar 15 16:36:22 debian Fulcrum[1554690]: [2023-03-15 16:36:22.304] <Controller> Block height 780931, downloading new blocks ...
Mar 15 16:36:24 debian Fulcrum[1554690]: [2023-03-15 16:36:24.644] <Controller> Processed 1 new block with 1291 txs (6557 inputs, 3014 outputs, 7807 addresses), verified ok.
Mar 15 16:36:24 debian Fulcrum[1554690]: [2023-03-15 16:36:24.676] <Controller> Block height 780931, up-to-date
Mar 15 16:36:27 debian Fulcrum[1554690]: [2023-03-15 16:36:27.248] <SynchMempool> Tx dropped out of mempool (possibly due to RBF): 13073ea9d6ae5097ae93dab8065e325917a127a5679ff4349df8547d08626f9b (error response: No such mempool or blockchain transaction. Use gettransaction for wallet transactions.), ignoring mempool tx ...
Mar 15 16:36:34 debian Fulcrum[1554690]: [2023-03-15 16:36:34.878] <Controller> 712 mempool txs involving 3753 addresses
Mar 15 16:37:34 debian Fulcrum[1554690]: [2023-03-15 16:37:34.881] <Controller> 1011 mempool txs involving 4793 addresses
Mar 15 16:37:46 debian Fulcrum[1554690]: [2023-03-15 16:37:46.515] <Controller> Block height 780932, downloading new blocks ...
Mar 15 16:37:48 debian Fulcrum[1554690]: [2023-03-15 16:37:48.492] <Controller> Processed 1 new block with 2556 txs (6405 inputs, 6267 outputs, 10477 addresses), verified ok.
Mar 15 16:37:48 debian Fulcrum[1554690]: [2023-03-15 16:37:48.496] <Controller> Block height 780932, up-to-date
Mar 15 16:38:38 debian Fulcrum[1554690]: [2023-03-15 16:38:38.877] <Controller> 816 mempool txs involving 2807 addresses
Mar 15 16:39:38 debian Fulcrum[1554690]: [2023-03-15 16:39:38.879] <Controller> 1238 mempool txs involving 4506 addresses
Mar 15 16:40:48 debian Fulcrum[1554690]: [2023-03-15 16:40:48.878] <Controller> 1642 mempool txs involving 6307 addresses
Mar 15 16:41:48 debian Fulcrum[1554690]: [2023-03-15 16:41:48.879] <Controller> 1917 mempool txs involving 7730 addresses
Mar 15 16:42:58 debian Fulcrum[1554690]: [2023-03-15 16:42:58.878] <Controller> 2300 mempool txs involving 9296 addresses
Mar 15 16:43:58 debian Fulcrum[1554690]: [2023-03-15 16:43:58.878] <Controller> 2567 mempool txs involving 10227 addresses
Mar 15 16:44:58 debian Fulcrum[1554690]: [2023-03-15 16:44:58.878] <Controller> 2873 mempool txs involving 11237 addresses
Mar 15 16:46:08 debian Fulcrum[1554690]: [2023-03-15 16:46:08.877] <Controller> 3248 mempool txs involving 12772 addresses
Mar 15 16:47:08 debian Fulcrum[1554690]: [2023-03-15 16:47:08.878] <Controller> 3546 mempool txs involving 13912 addresses

So my question is: a.) what could be the cause that the connection is lost in the first place? The pi is only running bitcoind, fulcrum and sparrow server (a wallet with minimal terminal GUI). Load averages 3.05G RAM of the total 3.7G available, no swap. b.) what's the matter with the cookie-file (fulcrum is denied access but after a while re-allowed to read the .cookie?)

Regarding a.) is there a setting to minimize fulcrums load on the RAM? Maybe a spike in load is causing the issue for fulcrum (and bitcoind for that matter)?

RequestPrivacy commented 1 year ago

After running roughly 20h just fine I have the same problem again. The connection closes but maybe its bitcoinds fault:

Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:01.350] <BitcoinD.3> TCP BitcoinD.3 (id: 4) 127.0.0.1:8332: error 1 (The remote host closed the connection)
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:02.192] <BitcoinD.3> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:01.350] <BitcoinD.2> TCP BitcoinD.2 (id: 3) 127.0.0.1:8332: error 1 (The remote host closed the connection)
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:02.193] <SynchMempool> 762083: FAIL: bitcoind connection lost
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:01.885] <Controller> 13115 mempool txs involving 149643 addresses
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:01.680] <BitcoinD.1> TCP BitcoinD.1 (id: 2) 127.0.0.1:8332: error 1 (The remote host closed the connection)
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:02.193] <BitcoinD.2> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:02.193] <BitcoinD.1> Lost connection to bitcoind, will retry every 5 seconds ...
Mar 16 13:04:03 debian Fulcrum[1554690]: [2023-03-16 13:04:02.193] <Controller> Failed to synch blocks and/or mempool
Mar 16 13:04:06 debian Fulcrum[1554690]: [2023-03-16 13:04:06.945] <BitcoinD.1> Unable to open cookie file '/home/bitcoin/.bitcoin/.cookie': Permission denied
Mar 16 13:04:06 debian Fulcrum[1554690]: [2023-03-16 13:04:06.945] <BitcoinD.2> Unable to open cookie file '/home/bitcoin/.bitcoin/.cookie': Permission denied
Mar 16 13:04:06 debian Fulcrum[1554690]: [2023-03-16 13:04:06.945] <BitcoinD.3> Unable to open cookie file '/home/bitcoin/.bitcoin/.cookie': Permission denied

It seems that the .cookie file has been newly created at the same time fulcrum fails to connect:

# compare to "Mar 15" of OP
pi@debian: $ ls -la | grep .cookie
-rw-r----- 1 bitcoin btcCookie        75 Mar 16 13:04 .cookie

pi@debian: $ bitcoin-cli uptime
7032              # roughly 2h which is consistent with the creation time of the cookie file 

Will investigate why bitcoind shuts itself down. Any thoughts/help is appreciated though.

Going to close this off as it doesn't look like fulcrum is the problem here but handles the matter rather superb!

RequestPrivacy commented 1 year ago

Closing off as not related to fulcrum but rather bitcoind.

cculianu commented 1 year ago

Oh hi I just saw this, sorry for the late reply. What version of bitcoind are you running? I would be curious too if there are bugs in bitcoind. Please let me know anything you discover in this regard, and do let me know the exact version you are running so I can look at it!

Maybe also run bitcoind with -debug=all or something (note: will have lots of log spam in bitcoind's debug.log if you do that!!).

RequestPrivacy commented 1 year ago

Thanks for the reply and no worries, the issue is only 1 day old ;)

The version is Bitcoin Core 22.0.0.

pi@debian: $ sudo journalctl -fu bitcoind --since="2023-03-16 12:00:00"

Mar 16 13:04:02 debian systemd[1]: bitcoind.service: Main process exited, code=killed, status=9/KILL
Mar 16 13:04:02 debian systemd[1]: bitcoind.service: Failed with result 'signal'.
Mar 16 13:04:02 debian systemd[1]: bitcoind.service: Consumed 1h 27min 19.471s CPU time.

Something seems to kill bitcoind. I don't think its a bug really, more a limitation of my hardware / the Pi. Decreased the memory consumption with dbcache a bit, maybe its because the pi runs out of RAM?

Will keep you posted and try with more verbose log output.

RequestPrivacy commented 1 year ago

And there it is:

# /var/log/syslog
Mar 16 13:04:02 debian kernel: [17352780.421814] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=bitcoind,pid=1659642,uid=1001
Mar 16 13:04:02 debian kernel: [17352780.422033] Out of memory: Killed process 1659642 (bitcoind) total-vm:6718536kB, anon-rss:1547428kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:10384kB oom_score_adj:0
Mar 16 13:04:02 debian kernel: [17352780.723486] oom_reaper: reaped process 1659642 (bitcoind), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Sorry to have spammed you with the issue. And thanks for such a fantastic and performant indexer!

cculianu commented 1 year ago

Hmm that's strange. Yeah a smaller -dbcache can help. But also if you are using jemalloc (or I forget if bitcoind is linked with jemalloc), it can cause the process to hang on to memory for wayyy too long and not release it to the OS. Things to try:

RequestPrivacy commented 1 year ago

You are referring to this here, right?

Try doing this: Use MALLOC_CONF to instruct jemalloc to not use tcache. Put the following in the ENV to the process: MALLOC_CONF=tcache:false

Not sure how to set this though. Need to research first (or maybe you don't mind giving me some hints).

cculianu commented 1 year ago

Oh I guess Bitcoin Core doesn't use jemalloc, my bad. But yes -- in glibc's mallox there is something similar and yes you should DEFINITELY turn it to MALLOC_ARENA_MAX=1. To do that, in your bitcoind.service you should have a line before ExecStart in your [Service] section such as: Environment="MALLOC_ARENA_MAX=1".

See this link: https://www.baeldung.com/linux/systemd-services-environment-variables#setting-variables-with-environment

RequestPrivacy commented 1 year ago

Thanks for providing the instructions. After setting things up it seems that the RPi is stable again - without its memory running away.

Thanks a lot for your help!

cculianu commented 1 year ago

My pleasure!