btcsuite / btcd

An alternative full node bitcoin implementation written in Go (golang)
https://github.com/btcsuite/btcd/blob/master/README.md
ISC License
6.22k stars 2.36k forks source link

Something in the wild started freezing my btcd #1471

Closed darwin closed 5 years ago

darwin commented 5 years ago

I've been running btcd in production for a year. After initial sync problems, I got it working and it was quite smooth sail. Most recently I've been running 130ea5bddde33df32b06a1cdb42a6316eb73cff5. About a week ago I noticed the node to fall out of sync and stayed stuck.

Restarting btcd and launching it again started syncing to catch up. It was behind couple hundred blocks.

The problem is that something triggers the freeze again and again I was unable to complete the syncing. I'm able to sync few blocks and the freeze happens again. I assume this is caused by some misbehaving peer reconnecting and feeding my btcd node some problematic data.

Here is a log just before the freeze happened (with redacted IPs):

btcd    | 2019-09-24 21:52:02.788 [INF] SYNC: New valid peer <IP1>:8333 (outbound) (/Satoshi:0.15.1/)
btcd    | 2019-09-24 21:52:03.940 [INF] SYNC: Lost peer <IP1>:8333 (outbound)
btcd    | 2019-09-24 21:52:04.103 [INF] SYNC: New valid peer <IP2>:8333 (outbound) (/Satoshi:0.13.1/)
btcd    | 2019-09-24 21:52:12.065 [INF] CHAN: Adding orphan block 0000000000000000000d11ede72158396bed291a784be023a51d7f812d41196d with parent 000000000000000000100abc79d25cc29e0a22800c53e90b7b9fceaeba991e1c
btcd    | 2019-09-24 21:52:16.297 [INF] SYNC: Lost peer <IP3>:59690 (inbound)
btcd    | 2019-09-24 21:52:16.298 [INF] SYNC: Lost peer <IP4>:57225 (inbound)
btcd    | 2019-09-24 21:52:16.299 [INF] SYNC: New valid peer <IP5>:60213 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-24 21:52:18.802 [INF] SYNC: New valid peer <IP6>:34082 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-24 21:52:19.681 [INF] SYNC: Lost peer <IP6>:34082 (inbound)
btcd    | 2019-09-24 21:52:21.230 [INF] SYNC: Lost peer <IP5>:60213 (inbound)
btcd    | 2019-09-24 21:52:37.963 [INF] CHAN: Adding orphan block 00000000000000000001152b24946fffdc6d1e097e2660ceb24cfaa57afafcfb with parent 0000000000000000000d11ede72158396bed291a784be023a51d7f812d41196d
btcd    | 2019-09-24 21:52:41.396 [INF] SYNC: New valid peer <IP7>:62493 (inbound) (/Satoshi:0.12.1/)
btcd    | 2019-09-24 21:52:41.526 [INF] SYNC: New valid peer <IP7>:39297 (inbound) (/Satoshi:0.14.2/UASF-Segwit:1.0(BIP148)/)
btcd    | 2019-09-24 21:52:41.608 [INF] SYNC: Lost peer <IP7>:39297 (inbound)
btcd    | 2019-09-24 21:52:45.429 [INF] CHAN: Adding orphan block 000000000000000000120b951aaae9ab5939130050b2bc08335cef1b0f6faa5b with parent 0000000000000000000355f0074b9c2bb63affaae1a09d5e9ecff7b29bb4863a
btcd    | 2019-09-24 21:52:49.322 [INF] CHAN: Adding orphan block 000000000000000000100abc79d25cc29e0a22800c53e90b7b9fceaeba991e1c with parent 000000000000000000120b951aaae9ab5939130050b2bc08335cef1b0f6faa5b
btcd    | 2019-09-24 21:52:52.169 [INF] SYNC: New valid peer <IP8>:56180 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-24 21:52:53.227 [INF] SYNC: Lost peer <IP8>:56180 (inbound)
btcd    | 2019-09-24 21:52:55.878 [INF] SYNC: New valid peer <IP9>:38574 (inbound) (/btc_node_scanner:0.0.1/)
btcd    | 2019-09-24 21:52:56.646 [INF] SYNC: Lost peer <IP9>:38574 (inbound)
btcd    | 2019-09-24 21:53:06.707 [INF] SYNC: New valid peer <IP10>:40149 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-24 21:53:07.073 [INF] SYNC: Lost peer <IP11>:50862 (inbound)
btcd    | 2019-09-24 21:53:09.396 [INF] SYNC: Rejected block 00000000000000000001152b24946fffdc6d1e097e2660ceb24cfaa57afafcfb from <IP12>:8333 (outbound): already have block (orphan) 00000000000000000001152b24946fffdc6d1e097e2660ceb24cfaa57afafcfb
btcd    | 2019-09-24 21:53:10.762 [INF] SYNC: Lost peer <IP13>:40149 (inbound)
btcd    | 2019-09-24 21:53:11.312 [INF] SYNC: Lost peer <IP7>:62493 (inbound)
btcd    | 2019-09-24 21:53:14.434 [INF] SYNC: Lost peer <IP14>:60589 (inbound)

A few observations: 1) btcctl from localhost doesn't respond, it stays stuck as well 1) log mentions orphan blocks 1) IP7 is interesting, it looks like it connects to my node from two separate nodes sitting behind the same IP. I also saw this IP earlier in the logs connecting with nodes identifying as /Bitcoin ABC:0.14.6(EB8.0)/ or /bitcoinj:0.14.4/Bitcoin Wallet:5.25/. 1) when I send signal to kill btcd, the logging resumes with shutdown messages (see below), so logging subsystem is not stuck, it just looks that main btcd "thread" is blocked

btcd    | 2019-09-24 21:57:08.310 [INF] BTCD: Received signal (terminated).  Shutting down...
btcd    | 2019-09-24 21:57:08.323 [INF] BTCD: Gracefully shutting down the server...
...

My workaround was to put the btcd node behind a specific bitcoind node. This let it sync and work fine (so far).

cubancorona commented 5 years ago

Itadakimasu! Thank you for helping diagnose a potentially important issue, possibly related to a recent change in the handling of a Peer's version negotiation.

Are you able to reproduce this issue consistently?

darwin commented 5 years ago

Yes, it is still reproducible. In less than 10 minutes after restarting btcd it got stuck again:

btcd    | 2019-09-26 18:04:11.201 [INF] BTCD: Version 0.12.0-beta
btcd    | 2019-09-26 18:04:11.204 [INF] BTCD: Loading block database from '/data/mainnet/blocks_ffldb'
btcd    | 2019-09-26 18:04:12.968 [INF] BTCD: Block database loaded
btcd    | 2019-09-26 18:04:13.000 [INF] INDX: Transaction index is enabled
btcd    | 2019-09-26 18:04:13.001 [INF] INDX: Committed filter index is enabled
btcd    | 2019-09-26 18:04:13.030 [INF] CHAN: Loading block index...
btcd    | 2019-09-26 18:04:40.255 [INF] CHAN: Chain state (height 596722, hash 00000000000000000011922033c6930720d0d289f83595fbd04c10a2063e4a2f, totaltx 459154406, work 2723098917047902048440593672)
btcd    | 2019-09-26 18:04:41.103 [INF] RPCS: RPC server listening on <IP>:8334
btcd    | 2019-09-26 18:04:42.828 [INF] AMGR: Loaded 27015 addresses from file '/data/mainnet/peers.json'
btcd    | 2019-09-26 18:04:42.831 [INF] CMGR: Server listening on <IP>:8333
btcd    | 2019-09-26 18:04:42.832 [INF] CMGR: Server listening on [::]:8333
btcd    | 2019-09-26 18:04:42.841 [INF] SYNC: New valid peer <IP>:64247 (inbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 18:04:42.843 [INF] SYNC: Syncing to block height 596722 from peer <IP>:64247
btcd    | 2019-09-26 18:04:42.843 [INF] SYNC: New valid peer <IP>:41778 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 18:04:42.892 [INF] CMGR: 50 addresses found from DNS seed seed.bitcoinstats.com
btcd    | 2019-09-26 18:04:43.022 [INF] CMGR: 33 addresses found from DNS seed dnsseed.bluematt.me
btcd    | 2019-09-26 18:04:43.097 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.16.3/)
btcd    | 2019-09-26 18:04:43.099 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:<IP>/)
btcd    | 2019-09-26 18:04:43.101 [INF] CMGR: 37 addresses found from DNS seed dnsseed.bitcoin.dashjr.org
btcd    | 2019-09-26 18:04:43.104 [INF] SYNC: Lost peer <IP>:41778 (inbound)
btcd    | 2019-09-26 18:04:43.107 [INF] CMGR: 40 addresses found from DNS seed seed.bitnodes.io
btcd    | 2019-09-26 18:04:43.111 [INF] CMGR: 38 addresses found from DNS seed seed.bitcoin.jonasschnelli.ch
btcd    | 2019-09-26 18:04:43.150 [INF] CMGR: 40 addresses found from DNS seed seed.bitcoin.sipa.be
btcd    | 2019-09-26 18:04:43.152 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.17.1/)
btcd    | 2019-09-26 18:04:43.154 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 18:04:43.295 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 18:04:51.712 [INF] SYNC: New valid peer <IP>:60541 (inbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 18:04:52.194 [INF] RPCS: New websocket client <IP>:44540
btcd    | 2019-09-26 18:04:52.223 [INF] RPCS: New websocket client <IP>:44538
btcd    | 2019-09-26 18:04:52.327 [INF] RPCS: New websocket client <IP>:44542
btcd    | 2019-09-26 18:04:52.464 [INF] RPCS: New websocket client <IP>:44544
btcd    | 2019-09-26 18:04:53.530 [INF] RPCS: Beginning rescan for 352 addresses
btcd    | 2019-09-26 18:04:55.613 [INF] RPCS: Finished rescan
btcd    | 2019-09-26 18:05:13.021 [INF] SYNC: Lost peer <IP>:60541 (inbound)
btcd    | 2019-09-26 18:05:13.035 [INF] SYNC: New valid peer <IP>:45088 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 18:05:14.335 [INF] SYNC: Lost peer <IP>:45088 (inbound)
btcd    | 2019-09-26 18:06:16.375 [INF] SYNC: New valid peer <IP>:43156 (inbound) (/bitcoinj:0.15-SNAPSHOT/)
btcd    | 2019-09-26 18:06:16.392 [INF] SYNC: Lost peer <IP>:43156 (inbound)
btcd    | 2019-09-26 18:06:18.606 [INF] SYNC: New valid peer <IP>:37478 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 18:06:19.683 [INF] SYNC: Lost peer <IP>:37478 (inbound)

Please note that I have lnd running on top of it. This time btcd started from fully synced state. Also this time we have no orphan blocks mentions.

Do you want me to go before 130ea5bddde33df32b06a1cdb42a6316eb73cff5 and test it again? Or how can I provide additional debug info?

darwin commented 5 years ago

Another stuck run:

btcd    | 2019-09-26 19:00:52.034 [INF] BTCD: Version 0.12.0-beta
btcd    | 2019-09-26 19:00:52.034 [INF] BTCD: Loading block database from '/data/mainnet/blocks_ffldb'
btcd    | 2019-09-26 19:00:54.172 [INF] BTCD: Block database loaded
btcd    | 2019-09-26 19:00:54.209 [INF] INDX: Transaction index is enabled
btcd    | 2019-09-26 19:00:54.211 [INF] INDX: Committed filter index is enabled
btcd    | 2019-09-26 19:00:54.255 [INF] CHAN: Loading block index...
btcd    | 2019-09-26 19:01:17.784 [INF] CHAN: Chain state (height 596722, hash 00000000000000000011922033c6930720d0d289f83595fbd04c10a2063e4a2f, totaltx 459154406, work 2723098917047902048440593672)
btcd    | 2019-09-26 19:01:18.696 [INF] RPCS: RPC server listening on <IP>:8334
btcd    | 2019-09-26 19:01:20.794 [INF] AMGR: Loaded 27086 addresses from file '/data/mainnet/peers.json'
btcd    | 2019-09-26 19:01:20.798 [INF] CMGR: Server listening on <IP>:8333
btcd    | 2019-09-26 19:01:20.799 [INF] CMGR: Server listening on [::]:8333
btcd    | 2019-09-26 19:01:20.805 [INF] SYNC: New valid peer <IP>:42986 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:01:20.806 [INF] SYNC: New valid peer <IP>:45323 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:01:20.807 [INF] SYNC: Lost peer <IP>:45323 (inbound)
btcd    | 2019-09-26 19:01:20.808 [INF] SYNC: New valid peer <IP>:40140 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 19:01:20.814 [INF] SYNC: New valid peer <IP>:64329 (inbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:01:20.854 [INF] SYNC: Syncing to block height 596727 from peer <IP>:64329
btcd    | 2019-09-26 19:01:20.852 [INF] CMGR: 37 addresses found from DNS seed dnsseed.bitcoin.dashjr.org
btcd    | 2019-09-26 19:01:20.854 [INF] CMGR: 38 addresses found from DNS seed seed.bitcoin.jonasschnelli.ch
btcd    | 2019-09-26 19:01:20.891 [INF] CMGR: 50 addresses found from DNS seed seed.bitcoinstats.com
btcd    | 2019-09-26 19:01:20.948 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.17.1(matbea.net; Good Bitcoin Node)/)
btcd    | 2019-09-26 19:01:20.949 [INF] PEER: Disconnecting non-segwit peer <IP>:8333 (outbound), isn't segwit enabled and we need more segwit enabled peers
btcd    | 2019-09-26 19:01:20.951 [INF] CMGR: 33 addresses found from DNS seed dnsseed.bluematt.me
btcd    | 2019-09-26 19:01:21.044 [INF] CMGR: 40 addresses found from DNS seed seed.bitcoin.sipa.be
btcd    | 2019-09-26 19:01:21.117 [WRN] SYNC: Received done peer message for unknown peer <IP>:8333 (outbound)
btcd    | 2019-09-26 19:01:21.119 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:01:21.219 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:01:21.350 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.13.2/)
btcd    | 2019-09-26 19:01:21.820 [INF] SYNC: Lost peer <IP>:40140 (inbound)
btcd    | 2019-09-26 19:01:23.209 [INF] SYNC: Lost peer <IP>:42986 (inbound)
btcd    | 2019-09-26 19:01:23.965 [INF] CMGR: 40 addresses found from DNS seed seed.bitnodes.io
btcd    | 2019-09-26 19:01:30.239 [INF] RPCS: New websocket client <IP>:44742
btcd    | 2019-09-26 19:01:30.482 [INF] RPCS: New websocket client <IP>:44738
btcd    | 2019-09-26 19:01:30.726 [INF] RPCS: New websocket client <IP>:44740
btcd    | 2019-09-26 19:01:30.760 [INF] RPCS: New websocket client <IP>:44736
btcd    | 2019-09-26 19:01:31.815 [INF] RPCS: Beginning rescan for 352 addresses
btcd    | 2019-09-26 19:01:33.394 [INF] RPCS: Finished rescan
btcd    | 2019-09-26 19:01:51.797 [INF] SYNC: Processed 1 block in the last 33.27s (3092 transactions, height 596723, 2019-09-26 18:07:51 +0000 UTC)
btcd    | 2019-09-26 19:01:51.893 [INF] SYNC: New valid peer <IP>:35017 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:01:51.902 [INF] SYNC: Lost peer <IP>:35017 (inbound)
btcd    | 2019-09-26 19:01:52.362 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:01:52.363 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:01:52.364 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.17.1/)
btcd    | 2019-09-26 19:01:52.365 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.17.1/)
btcd    | 2019-09-26 19:01:52.366 [INF] SYNC: Lost peer <IP>:8333 (outbound)
btcd    | 2019-09-26 19:01:52.583 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.16.0/)
btcd    | 2019-09-26 19:02:26.574 [INF] SYNC: Processed 1 block in the last 34.77s (1619 transactions, height 596724, 2019-09-26 18:09:49 +0000 UTC)
btcd    | 2019-09-26 19:03:14.149 [INF] SYNC: Processed 1 block in the last 47.57s (2505 transactions, height 596725, 2019-09-26 18:19:56 +0000 UTC)
btcd    | 2019-09-26 19:03:14.761 [INF] SYNC: New valid peer <IP>:49395 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:03:14.941 [INF] SYNC: New valid peer <IP>:17473 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:03:54.392 [INF] SYNC: Processed 1 block in the last 40.24s (1825 transactions, height 596726, 2019-09-26 18:22:14 +0000 UTC)
btcd    | 2019-09-26 19:03:54.566 [INF] SYNC: Lost peer <IP>:17473 (inbound)
btcd    | 2019-09-26 19:03:54.567 [INF] SYNC: New valid peer <IP>:47862 (inbound) (/bitcoinj:0.15.1/)
btcd    | 2019-09-26 19:03:54.568 [INF] SYNC: Lost peer <IP>:47862 (inbound)
btcd    | 2019-09-26 19:04:34.116 [INF] SYNC: Processed 1 block in the last 39.72s (2602 transactions, height 596727, 2019-09-26 18:34:02 +0000 UTC)
btcd    | 2019-09-26 19:04:34.362 [INF] SYNC: New valid peer <IP>:50700 (inbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:04:34.363 [INF] SYNC: Lost peer <IP>:50700 (inbound)
btcd    | 2019-09-26 19:04:34.363 [INF] SYNC: New valid peer <IP>:38212 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:04:34.999 [INF] SYNC: New valid peer <IP>:64334 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:04:38.541 [INF] SYNC: Lost peer <IP>:64334 (inbound)
btcd    | 2019-09-26 19:04:46.142 [INF] SYNC: New valid peer <IP>:5961 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:04:51.195 [INF] SYNC: New valid peer <IP>:37838 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 19:04:52.389 [INF] SYNC: Lost peer <IP>:37838 (inbound)
btcd    | 2019-09-26 19:04:58.215 [INF] SYNC: New valid peer <IP>:5619 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:04:59.838 [INF] SYNC: Lost peer <IP>:5619 (inbound)
btcd    | 2019-09-26 19:05:04.069 [INF] SYNC: New valid peer <IP>:37677 (inbound) (/Satoshi:0.17.0/)
btcd    | 2019-09-26 19:05:40.668 [INF] SYNC: New valid peer <IP>:59412 (inbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:05:45.677 [INF] SYNC: New valid peer <IP>:47839 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:05:47.231 [INF] SYNC: New valid peer <IP>:27713 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:05:47.947 [INF] SYNC: Lost peer <IP>:47839 (inbound)
btcd    | 2019-09-26 19:05:49.723 [INF] SYNC: Lost peer <IP>:27713 (inbound)
btcd    | 2019-09-26 19:05:57.646 [INF] SYNC: Lost peer <IP>:37677 (inbound)
btcd    | 2019-09-26 19:06:17.572 [INF] SYNC: New valid peer <IP>:54476 (inbound) (/btc_node_scanner:0.0.1/)
btcd    | 2019-09-26 19:06:18.122 [INF] SYNC: Lost peer <IP>:54476 (inbound)
btcd    | 2019-09-26 19:06:25.273 [INF] SYNC: Lost peer <IP>:59412 (inbound)
btcd    | 2019-09-26 19:06:34.446 [INF] SYNC: New valid peer <IP>:54724 (inbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:06:53.464 [INF] SYNC: New valid peer <IP>:51835 (inbound) (/open-nodes:0.1/)
btcd    | 2019-09-26 19:06:53.663 [INF] SYNC: New valid peer <IP>:7733 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:06:54.496 [INF] SYNC: Lost peer <IP>:51835 (inbound)
btcd    | 2019-09-26 19:06:55.186 [INF] SYNC: Lost peer <IP>:7733 (inbound)
btcd    | 2019-09-26 19:07:04.331 [INF] SYNC: New valid peer <IP>:29437 (inbound) (/Satoshi:0.15.1/)
btcd    | 2019-09-26 19:07:13.758 [INF] SYNC: Lost peer <IP>:54724 (inbound)
btcd    | 2019-09-26 19:07:18.466 [INF] SYNC: Lost peer <IP>:38212 (inbound)
btcd    | 2019-09-26 19:07:30.367 [INF] SYNC: New valid peer <IP>:57497 (inbound) (/bitnodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:07:31.432 [INF] SYNC: New valid peer <IP>:51118 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:07:50.856 [INF] SYNC: Syncing to block height 596727 from peer <IP>:64329
btcd    | 2019-09-26 19:08:07.310 [INF] SYNC: New valid peer <IP>:38209 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:08:13.055 [INF] SYNC: New valid peer <IP>:31663 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:08:14.625 [INF] SYNC: Lost peer <IP>:31663 (inbound)
btcd    | 2019-09-26 19:08:15.964 [INF] SYNC: Lost peer <IP>:51118 (inbound)
btcd    | 2019-09-26 19:08:25.776 [INF] SYNC: New valid peer <IP>:32123 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:08:27.197 [INF] SYNC: Lost peer <IP>:32123 (inbound)
btcd    | 2019-09-26 19:08:37.642 [INF] SYNC: New valid peer <IP>:53475 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:08:39.176 [INF] SYNC: New valid peer <IP>:31367 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:08:41.344 [INF] SYNC: Lost peer <IP>:31367 (inbound)
btcd    | 2019-09-26 19:08:41.775 [INF] SYNC: Lost peer <IP>:38209 (inbound)
btcd    | 2019-09-26 19:08:45.468 [INF] SYNC: Lost peer <IP>:53475 (inbound)
btcd    | 2019-09-26 19:08:51.081 [INF] SYNC: New valid peer <IP>:40396 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:08:54.813 [INF] SYNC: Lost peer <IP>:40396 (inbound)
btcd    | 2019-09-26 19:08:57.374 [INF] SYNC: New valid peer <IP>:14425 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:08:59.423 [INF] SYNC: New valid peer <IP>:40030 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:09:00.869 [INF] SYNC: Lost peer <IP>:14425 (inbound)
btcd    | 2019-09-26 19:09:01.826 [INF] SYNC: Lost peer <IP>:40030 (inbound)
btcd    | 2019-09-26 19:10:06.032 [INF] SYNC: Processed 1 block in the last 5m31.91s (2985 transactions, height 596728, 2019-09-26 19:08:14 +0000 UTC)
btcd    | 2019-09-26 19:10:11.260 [INF] SYNC: New valid peer <IP>:43336 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 19:10:11.274 [INF] SYNC: New valid peer <IP>:36902 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:10:11.275 [INF] SYNC: Lost peer <IP>:43336 (inbound)
btcd    | 2019-09-26 19:10:11.275 [INF] SYNC: New valid peer <IP>:38377 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:10:12.055 [INF] SYNC: Lost peer <IP>:38377 (inbound)
btcd    | 2019-09-26 19:11:07.915 [INF] SYNC: Processed 1 block in the last 1m1.88s (2807 transactions, height 596729, 2019-09-26 19:10:44 +0000 UTC)
btcd    | 2019-09-26 19:11:07.916 [INF] SYNC: Syncing to block height 596729 from peer <IP>:8333
btcd    | 2019-09-26 19:11:09.028 [INF] SYNC: New valid peer <IP>:22794 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:11:09.368 [INF] SYNC: Lost peer <IP>:22794 (inbound)
btcd    | 2019-09-26 19:11:21.920 [INF] SYNC: Lost peer <IP>:36902 (inbound)
btcd    | 2019-09-26 19:11:24.488 [INF] SYNC: New valid peer <IP>:47554 (inbound) (/bitcoinj:0.15.1/)
btcd    | 2019-09-26 19:11:24.544 [INF] SYNC: Lost peer <IP>:47554 (inbound)
btcd    | 2019-09-26 19:11:46.954 [INF] SYNC: New valid peer <IP>:39374 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 19:11:48.130 [INF] SYNC: Lost peer <IP>:39374 (inbound)
btcd    | 2019-09-26 19:11:50.714 [INF] SYNC: New valid peer <IP>:35377 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:11:52.988 [INF] SYNC: Lost peer <IP>:35377 (inbound)
btcd    | 2019-09-26 19:12:11.061 [INF] SYNC: New valid peer <IP>:44368 (inbound) (/btc_node_scanner:0.0.1/)
btcd    | 2019-09-26 19:12:11.117 [INF] SYNC: Lost peer <IP>:44368 (inbound)
btcd    | 2019-09-26 19:12:22.390 [INF] SYNC: New valid peer <IP>:3251 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:12:23.698 [INF] SYNC: Lost peer <IP>:3251 (inbound)
btcd    | 2019-09-26 19:12:27.935 [INF] SYNC: New valid peer <IP>:43384 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 19:12:28.160 [INF] SYNC: Lost peer <IP>:43384 (inbound)
btcd    | 2019-09-26 19:12:38.444 [INF] SYNC: New valid peer <IP>:45182 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:12:43.119 [INF] SYNC: Lost peer <IP>:45182 (inbound)
btcd    | 2019-09-26 19:13:00.167 [INF] SYNC: New valid peer <IP>:41201 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:13:03.320 [INF] SYNC: Lost peer <IP>:41201 (inbound)
btcd    | 2019-09-26 19:13:23.692 [INF] SYNC: New valid peer <IP>:11531 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:13:25.766 [INF] SYNC: Lost peer <IP>:11531 (inbound)
btcd    | 2019-09-26 19:13:34.504 [INF] SYNC: New valid peer <IP>:19323 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:13:37.989 [INF] SYNC: Lost peer <IP>:19323 (inbound)
btcd    | 2019-09-26 19:14:09.559 [INF] SYNC: New valid peer <IP>:49638 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:14:20.436 [INF] SYNC: New valid peer <IP>:51810 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 19:14:20.796 [INF] SYNC: Syncing to block height 596729 from peer <IP>:8333
btcd    | 2019-09-26 19:14:21.487 [INF] SYNC: Lost peer <IP>:51810 (inbound)
btcd    | 2019-09-26 19:14:23.165 [INF] SYNC: Lost peer <IP>:49638 (inbound)
btcd    | 2019-09-26 19:14:59.370 [INF] SYNC: New valid peer <IP>:52672 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 19:15:00.530 [INF] SYNC: Lost peer <IP>:52672 (inbound)
btcd    | 2019-09-26 19:15:37.535 [INF] SYNC: New valid peer <IP>:48522 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 19:15:38.942 [INF] SYNC: Lost peer <IP>:48522 (inbound)
btcd    | 2019-09-26 19:16:00.860 [INF] SYNC: New valid peer <IP>:49703 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:16:02.248 [INF] SYNC: Lost peer <IP>:49703 (inbound)
btcd    | 2019-09-26 19:16:15.274 [INF] SYNC: New valid peer <IP>:5505 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:16:17.208 [INF] SYNC: Lost peer <IP>:5505 (inbound)
btcd    | 2019-09-26 19:16:53.818 [INF] SYNC: New valid peer <IP>:43452 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 19:16:53.979 [INF] SYNC: Lost peer <IP>:43452 (inbound)
btcd    | 2019-09-26 19:17:02.988 [INF] SYNC: New valid peer <IP>:46720 (inbound) (/bitcoin-seeder:0.01/)
btcd    | 2019-09-26 19:17:04.028 [INF] SYNC: Lost peer <IP>:46720 (inbound)
btcd    | 2019-09-26 19:17:04.481 [INF] SYNC: New valid peer <IP>:43488 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 19:17:04.792 [INF] SYNC: Lost peer <IP>:43488 (inbound)
btcd    | 2019-09-26 19:17:10.010 [INF] SYNC: New valid peer <IP>:15941 (inbound) (/nodes.bitcoin-russia.ru:0.0.1f/)
btcd    | 2019-09-26 19:17:11.093 [INF] SYNC: New valid peer <IP>:49074 (inbound) (/breadwallet:1.50/)
btcd    | 2019-09-26 19:17:11.116 [INF] SYNC: New valid peer <IP>:6025 (inbound) (/Satoshi:0.8.6/)
btcd    | 2019-09-26 19:17:11.220 [INF] SYNC: Lost peer <IP>:6025 (inbound)
btcd    | 2019-09-26 19:17:12.256 [INF] SYNC: Lost peer <IP>:15941 (inbound)
btcd    | 2019-09-26 19:17:13.917 [INF] SYNC: New valid peer <IP>:4881 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 19:17:20.804 [INF] SYNC: Syncing to block height 596729 from peer <IP>:8333
btcd    | 2019-09-26 19:17:36.445 [INF] SYNC: Lost peer <IP>:49074 (inbound)
btcd    | 2019-09-26 19:17:38.470 [INF] SYNC: New valid peer <IP>:18830 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:17:42.199 [INF] SYNC: Lost peer <IP>:18830 (inbound)
btcd    | 2019-09-26 19:17:53.860 [INF] SYNC: New valid peer <IP>:34926 (inbound) (/btc_node_scanner:0.0.1/)
btcd    | 2019-09-26 19:17:54.358 [INF] SYNC: Lost peer <IP>:34926 (inbound)
btcd    | 2019-09-26 19:18:34.131 [INF] SYNC: New valid peer <IP>:54350 (inbound) (/bread:2.1/)
btcd    | 2019-09-26 19:18:38.816 [INF] SYNC: New valid peer <IP>:54697 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 19:18:39.006 [INF] SYNC: Lost peer <IP>:54697 (inbound)
btcd    | 2019-09-26 19:18:39.531 [INF] SYNC: New valid peer <IP>:46184 (inbound) (/bitcoinj:0.15.1/)
btcd    | 2019-09-26 19:18:39.595 [INF] SYNC: Lost peer <IP>:46184 (inbound)
btcd    | 2019-09-26 19:18:40.593 [INF] SYNC: New valid peer <IP>:54701 (inbound) (/btcwire:0.5.0/neutrino:0.0.4-beta/)
btcd    | 2019-09-26 19:18:40.919 [INF] SYNC: Lost peer <IP>:54701 (inbound)
wpaulino commented 5 years ago

@darwin can you run with a debug log level?

darwin commented 5 years ago

@wpaulino sure, can you point me to some docs how to configure it? I'm not very familiar with go stack.

Here is another stuck run, running checkout of 009f1993178a97d7466ff500fae086e4425676f1

btcd    | 2019-09-26 19:27:34.790 [INF] BTCD: Version 0.12.0-beta
btcd    | 2019-09-26 19:27:34.801 [INF] BTCD: Loading block database from '/data/mainnet/blocks_ffldb'
btcd    | 2019-09-26 19:27:36.801 [INF] BTCD: Block database loaded
btcd    | 2019-09-26 19:27:36.836 [INF] INDX: Transaction index is enabled
btcd    | 2019-09-26 19:27:36.837 [INF] INDX: Committed filter index is enabled
btcd    | 2019-09-26 19:27:36.841 [INF] CHAN: Loading block index...
btcd    | 2019-09-26 19:27:58.956 [INF] CHAN: Chain state (height 596729, hash 0000000000000000001602dc564003045aa54ac4a040347e9a5fa36026d40706, totaltx 459171841, work 2723456410518165574161211536)
btcd    | 2019-09-26 19:27:59.615 [INF] RPCS: RPC server listening on <IP>:8334
btcd    | 2019-09-26 19:28:00.660 [INF] AMGR: Loaded 27142 addresses from file '/data/mainnet/peers.json'
btcd    | 2019-09-26 19:28:00.674 [INF] CMGR: Server listening on <IP>:8333
btcd    | 2019-09-26 19:28:00.675 [INF] CMGR: Server listening on [::]:8333
btcd    | 2019-09-26 19:28:00.677 [INF] SYNC: New valid peer <IP>:50652 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:28:00.678 [INF] SYNC: Syncing to block height 596730 from peer <IP>:50652
btcd    | 2019-09-26 19:28:00.712 [INF] CMGR: 50 addresses found from DNS seed seed.bitcoinstats.com
btcd    | 2019-09-26 19:28:00.717 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:28:00.725 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:28:00.726 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.16.3(bitcore)/)
btcd    | 2019-09-26 19:28:00.749 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:28:00.846 [INF] CMGR: 33 addresses found from DNS seed dnsseed.bluematt.me
btcd    | 2019-09-26 19:28:00.855 [INF] CMGR: 38 addresses found from DNS seed seed.bitcoin.jonasschnelli.ch
btcd    | 2019-09-26 19:28:00.871 [INF] CMGR: 37 addresses found from DNS seed dnsseed.bitcoin.dashjr.org
btcd    | 2019-09-26 19:28:00.897 [INF] CMGR: 40 addresses found from DNS seed seed.bitcoin.sipa.be
btcd    | 2019-09-26 19:28:00.911 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.17.1/)
btcd    | 2019-09-26 19:28:05.709 [INF] CMGR: 40 addresses found from DNS seed seed.bitnodes.io
btcd    | 2019-09-26 19:28:33.271 [INF] RPCS: New websocket client <IP>:44964
btcd    | 2019-09-26 19:28:33.296 [INF] RPCS: New websocket client <IP>:44966
btcd    | 2019-09-26 19:28:33.322 [INF] RPCS: New websocket client <IP>:44970
btcd    | 2019-09-26 19:28:33.543 [INF] RPCS: New websocket client <IP>:44968
btcd    | 2019-09-26 19:28:35.393 [INF] RPCS: Beginning rescan for 352 addresses
btcd    | 2019-09-26 19:28:37.073 [INF] SYNC: Processed 1 block in the last 37.57s (3311 transactions, height 596730, 2019-09-26 19:20:51 +0000 UTC)
btcd    | 2019-09-26 19:28:37.076 [INF] SYNC: Lost peer <IP>:8333 (outbound)
btcd    | 2019-09-26 19:28:37.254 [INF] SYNC: New valid peer <IP>:64364 (inbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:28:37.755 [INF] SYNC: New valid peer <IP>:44715 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:28:37.757 [INF] SYNC: Lost peer <IP>:44715 (inbound)
btcd    | 2019-09-26 19:28:37.759 [INF] SYNC: New valid peer <IP>:56126 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:28:37.920 [INF] SYNC: New valid peer <IP>:33485 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:28:38.005 [INF] SYNC: Lost peer <IP>:33485 (inbound)
btcd    | 2019-09-26 19:28:38.474 [WRN] SYNC: Received done peer message for unknown peer <IP>:8333 (outbound)
btcd    | 2019-09-26 19:28:38.476 [INF] SYNC: New valid peer <IP>:1187 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:28:39.885 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.17.1/)
btcd    | 2019-09-26 19:28:39.893 [INF] SYNC: New valid peer <IP>:8333 (outbound) (/Satoshi:0.18.1/)
btcd    | 2019-09-26 19:28:40.898 [INF] RPCS: Finished rescan
btcd    | 2019-09-26 19:29:01.050 [WRN] CHAN: Unknown block versions are being mined, so new rules might be in effect.  Are you running the latest version of the software?
btcd    | 2019-09-26 19:29:02.045 [INF] SYNC: Processed 1 block in the last 24.97s (2651 transactions, height 596731, 2019-09-26 19:27:47 +0000 UTC)
btcd    | 2019-09-26 19:29:02.222 [INF] SYNC: Lost peer <IP>:1187 (inbound)
btcd    | 2019-09-26 19:29:15.997 [INF] SYNC: New valid peer <IP>:20227 (inbound) (/nodes.multiven.io:0.1/)
btcd    | 2019-09-26 19:29:17.255 [INF] SYNC: New valid peer <IP>:34779 (inbound) (/bitnodes.earn.com:0.1/)
btcd    | 2019-09-26 19:29:18.384 [INF] SYNC: New valid peer <IP>:54502 (inbound) (/Satoshi:0.18.0/)
btcd    | 2019-09-26 19:29:19.428 [INF] SYNC: Lost peer <IP>:34779 (inbound)
btcd    | 2019-09-26 19:29:20.310 [INF] SYNC: Lost peer <IP>:50652 (inbound)
btcd    | 2019-09-26 19:29:20.311 [INF] SYNC: Syncing to block height 596731 from peer <IP>:8333
btcd    | 2019-09-26 19:29:21.449 [INF] SYNC: Lost peer <IP>:8333 (outbound)
btcd    | 2019-09-26 19:29:27.404 [INF] SYNC: New valid peer <IP>:44374 (inbound) (/btc_node_scanner:0.0.1/)
btcd    | 2019-09-26 19:29:27.946 [INF] SYNC: Lost peer <IP>:44374 (inbound)
wpaulino commented 5 years ago

btcd --debuglevel=debug, see btcd -h.

Roasbeef commented 5 years ago

The problem is that something triggers the freeze again and again I was unable to complete the syncing. I'm able to sync few blocks and the freeze happens again

Looking at your logs, its mostly either seeders or light clients connecting/disconnecting. It is the case however, that if you end up connecting to a Bitcoin ABC or SV node, that sync can halt. Our temporary solution for that is a branch that allows you to black list certain user agents.

darwin commented 5 years ago

Another freeze with debug log enabled, running 130ea5bddde33df32b06a1cdb42a6316eb73cff5:

https://gist.github.com/darwin/c3ff828e37910e7c9157c269b5d31240

After the freeze the log is being spammed with Received inv, ping/pong and what appears to me as occasional peer handshakes. But the state of the node is "effectively frozen" as described above.

cubancorona commented 5 years ago

Thank you for all the helpful information, @darwin!

Would you like to help me test a potential temporary resolution?

darwin commented 5 years ago

Sure @cubancorona , want to do it privately? My email is my github commits, or we can chat via keybase.io/darwin. Ideally I would compile your fork and test it.

cubancorona commented 5 years ago

@Roasbeef and @wpaulino, how do you think we should proceed? This may be a good opportunity to identify a potential underlying issue.

cubancorona commented 5 years ago

Hi, @darwin, how are you presently? I would like to thank you very much for your patience and indispensable assistance. If you are comfortable testing a minor modification to origin/master, please try running this branch, which contains a simple modification: https://github.com/cubancorona/btcd/tree/masterTesting

darwin commented 5 years ago

@cubancorona your fork got stuck as well: https://gist.github.com/darwin/3d69d1565981bfd6e3d89bf594c08e10

darwin commented 5 years ago

Going for a vacation. I will be out of office till Sunday. So I will be less responsive until then.

cubancorona commented 5 years ago

Thank you. If you would like to help me test another branch with more comprehensive changes, you can try the following branch. But, it may not precisely identify the issue(s) without working forward from origin/master.

https://github.com/cubancorona/btcd/tree/master

darwin commented 5 years ago

@cubancorona It fails to build for me: https://gist.github.com/darwin/5e370c676ccacf881a022e454644e799

I'm building it in alpine-based docker container. Do I need latest go? Or what am I missing?

cubancorona commented 5 years ago

Thank you again and have a nice vacation. You might try the following, which adds a dependency that I potentially hope to internalize:

go get github.com/jonboulle/clockwork

cubancorona commented 5 years ago

I am using that package for testing another potential issue related to handshake synchronization I identified some time ago.

darwin commented 5 years ago

go get github.com/jonboulle/clockwork

This didn't help (I'm running it before go install). I'm getting the same error.

cubancorona commented 5 years ago

@darwin, I hope you are having a nice time away. Please forgive me for my hasty response earlier today. I have pushed an update to the same branch, which hopefully should resolve a likely module versioning issue:

https://github.com/cubancorona/btcd/tree/master

frameloss commented 5 years ago

Hi @cubancorona I've been seeing the same issues, just updated to your latest build and still seeing the lockup. It seems to happen each time just a few minutes after it fully syncs, thankfully no db corruption. Not sure if it matters, but this node is running with --txindex --addrindex.

Let me know if there's anything you need me to try, unfortunately I am traveling starting tomorrow, but can help out today.

cubancorona commented 5 years ago

Itadakimasu! I hope you have safe travels, @frameloss. When you are safely available, would you mind confirming which commit or module version you are running?

cubancorona commented 5 years ago

I am not presently experiencing any stalling or freezing while synchronizing the Blockchain via mainnet. I will update everyone accordingly once the time is appropriate.

cubancorona commented 5 years ago

Good morning, @darwin, how are you feeling today? I hope you had an enjoyable time away. Will you be willing to provide an update?

darwin commented 5 years ago

I'm well and back @cubancorona. Have been running your master for about 16 hours and it didn't get stuck so far. Fingers crossed.

cubancorona commented 5 years ago

Itadakimasu! Thanks for the update @darwin.

darwin commented 5 years ago

The node is still up and running (37 hours). Currently it has 29 connections.

Note that this is not a proof that the issue was fixed. Usually the connection count is much higher. It might be that the problematic peer is not connecting to it anymore - e.g. due to the fact that the node wasn't accepting connections during my vacation.

If you wanted I could try to switch to problematic https://github.com/btcsuite/btcd/commit/130ea5bddde33df32b06a1cdb42a6316eb73cff5 and test it again to see if it gets stuck again.

cubancorona commented 5 years ago

Itadakimasu 🤠 Due to one of the potential issues I previously identified and addressed in my repository, it is possible that, in the implementation of the btcd master repository, some nodes are not disconnecting properly. This could be a reason for an artificially high connection count, and potentially also a cause of stalling.

darwin commented 5 years ago

The node is still up and running (~2 days). Currently it has 37 connections.

cubancorona commented 5 years ago

@darwin, how many are incoming vs. outgoing, if you don't mind me asking?

darwin commented 5 years ago

@cubancorona I didn't change the default settings for outbound connections. Just checked it via btcctl getpeerinfo - it reported 8 outbound connections, the rest is inbound. Currently 33 in total.

wpaulino commented 5 years ago

@darwin, could you try cherry picking https://github.com/btcsuite/btcd/pull/1477 on top of the current master branch to see if it helps improve your situation?

darwin commented 5 years ago

@wpaulino it got stuck again in less than 32 minutes (current master 988181ef23fae1e039e0eeb459f03edfe7fed983 with your #1477 merged on top)

wpaulino commented 5 years ago

Can you restart with the --profile=PORT flag and obtain a goroutine dump with curl http//localhost:PORT/debug/pprof/goroutine once it gets stuck and paste the output?

darwin commented 5 years ago

It seems to be giving me some binary output of size 4532 bytes.

will try tomorrow, it is 3am here

wpaulino commented 5 years ago

Sorry, should be curl "http//localhost:PORT/debug/pprof/goroutine?debug=1".

darwin commented 5 years ago

Here you go.

It might be in strange state already because I cancelled force-quitting of the docker container when I noticed your update. Will try again tomorrow right after the node gets stuck again.

darwin commented 5 years ago

At least this exercise confirmed that @cubancorona's changes fixed/didn't trigger the issue on the machine. Because with his code the node runs again.

cubancorona commented 5 years ago

This confirms my suspicion seemingly! 🤠 Itadakimasu! Thank you, @darwin and @wpaulino.

wpaulino commented 5 years ago

Thanks @darwin, could you try out this branch and report back?

darwin commented 5 years ago

@wpaulino So far it looks good. The node has been running for 2 hours.

wpaulino commented 5 years ago

Nice, I think that should do it. I opened it as https://github.com/btcsuite/btcd/pull/1480. Feel free to report here or there if you run into any issues.

cubancorona commented 5 years ago

@wpaulino, potentially this is not resolving the issue (and leaving others unaddressed), but I support a conservative modification pathway.

darwin commented 5 years ago

still running @wpaulino's fork, 31hrs and still good, 45 connections in total