ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance

Node dies/restarts randomly - probably due to gossipd failing. #5299

Closed: zerofeerouting closed this issue 2 years ago

zerofeerouting commented 2 years ago

Issue and Steps to Reproduce

The node stops, apparently because gossipd fails, and then restarts automatically:

2022-06-01T07:16:33.152Z INFO    028111c2552a2fdd19cde97a18f68f8ece30a8bee45f6070109e6d34115ff9e7ae-chan#4199: Peer transient failure in CHANNELD_NORMAL: Reconnected
2022-06-01T07:16:36.765Z **BROKEN** gossipd: Unknown peer 028873425462e0addf3ad8a2f3ae72ef23b4ea4165d9739134b3ff1a86a0a8eb00 for local_channel_announcement
2022-06-01T07:16:37.189Z **BROKEN** gossipd: Unknown peer 028873425462e0addf3ad8a2f3ae72ef23b4ea4165d9739134b3ff1a86a0a8eb00 for local_channel_announcement
2022-06-01T07:16:37.898Z **BROKEN** gossipd: Unknown peer 036508f7e82bb78bad307cfacf4edf850fc3f20ca071eaa8074d9d5424a9092c0b for local_channel_announcement
2022-06-01T07:16:38.450Z INFO    036aa6d860b1229aa56c4b9422e06bbbfdc160e5d62511187be9a91aedeb9ca309-channeld-chan#7470: Peer connection lost
2022-06-01T07:16:38.450Z INFO    036aa6d860b1229aa56c4b9422e06bbbfdc160e5d62511187be9a91aedeb9ca309-chan#7470: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2022-06-01T07:16:38.452Z **BROKEN** gossipd: Unknown peer 0225cb6d2874d7cf9ed711479059cf8ff472630188376949879e6253cac00701b0 for local_channel_announcement
2022-06-01T07:16:39.537Z **BROKEN** gossipd: Unknown peer 0343ac307bf48a6500a8529f10e80136c428ac628ec4f7619565dfd447805ccb8c for local_channel_announcement
lightningd: gossipd failed (exit status 2), exiting.
Lost connection to the RPC socket.
Lost connection to the RPC socket.
Lost connection to the RPC socket.
Lost connection to the RPC socket.
Lost connection to the RPC socket.
Lost connection to the RPC socket.
2022-06-01T07:16:41.028Z INFO    connectd: Static Tor service onion address: "xtdo5qvvfwcjaruj6z4acdcw4azagn6tdgac4ajnekjdn4ghr6qw2nqd.onion:9735,0.0.0.0:9735" bound from extern port 9735 
2022-06-01T07:16:41.114Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2022-06-01T07:16:44.094Z INFO    lightningd: Restarting onchaind for channel 193
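
When triaging a log like the one above, it can help to tally which peers gossipd reports as unknown before it exits. The following is only a minimal sketch, not part of Core Lightning: the names `UNKNOWN_PEER`, `count_unknown_peers`, and `SAMPLE` are made up for illustration, and the pattern assumes the log line format shown in this report.

```python
import re
from collections import Counter

# Assumed line format, taken from the log in this report:
# <timestamp> **BROKEN** gossipd: Unknown peer <hex node id> for local_channel_announcement
UNKNOWN_PEER = re.compile(
    r"\*\*BROKEN\*\* gossipd: Unknown peer ([0-9a-f]+) for local_channel_announcement"
)


def count_unknown_peers(lines):
    """Return a Counter mapping node id -> number of 'Unknown peer' lines."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_PEER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the log above; the INFO line is ignored by the pattern.
SAMPLE = [
    "2022-06-01T07:16:36.765Z **BROKEN** gossipd: Unknown peer 028873425462e0addf3ad8a2f3ae72ef23b4ea4165d9739134b3ff1a86a0a8eb00 for local_channel_announcement",
    "2022-06-01T07:16:37.189Z **BROKEN** gossipd: Unknown peer 028873425462e0addf3ad8a2f3ae72ef23b4ea4165d9739134b3ff1a86a0a8eb00 for local_channel_announcement",
    "2022-06-01T07:16:37.898Z **BROKEN** gossipd: Unknown peer 036508f7e82bb78bad307cfacf4edf850fc3f20ca071eaa8074d9d5424a9092c0b for local_channel_announcement",
    "2022-06-01T07:16:33.152Z INFO    028111c2552a2fdd19cde97a18f68f8ece30a8bee45f6070109e6d34115ff9e7ae-chan#4199: Peer transient failure in CHANNELD_NORMAL: Reconnected",
]

if __name__ == "__main__":
    for node_id, hits in count_unknown_peers(SAMPLE).most_common():
        print(hits, node_id)
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the function only iterates over lines.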

getinfo output

{
   "id": "038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5",
   "alias": "zero fee routing | CLN",
   "color": "1c262f",
   "num_peers": 999,
   "num_pending_channels": 2,
   "num_active_channels": 992,
   "num_inactive_channels": 37,
   "address": [
      {
         "type": "ipv4",
         "address": "167.235.3.234",
         "port": 9735
      },
      {
         "type": "torv3",
         "address": "xtdo5qvvfwcjaruj6z4acdcw4azagn6tdgac4ajnekjdn4ghr6qw2nqd.onion",
         "port": 9735
      }
   ],
   "binding": [
      {
         "type": "ipv4",
         "address": "0.0.0.0",
         "port": 9735
      }
   ],
   "version": "v0.11.1",
   "blockheight": 738838,
   "network": "bitcoin",
   // ...
}

zerofeerouting commented 2 years ago

Additional info

A couple of minutes before that, connectd crashed with this:

lightning_connectd: ccan/ccan/tal/tal.c:393: del_tree: Assertion `!taken(from_tal_hdr(t))' failed.
lightning_connectd: FATAL SIGNAL 6 (version v0.11.1)
0x5591bdafdbe5 send_backtrace
    common/daemon.c:33
0x5591bdafdc6f crashdump
    common/daemon.c:46
0x7f822b62672f ???
    ???:0
0x7f822b4207bb ???
    ???:0
0x7f822b40b534 ???
    ???:0
0x7f822b40b40e ???
    ???:0
0x7f822b419101 ???
    ???:0
0x5591bdb3ba23 del_tree
    ccan/ccan/tal/tal.c:393
0x5591bdb3ba3f del_tree
    ccan/ccan/tal/tal.c:412
0x5591bdb3bf1e tal_free
    ccan/ccan/tal/tal.c:486
0x5591bdaf389c peer_reconnected
    connectd/connectd.c:259
0x5591bdaf3b96 peer_connected
    connectd/connectd.c:351
0x5591bdaf3f01 retry_peer_connected
    connectd/connectd.c:228
0x5591bdb30bd1 next_plan
    ccan/ccan/io/io.c:59
0x5591bdb30ff2 io_do_always
    ccan/ccan/io/io.c:435
0x5591bdb32494 handle_always
    ccan/ccan/io/poll.c:304
0x5591bdb327d0 io_loop
    ccan/ccan/io/poll.c:385
0x5591bdaf423e main
    connectd/connectd.c:2158
0x7f822b40d09a ???
    ???:0
0x5591bdaece99 ???
    ???:0
0xffffffffffffffff ???
    ???:0
lightning_connectd: FATAL SIGNAL (version v0.11.1)
0x5591bdafdbe5 send_backtrace
    common/daemon.c:33
0x5591bdb066b2 status_failed
    common/status.c:221
0x5591bdb067aa status_backtrace_exit
    common/subdaemon.c:18
0x5591bdafdc75 crashdump
    common/daemon.c:49
0x7f822b62672f ???
    ???:0
0x7f822b4207bb ???
    ???:0
0x7f822b40b534 ???
    ???:0
0x7f822b40b40e ???
    ???:0
0x7f822b419101 ???
    ???:0
0x5591bdb3ba23 del_tree
    ccan/ccan/tal/tal.c:393
0x5591bdb3ba3f del_tree
    ccan/ccan/tal/tal.c:412
0x5591bdb3bf1e tal_free
    ccan/ccan/tal/tal.c:486
0x5591bdaf389c peer_reconnected
    connectd/connectd.c:259
0x5591bdaf3b96 peer_connected
    connectd/connectd.c:351
0x5591bdaf3f01 retry_peer_connected
    connectd/connectd.c:228
0x5591bdb30bd1 next_plan
    ccan/ccan/io/io.c:59
0x5591bdb30ff2 io_do_always
    ccan/ccan/io/io.c:435
0x5591bdb32494 handle_always
    ccan/ccan/io/poll.c:304
0x5591bdb327d0 io_loop
    ccan/ccan/io/poll.c:385
0x5591bdaf423e main
    connectd/connectd.c:2158
0x7f822b40d09a ???
    ???:0
0x5591bdaece99 ???
    ???:0
0xffffffffffffffff ???
    ???:0

whitslack commented 2 years ago

@zerofeerouting: See #5282 and #5284, and try applying this patch to fix the use-after-free crash bug in retry_peer_connected.

zerofeerouting commented 2 years ago

Thank you. Will look into it!

zerofeerouting commented 2 years ago

This seems to be the same issue as the one described in #5284.

zerofeerouting commented 2 years ago

Managed to get the node back up again (by sheer luck).

cdecker commented 2 years ago

Duplicates #5282