ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.83k stars 896 forks source link

Transient errors on v23.02 #6065

Open Zk2u opened 1 year ago

Zk2u commented 1 year ago

Issue and Steps to Reproduce

Having numerous connection issues with my peers (5 force closes in last 2 days) on 23.02. Some logs from a boot up:

2023-03-05T17:55:51.084Z INFO    connectd: Static Tor service onion address: "n3qxevdv5ea2eh4wqvlskv2e2c5xaam5ldhxq7slikyw3sp25xu4yrqd.onion:9735,[::]:9735" bound from extern port 9735
2023-03-05T17:55:51.103Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2023-03-05T17:55:55.292Z INFO    plugin-clboss: clboss 0.13A
2023-03-05T17:55:55.292Z INFO    plugin-clboss: PeerComplaintsDesk: Auto-close: enabled.
2023-03-05T17:55:55.293Z INFO    plugin-clboss: AmountSettingsHandler: Minimum channel size set by --clboss-min-channel to 1500000 satoshis.
2023-03-05T17:55:55.293Z INFO    plugin-clboss: ChannelFeeManager: zerobasefee: require
2023-03-05T17:55:55.431Z INFO    plugin-peerswap-plugin: PeerSwap Initialized, running PeerSwap commit 4250c5c33031a12ab51ce4a48ec18ab5a90747c7
2023-03-05T17:55:55.500Z INFO    plugin-peerswap-plugin: Killing plugin: exited during normal operation
2023-03-05T17:55:55.533Z INFO    plugin-clboss: InternetConnectionMonitor: online.
2023-03-05T17:55:55.533Z INFO    plugin-clboss: Started.
2023-03-05T17:55:55.548Z INFO    plugin-clboss: New block at 779455
2023-03-05T17:55:55.836Z INFO    plugin-clboss: InternetConnectionMonitor: online.
2023-03-05T17:55:56.338Z INFO    plugin-clboss: Connector: connected to 029dd81c32e3ea839b4b2f8edcc9bea227c40fa2c6509878034401a8ce2a8bac05@176.36.37.21:9735
2023-03-05T17:55:56.338Z INFO    plugin-clboss: NeedsConnectSolicitor: Connection solicited to: 029dd81c32e3ea839b4b2f8edcc9bea227c40fa2c6509878034401a8ce2a8bac05@176.36.37.21:9735
2023-03-05T17:55:56.680Z INFO    lightningd: Restarting onchaind for channel 27
2023-03-05T17:55:56.681Z UNUSUAL 035554021b9674f2f345539abdf7d1c148c3e2829fe4e186549b6222932f8f4ead-chan#27: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-03-05T17:55:56.682Z INFO    035554021b9674f2f345539abdf7d1c148c3e2829fe4e186549b6222932f8f4ead-chan#27: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-03-05T17:55:56.711Z INFO    lightningd: Restarting onchaind for channel 30
2023-03-05T17:55:56.712Z UNUSUAL 025a4172f1a318e0db638c6fc14d75aadb43c28347036725f08023c63d5a7ac01a-chan#30: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-03-05T17:55:56.713Z INFO    025a4172f1a318e0db638c6fc14d75aadb43c28347036725f08023c63d5a7ac01a-chan#30: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-03-05T17:55:56.734Z INFO    lightningd: Restarting onchaind for channel 57
2023-03-05T17:55:56.735Z UNUSUAL 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#57: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-03-05T17:55:56.736Z INFO    026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#57: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-03-05T17:55:56.760Z INFO    lightningd: --------------------------------------------------
2023-03-05T17:55:56.761Z INFO    lightningd: Server started with public key 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165, alias Ares Inferno (color #022f2f) and lightningd v23.02
2023-03-05T17:55:56.761Z INFO    plugin-clboss: DnsSeed: Have 2 seeds.
2023-03-05T17:55:56.768Z INFO    025a4172f1a318e0db638c6fc14d75aadb43c28347036725f08023c63d5a7ac01a-chan#30: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-03-05T17:55:56.785Z INFO    035554021b9674f2f345539abdf7d1c148c3e2829fe4e186549b6222932f8f4ead-chan#27: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-03-05T17:55:56.796Z INFO    026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#57: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-03-05T17:55:57.315Z INFO    plugin-clboss: Connector: connected to 0298f6074a454a1f5345cb2a7c6f9fce206cd0bf675d177cdbf0ca7508dd28852f@73.114.40.246:9735
2023-03-05T17:55:58.394Z INFO    030a58b8653d32b99200a2334cfe913e51dc7d155aa0116c176657a4f1722677a3-chan#66: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:55:58.624Z UNUSUAL 02875ac2c27835990ef62e5755c34264b2c39f51a41525adc5e52a7f94b3a19f8b-chan#5: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-03-05T17:55:58.626Z INFO    02875ac2c27835990ef62e5755c34264b2c39f51a41525adc5e52a7f94b3a19f8b-chan#5: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-03-05T17:55:58.690Z INFO    02875ac2c27835990ef62e5755c34264b2c39f51a41525adc5e52a7f94b3a19f8b-chan#5: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-03-05T17:55:59.931Z INFO    03d607f3e69fd032524a867b288216bfab263b6eaee4e07783799a6fe69bb84fac-chan#6: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:56:01.965Z INFO    020d1617e27ac022395352f2b3774969593d3d6ddff6fb117d820a9dda8da45217-chan#65: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:56:03.509Z INFO    024bfaf0cabe7f874fd33ebf7c6f4e5385971fc504ef3f492432e9e3ec77e1b5cf-chan#36: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:56:05.062Z INFO    plugin-clboss: FeeModderBySize: Peer 02fa797156b0ea34757bad50cd8525a28dfb4f9f8f183c6c3955f7f8d0fa55d262 has 2 other peers, 1 of which have less capacity than us, 1 have more.  Multiplier: 1
2023-03-05T17:56:05.062Z INFO    plugin-clboss: FeeModderBySize: Peer 024bfaf0cabe7f874fd33ebf7c6f4e5385971fc504ef3f492432e9e3ec77e1b5cf has 1154 other peers, 651 of which have less capacity than us, 503 have more.  Multiplier: 1.57605
2023-03-05T17:56:05.063Z INFO    plugin-clboss: FeeModderBySize: Peer 023d507f823c60de463d5b500bae31418d363afbf29aba642e8571c16ae19988b3 has 1 other peers, 1 of which have less capacity than us, 0 have more.  Multiplier: 16
2023-03-05T17:56:05.063Z INFO    plugin-clboss: FeeModderBySize: Peer 0279f06eba0e1080f6a693201f090d0635a0e5dd2ef57d0207210e3d338133092e has 38 other peers, 19 of which have less capacity than us, 19 have more.  Multiplier: 1
2023-03-05T17:56:05.064Z INFO    plugin-clboss: FeeModderBySize: Peer 030a58b8653d32b99200a2334cfe913e51dc7d155aa0116c176657a4f1722677a3 has 115 other peers, 51 of which have less capacity than us, 64 have more.  Multiplier: 0.934877
2023-03-05T17:56:05.064Z INFO    plugin-clboss: FeeModderBySize: Peer 037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3 has 141 other peers, 17 of which have less capacity than us, 124 have more.  Multiplier: 0.604869
2023-03-05T17:56:05.064Z INFO    plugin-clboss: FeeModderBySize: Peer 020d1617e27ac022395352f2b3774969593d3d6ddff6fb117d820a9dda8da45217 has 319 other peers, 46 of which have less capacity than us, 273 have more.  Multiplier: 0.626595
2023-03-05T17:56:05.064Z INFO    plugin-clboss: FeeModderBySize: Peer 03d607f3e69fd032524a867b288216bfab263b6eaee4e07783799a6fe69bb84fac has 416 other peers, 256 of which have less capacity than us, 160 have more.  Multiplier: 2.2436
2023-03-05T17:56:05.065Z INFO    plugin-clboss: FeeModderBySize: Peer 035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226 has 2409 other peers, 1784 of which have less capacity than us, 625 have more.  Multiplier: 3.60118
2023-03-05T17:56:06.125Z INFO    plugin-clboss: New block at 779470
2023-03-05T17:56:09.270Z INFO    035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226-chan#2: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:56:13.078Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:56:34.845Z INFO    023d507f823c60de463d5b500bae31418d363afbf29aba642e8571c16ae19988b3-chan#59: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:57:00.958Z INFO    0279f06eba0e1080f6a693201f090d0635a0e5dd2ef57d0207210e3d338133092e-chan#23: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:57:37.602Z INFO    02fa797156b0ea34757bad50cd8525a28dfb4f9f8f183c6c3955f7f8d0fa55d262-chan#51: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T17:58:15.581Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-05T18:01:06.770Z INFO    023d507f823c60de463d5b500bae31418d363afbf29aba642e8571c16ae19988b3-chan#59: Peer transient failure in CHANNELD_NORMAL: Disconnected

config

mainnet

bitcoin-rpcuser=***
bitcoin-rpcpassword=***
bitcoin-rpcconnect=***

log-file=/root/.lightning/debug.log

wallet=postgres://***
alias=Ares Inferno

fee-base=0
fee-per-satoshi=0
min-capacity-sat=1500000

commit-time=0

large-channels

# experimental features
experimental-onion-messages
experimental-offers
fetchinvoice-noconnect
experimental-shutdown-wrong-funding
experimental-dual-fund
experimental-peer-storage

proxy=127.0.0.1:9050
addr=
addr=statictor:127.0.0.1:9051/torblob=aresinferno

# dual funding
funder-policy=match
funder-policy-mod=100
lease-fee-base-msat=500sat
lease-fee-basis=50
channel-fee-max-base-msat=0sat
channel-fee-max-proportional-thousandths=0

getinfo output

{
   "id": "022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165",
   "alias": "Ares Inferno",
   "color": "022f2f",
   "num_peers": 18,
   "num_pending_channels": 0,
   "num_active_channels": 9,
   "num_inactive_channels": 4,
   "address": [
      {
         "type": "ipv4",
         "address": "94.237.53.252",
         "port": 9735
      },
      {
         "type": "ipv6",
         "address": "2a04:3541:1000:500:acec:c5ff:feb9:10d9",
         "port": 9735
      },
      {
         "type": "torv3",
         "address": "n3qxevdv5ea2eh4wqvlskv2e2c5xaam5ldhxq7slikyw3sp25xu4yrqd.onion",
         "port": 9735
      }
   ],
   "binding": [
      {
         "type": "ipv6",
         "address": "::",
         "port": 9735
      },
      {
         "type": "ipv4",
         "address": "0.0.0.0",
         "port": 9735
      }
   ],
   "version": "v23.02",
   "blockheight": 779470,
   "network": "bitcoin",
   "lightning-dir": "/root/.lightning/bitcoin",
   "our_features": {
      "init": "0200000000000008aa882a2a69a2",
      "node": "0200000000000088aa882a2a69a2",
      "channel": "",
      "invoice": "02000022024100"
   }
}
TrezorHannes commented 1 year ago

I'm one of the LND peers with numerous connection issues. We're only connected for like 2 minutes, then disconnected again. Providing logs from my end

Could it be that LND can't interpret the channel announcements from this CLN peer? FWIW, I have several CLN peers and no comparable issues.

Log entry for inbound=true

2023-03-05 17:53:01.170 [INF] SRVR: Finalizing connection to 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738, inbound=true
2023-03-05 17:53:01.210 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): loading ChannelPoint(6bb75a0064a26214e908462bf9a84e6780718171730e45a40bd3ff27441a55b9:0)
2023-03-05 17:53:01.211 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): Negotiated chan series queries
2023-03-05 17:53:01.211 [INF] DISC: Creating new GossipSyncer for peer=022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165
2023-03-05 17:53:01.533 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): unable to read message from peer: unable to parse message of unknown type: <unknown>
2023-03-05 17:53:49.934 [INF] DISC: GossipSyncer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): filtering chan range: start_height=527415, num_blocks=4096
2023-03-05 17:53:49.934 [INF] DISC: GossipSyncer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): sending final chan range chunk, size=138
2023-03-05 17:54:49.601 [INF] DISC: GossipSyncer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): fetching chan anns for 219 chans
2023-03-05 17:56:03.331 [INF] SRVR: Disconnecting from 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738
2023-03-05 17:56:03.331 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): disconnecting 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738, reason: server: DisconnectPeer called
2023-03-05 17:56:03.331 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): unable to read message from peer: read tcp 10.8.1.2:9735->10.8.1.1:52738: use of closed network connection
2023-03-05 17:56:03.336 [ERR] RPCS: [connectpeer]: error connecting to peer: already connected to peer: 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738
2023-03-05 17:56:03.336 [ERR] RPCS: [/lnrpc.Lightning/ConnectPeer]: already connected to peer: 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738
2023-03-05 17:56:03.533 [INF] DISC: Removing GossipSyncer for peer=022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165
2023-03-05 17:56:20.373 [INF] SRVR: Finalizing connection to 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:55906, inbound=true
2023-03-05 17:56:20.410 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): loading ChannelPoint(6bb75a0064a26214e908462bf9a84e6780718171730e45a40bd3ff27441a55b9:0)
2023-03-05 17:56:20.410 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): Negotiated chan series queries
2023-03-05 17:56:20.410 [INF] DISC: Creating new GossipSyncer for peer=022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165
2023-03-05 17:56:20.646 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): unable to read message from peer: unable to parse message of unknown type: <unknown>

Log entry for inbound=false, after a manual lncli disconnect and reconnect of the pubkey

admin ~ ฿ sudo tail -f .lnd/logs/bitcoin/mainnet/lnd.log | grep 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165
2023-03-05 18:03:27.000 [INF] SRVR: Finalizing connection to 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:60146, inbound=true
2023-03-05 18:03:27.039 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): loading ChannelPoint(6bb75a0064a26214e908462bf9a84e6780718171730e45a40bd3ff27441a55b9:0)
2023-03-05 18:03:27.040 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): Negotiated chan series queries
2023-03-05 18:03:27.040 [INF] DISC: Creating new GossipSyncer for peer=022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165
2023-03-05 18:03:27.352 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): unable to read message from peer: unable to parse message of unknown type: <unknown>
2023-03-05 18:03:39.964 [INF] SRVR: Disconnecting from 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:60146
2023-03-05 18:03:39.964 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): disconnecting 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:60146, reason: server: DisconnectPeer called
2023-03-05 18:03:39.964 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): unable to read message from peer: read tcp 10.8.1.2:9735->10.8.1.1:60146: use of closed network connection
2023-03-05 18:03:40.165 [INF] DISC: Removing GossipSyncer for peer=022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165
2023-03-05 18:05:28.091 [WRN] SRVR: Already have 1 persistent connection requests for 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:60146, connecting anyway.
2023-03-05 18:05:28.173 [INF] SRVR: Established connection to: 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@94.237.53.252:9735
2023-03-05 18:05:28.173 [INF] SRVR: Finalizing connection to 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@94.237.53.252:9735, inbound=false
2023-03-05 18:05:28.213 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): loading ChannelPoint(6bb75a0064a26214e908462bf9a84e6780718171730e45a40bd3ff27441a55b9:0)
2023-03-05 18:05:28.213 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): Negotiated chan series queries
2023-03-05 18:05:28.213 [INF] DISC: Creating new GossipSyncer for peer=022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165
2023-03-05 18:05:28.604 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): unable to read message from peer: unable to parse message of unknown type: <unknown>
2023-03-05 18:06:46.825 [INF] DISC: GossipSyncer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): filtering chan range: start_height=772291, num_blocks=2048
2023-03-05 18:06:46.830 [INF] DISC: GossipSyncer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): sending final chan range chunk, size=3308
2023-03-05 18:06:46.982 [INF] DISC: GossipSyncer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): fetching chan anns for 91 chans
2023-03-05 18:08:35.234 [INF] SRVR: Disconnecting from 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@94.237.53.252:9735
2023-03-05 18:08:35.234 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): disconnecting 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@94.237.53.252:9735, reason: server: DisconnectPeer called

Own Node details

admin ~ ฿ lncli getinfo
{
    "version": "0.15.5-beta commit=v0.15.5-beta",
    "commit_hash": "c0a09209782b1c62c3393fcea0844e095c25046b",
    "identity_pubkey": "037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3",
    "alias": "HODLmeTight",
Zk2u commented 1 year ago
2023-03-06T11:58:08.670Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (9)
2023-03-06T11:58:19.234Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-lightningd: Will try reconnect in 1 seconds
2023-03-06T11:58:22.753Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: Connected out, starting crypto
2023-03-06T11:58:23.753Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: Connect OUT
2023-03-06T11:58:24.124Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: peer_out WIRE_INIT
2023-03-06T11:58:24.836Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: Connect IN
2023-03-06T11:58:25.078Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: peer_out WIRE_INIT
2023-03-06T11:58:26.874Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: peer_in WIRE_INIT
2023-03-06T11:58:26.876Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-06T11:58:26.898Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: Handed peer, entering loop
2023-03-06T11:58:26.909Z DEBUG   lightningd: Looking for [chanbackup,peers,037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3]
2023-03-06T11:58:26.910Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: channel already active
2023-03-06T11:58:26.916Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer has reconnected, state CHANNELD_NORMAL: connecting subd
2023-03-06T11:58:26.940Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: pid 75165, msgfd 78
2023-03-06T11:58:26.940Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Already have funding locked in (and ready to announce)
2023-03-06T11:58:26.941Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: attempting update blockheight b9551a4427ffd30ba4450e7371817180674ea8f92b4608e91462a264005ab76b
2023-03-06T11:58:26.978Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: option_static_remotekey = 1, option_anchor_outputs = 0
2023-03-06T11:58:27.002Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: init LOCAL: remote_per_commit = 027c2ac6bea51cca7ccb884c76eba8a6502f3dca89c7c473181c653def0dd81e23, old_remote_per_commit = 0303faafb4a93cba676281111874268c0389e0687aec4da3a1a2e9279c2bb2c188 next_idx_local = 104065 next_idx_remote = 103576 revocations_received = 103574 feerates { SENT_ADD_COMMIT:2435 SENT_ADD_ACK_REVOCATION:5017 } range 253-29848 blockheights { SENT_ADD_ACK_REVOCATION:0 }, our current 779575
2023-03-06T11:58:27.004Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-connectd: peer_out WIRE_PEER_STORAGE
2023-03-06T11:58:27.006Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: peer_out WIRE_CHANNEL_REESTABLISH
2023-03-06T11:58:27.032Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: billboard: Sent reestablish, waiting for theirs
2023-03-06T11:58:27.038Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-hsmd: Got WIRE_HSMD_GET_PER_COMMITMENT_POINT
2023-03-06T11:58:27.066Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: peer_in WIRE_CHANNEL_REESTABLISH
2023-03-06T11:58:27.074Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-hsmd: Got WIRE_HSMD_GET_PER_COMMITMENT_POINT
2023-03-06T11:58:27.074Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: Got reestablish commit=103576 revoke=104064
2023-03-06T11:58:27.083Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: next_revocation_number = 104064
2023-03-06T11:58:27.086Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-hsmd: Got WIRE_HSMD_GET_PER_COMMITMENT_POINT
2023-03-06T11:58:27.087Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: option_static_remotekey: fields are correct
2023-03-06T11:58:27.092Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: No upgrade: no next_to_send received
2023-03-06T11:58:27.098Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-hsmd: Got WIRE_HSMD_CANNOUNCEMENT_SIG_REQ
2023-03-06T11:58:27.098Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: billboard perm: Reconnected, and reestablished.
2023-03-06T11:58:27.109Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: Exchanging announcement signatures.
2023-03-06T11:58:27.116Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: peer_out WIRE_ANNOUNCEMENT_SIGNATURES
2023-03-06T11:58:27.117Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: billboard: Channel ready for use. Channel announced.
2023-03-06T11:58:27.126Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: billboard: Channel ready for use. Channel announced.
2023-03-06T11:58:34.390Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Got depth change 0->18777 for 6bb75a0064a26214e908462bf9a84e6780718171730e45a40bd3ff27441a55b9
2023-03-06T11:58:34.390Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Funding tx 6bb75a0064a26214e908462bf9a84e6780718171730e45a40bd3ff27441a55b9 depth 18777 of 2
2023-03-06T11:58:34.391Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: attempting update blockheight b9551a4427ffd30ba4450e7371817180674ea8f92b4608e91462a264005ab76b
2023-03-06T11:58:55.353Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: Log pruned 38248 entries (mem 10485790 -> 5755271)
2023-03-06T11:59:57.397Z DEBUG   037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: Log pruned 38318 entries (mem 10485812 -> 5734405)
2023-03-06T11:59:58.933Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (256)
2023-03-06T12:00:11.391Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-06T12:06:08.387Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: Disconnected
rustyrussell commented 1 year ago

Hmm, looks like you turned on experimental-peer-storage? I think the message is upsetting LND?

Zk2u commented 1 year ago

Still getting this issue when I disable experimental-peer-storage. Running v23.02-6-gb5c6140.


2023-03-07T08:07:09.950Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2023-03-07T08:07:12.524Z INFO    plugin-clboss: clboss 0.13A
2023-03-07T08:07:12.525Z INFO    plugin-clboss: PeerComplaintsDesk: Auto-close: enabled.
2023-03-07T08:07:12.525Z INFO    plugin-clboss: AmountSettingsHandler: Minimum channel size set by --clboss-min-channel to 1500000 satoshis.
2023-03-07T08:07:12.525Z INFO    plugin-clboss: ChannelFeeManager: zerobasefee: require
2023-03-07T08:07:12.716Z INFO    plugin-clboss: DnsSeed: Have 2 seeds.
2023-03-07T08:07:12.717Z INFO    plugin-clboss: Started.
2023-03-07T08:07:12.736Z INFO    plugin-clboss: New block at 779683
2023-03-07T08:07:12.746Z INFO    plugin-clboss: InternetConnectionMonitor: online.
2023-03-07T08:07:12.775Z INFO    plugin-clboss: ChannelCreationDecider: Onchain amount too low, add 0.0450942 or more before we create channels. Will not create channels yet.
2023-03-07T08:07:17.024Z UNUSUAL plugin-clrest.js: --- Starting the cl-rest server ---
2023-03-07T08:07:17.024Z UNUSUAL plugin-clrest.js: --- cl-rest api server is ready and listening on :::3003 ---
2023-03-07T08:07:17.024Z UNUSUAL plugin-clrest.js: --- cl-rest doc server is ready and listening on :::4001 ---
2023-03-07T08:07:17.027Z INFO    lightningd: Restarting onchaind for channel 5
2023-03-07T08:07:17.028Z UNUSUAL 02875ac2c27835990ef62e5755c34264b2c39f51a41525adc5e52a7f94b3a19f8b-chan#5: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-03-07T08:07:17.030Z INFO    02875ac2c27835990ef62e5755c34264b2c39f51a41525adc5e52a7f94b3a19f8b-chan#5: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-03-07T08:07:17.042Z INFO    lightningd: Restarting onchaind for channel 65
2023-03-07T08:07:17.044Z UNUSUAL 020d1617e27ac022395352f2b3774969593d3d6ddff6fb117d820a9dda8da45217-chan#65: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-03-07T08:07:17.045Z INFO    020d1617e27ac022395352f2b3774969593d3d6ddff6fb117d820a9dda8da45217-chan#65: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-03-07T08:07:17.057Z INFO    lightningd: Restarting onchaind for channel 66
2023-03-07T08:07:17.058Z UNUSUAL 030a58b8653d32b99200a2334cfe913e51dc7d155aa0116c176657a4f1722677a3-chan#66: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-03-07T08:07:17.059Z INFO    030a58b8653d32b99200a2334cfe913e51dc7d155aa0116c176657a4f1722677a3-chan#66: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-03-07T08:07:17.070Z INFO    lightningd: --------------------------------------------------
2023-03-07T08:07:17.070Z INFO    lightningd: Server started with public key 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165, alias ARES_ION (color #022f2f) and lightningd v23.02-6-gb5c6140
2023-03-07T08:07:17.114Z INFO    02875ac2c27835990ef62e5755c34264b2c39f51a41525adc5e52a7f94b3a19f8b-chan#5: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-03-07T08:07:17.156Z INFO    020d1617e27ac022395352f2b3774969593d3d6ddff6fb117d820a9dda8da45217-chan#65: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-03-07T08:07:17.164Z INFO    030a58b8653d32b99200a2334cfe913e51dc7d155aa0116c176657a4f1722677a3-chan#66: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-03-07T08:07:19.249Z INFO    035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226-chan#2: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-07T08:07:19.285Z INFO    023d507f823c60de463d5b500bae31418d363afbf29aba642e8571c16ae19988b3-chan#59: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-07T08:07:19.629Z INFO    037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-chan#28: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-07T08:07:22.243Z INFO    024bfaf0cabe7f874fd33ebf7c6f4e5385971fc504ef3f492432e9e3ec77e1b5cf-chan#36: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-07T08:07:24.028Z INFO    plugin-clboss: New block at 779698
2023-03-07T08:07:24.051Z INFO    plugin-clboss: ChannelCreationDecider: Onchain amount too low, add 0.0284802 or more before we create channels. Will not create channels yet.
2023-03-07T08:07:24.557Z INFO    plugin-clboss: FeeModderBySize: Peer 0279f06eba0e1080f6a693201f090d0635a0e5dd2ef57d0207210e3d338133092e has 38 other peers, 19 of which have less capacity than us, 19 have more.  Multiplier: 1
2023-03-07T08:07:24.557Z INFO    plugin-clboss: FeeModderBySize: Peer 023d507f823c60de463d5b500bae31418d363afbf29aba642e8571c16ae19988b3 has 1 other peers, 1 of which have less capacity than us, 0 have more.  Multiplier: 16
2023-03-07T08:07:24.557Z INFO    plugin-clboss: FeeModderBySize: Peer 03d607f3e69fd032524a867b288216bfab263b6eaee4e07783799a6fe69bb84fac has 415 other peers, 232 of which have less capacity than us, 183 have more.  Multiplier: 1.57446
2023-03-07T08:07:24.557Z INFO    plugin-clboss: FeeModderBySize: Peer 02fa797156b0ea34757bad50cd8525a28dfb4f9f8f183c6c3955f7f8d0fa55d262 has 2 other peers, 1 of which have less capacity than us, 1 have more.  Multiplier: 1
2023-03-07T08:07:24.557Z INFO    plugin-clboss: FeeModderBySize: Peer 037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3 has 142 other peers, 13 of which have less capacity than us, 129 have more.  Multiplier: 0.578718
2023-03-07T08:07:24.559Z INFO    plugin-clboss: FeeModderBySize: Peer 024bfaf0cabe7f874fd33ebf7c6f4e5385971fc504ef3f492432e9e3ec77e1b5cf has 1151 other peers, 588 of which have less capacity than us, 563 have more.  Multiplier: 1.0884
2023-03-07T08:07:24.559Z INFO    plugin-clboss: FeeModderBySize: Peer 035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226 has 2406 other peers, 1684 of which have less capacity than us, 722 have more.  Multiplier: 3.04891
2023-03-07T08:07:25.784Z INFO    03d607f3e69fd032524a867b288216bfab263b6eaee4e07783799a6fe69bb84fac-chan#6: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-07T08:07:33.387Z INFO    02fa797156b0ea34757bad50cd8525a28dfb4f9f8f183c6c3955f7f8d0fa55d262-chan#51: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-07T08:09:28.231Z INFO    plugin-clboss: Connector: failed to connect to 02241e1d480fcedaa1268574d530ef8ed5545bacdcd391e40ac58cbed30d001d40@ln1qgjpu82gpl8d4gfxs46d2v803m24gkavmnfereq2ckxta5cdqqw5qhk3acz.lseed.darosior.ninja.:9735
2023-03-07T08:09:30.296Z INFO    plugin-clboss: Connector: failed to connect to 02a209432db157f414e668dbce5ee0f986dedd7b9ecbbdb4cf9a47215e85f70d5c@ln1q23qjsedk9tlg98xdrduuhhqlxrdahtmnm9mmdx0nfrjzh597ux4c5radgg.lseed.darosior.ninja.:9735
2023-03-07T08:09:30.463Z INFO    plugin-clboss: Connector: failed to connect to 02b686ccf655ece9aec77d4d80f19bb9193f7ce224ab7c8bbe72feb3cdd7187e01@128.199.13.22:9735
2023-03-07T08:09:34.510Z INFO    0279f06eba0e1080f6a693201f090d0635a0e5dd2ef57d0207210e3d338133092e-chan#23: Peer transient failure in CHANNELD_NORMAL: Disconnected
2023-03-07T08:09:54.561Z INFO    plugin-clboss: New block at 779699
2023-03-07T08:09:54.596Z INFO    plugin-clboss: ChannelCreationDecider: Onchain amount too low, add 0.0284802 or more before we create channels. Will not create channels yet.
endothermicdev commented 1 year ago

I think the "Peer transient error..." on startup is a separate issue than LND's "unable to read message from peer: unable to parse message of unknown type: ." Disabling peerstorage should resolve that one, but we would need the LND logs to know for sure. Have you seen any more force-closes?

Curiously the transient error peer disconnection doesn't happen on regtest. Looking into this one, but it goes back to at least last release and doesn't seem to be a persistent issue after startup (correct me if it is!)

Zk2u commented 1 year ago

Node has been consistently up since ~9am UTC yesterday and I'm still getting transient connection issues, so not just startup AFAIK. I've had another two FC's since yesterday (peer storage has been disabled)

Zk2u commented 1 year ago

Have tried numerous times to make incoming/outgoing payments and all attempts have failed :(

$ cat debug.log | rg "killing connection"
2023-03-07T10:25:01.587Z UNUSUAL 03d607f3e69fd032524a867b288216bfab263b6eaee4e07783799a6fe69bb84fac-channeld-chan#6: Adding HTLC 18446744073709551615 too slow: killing connection
2023-03-07T10:25:31.845Z UNUSUAL 037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3-channeld-chan#28: Adding HTLC 18446744073709551615 too slow: killing connection
2023-03-07T15:04:05.448Z UNUSUAL 0279f06eba0e1080f6a693201f090d0635a0e5dd2ef57d0207210e3d338133092e-channeld-chan#23: Adding HTLC 18446744073709551615 too slow: killing connection
2023-03-07T19:04:06.114Z UNUSUAL 02fa797156b0ea34757bad50cd8525a28dfb4f9f8f183c6c3955f7f8d0fa55d262-channeld-chan#51: Adding HTLC 18446744073709551615 too slow: killing connection
2023-03-08T02:44:07.217Z UNUSUAL 035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226-channeld-chan#2: Adding HTLC 18446744073709551615 too slow: killing connection
2023-03-08T04:44:07.468Z UNUSUAL 023d507f823c60de463d5b500bae31418d363afbf29aba642e8571c16ae19988b3-channeld-chan#59: Adding HTLC 18446744073709551615 too slow: killing connection
rustyrussell commented 1 year ago

This means your node is hella stressed. It's literally going 30 seconds without processing a simple response from channeld.

This shouldn't happen, but clboss can be pretty heavy. Maybe try disabling it and see if it settles down? What's your machine load like?

Roasbeef commented 1 year ago

@TrezorHannes thx for the handy logs! Looking at that, it doesn't appear that lnd disconnect due to the message itself. It logs that since it doesn't know of the message type, but it should chug along after that. I see a manual call to disconnect later down in the log lines. Also if you look after that log line, we process gossip messages as normal.

Re force closes: fwiw I've seen a few reports of a spike across the network. Partially isolated it to certain Tor connections dying, but nodes not realizing which causes HTLCs to timeout (tho that shouldn't propagate backwards!). If for w/e this is causing a disconnect, then if there're active HTLCs that can't be properly resolved off chain, the only option there is to unfortunately go on chain (tho maybe that's caused by node stress as mentioned above?).

Roasbeef commented 1 year ago

@TrezorHannes it looks like a manual disconnect came thru in that first set of logs?

2023-03-05 17:54:49.601 [INF] DISC: GossipSyncer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): fetching chan anns for 219 chans
2023-03-05 17:56:03.331 [INF] SRVR: Disconnecting from 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738
2023-03-05 17:56:03.331 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): disconnecting 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738, reason: server: DisconnectPeer called
2023-03-05 17:56:03.331 [INF] PEER: Peer(022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165): unable to read message from peer: read tcp 10.8.1.2:9735->10.8.1.1:52738: use of closed network connection
2023-03-05 17:56:03.336 [ERR] RPCS: [connectpeer]: error connecting to peer: already connected to peer: 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:52738

This is after that log line, then we process gossip, and get a disconnect in. Do you have some other software running that manages peer connections?

Later on you see lnd get another connection request, then try again even tho there's already a persistent connection:

2023-03-05 18:05:28.091 [WRN] SRVR: Already have 1 persistent connection requests for 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@10.8.1.1:60146, connecting anyway.
2023-03-05 18:05:28.173 [INF] SRVR: Established connection to: 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@94.237.53.252:9735
2023-03-05 18:05:28.173 [INF] SRVR: Finalizing connection to 022f2f7a120d04c28b7ec1ba7e100eac3e6b8388b0f184ac8ca5c32a737a3a4165@94.237.53.252:9735, inbound=false

Are you running lndg?

Zk2u commented 1 year ago

What's your machine load like?

About 7-8% currently, with spikes at 36%. Recently upgraded, so should not be having any perf issues 😅

Edit: do you think we can get this reopened? It appears not to be fixed in the latest commit

TrezorHannes commented 1 year ago

This is after that log line, then we process gossip, and get a disconnect in. Do you have some other software running that manages peer connections? Later on you see lnd get another connection request, then try again even tho there's already a persistent connection: Are you running lndg?

Hi @Roasbeef - thanks so much for looking into this with us. Yes, affirmative, I'm running LNDg, which has an automated reconnection job running. Do you think this is causing additional headaches?

@617a7a that's some beefy hardware. But 7-8% load on average seems high. Is clboss something which can be temporarily disabled?

Zk2u commented 1 year ago

@TrezorHannes I've just disabled CLBOSS while we debug this -- doubt it'll affect it however.

Roasbeef commented 1 year ago

@TrezorHannes

Yes, affirmative, I'm running LNDg, which has an automated reconnection job running. Do you think this is causing additional headaches?

Yeah I think so, we've had reports in the past of some weird reconnection logic run by lndg trigger unstable connections. If lndg is randomly disconnecting and connecting a connection (and that connection doesn't heal), then that may cause general channel instability, and potentially force closes if it isn't allowing lnd to manage connections itself. Tbh, I'm not sure what the purposes of that D/C job for lnd is...

Here's an old issue where we tracked down that the behavior was causing some issues: https://github.com/lightningnetwork/lnd/issues/6882#issuecomment-1236393152

So AFAICT, the root of this issue (at least the part about the cln <-> lnd connection not being stable) was actually lndg purposefully disconnecting in an attempt to have the channel show as "active". I don't know enough about cln to diagnose what the posted transient errors are though.

Zk2u commented 1 year ago

Worth saying that I got a force close yesterday with Swiss Routing who also runs CLN, so this might not just be a LND<->CLN compat issue.

Zk2u commented 1 year ago

Additional logs from today