ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.84k stars 901 forks source link

FATAL SIGNAL 7 #5692

Closed flowolf closed 1 year ago

flowolf commented 1 year ago

I'm running a c-lightning instance on docker (with btcpayserver) on a raspberry pi 4, with SSD as storage.

I've charged the on-chain wallet and opened a lightning channel. The transaction for that was pending and when I came back to the instance all frontends reported a problem and the c-lightning docker container logs:

Lost connection to the RPC socket.
network=bitcoin added in /root/.lightning/config
Waiting /root/.nbxplorer/btc_fully_synched to be created...
The chain is fully synched
Waiting /var/lib/tor/hidden_services/c-lightning/hostname to be created by tor...
announce-addr=[xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx].onion:9735 added to /root/.lightning/config
rpc-file=/root/.lightning/lightning-rpc added to /root/.lightning/config
Installing bundled plugins
C-Lightning starting, listening on port 9735
2022-11-03T16:52:35.036Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2022-11-03T16:52:35.911Z INFO    lightningd: --------------------------------------------------
2022-11-03T16:52:35.911Z INFO    lightningd: Server started with public key [redacted-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx], alias XXXXXXXXXXX (color #[ffffff]) and lightningd basedon-v0.10.2
plugin-invoicewithdescriptionhash  initialized plugin v1.2
lightning_gossipd: FATAL SIGNAL 7 (version basedon-v0.10.2)
lightning_gossipd: FATAL SIGNAL (version basedon-v0.10.2)
2022-11-03T16:52:36.989Z **BROKEN** gossipd: FATAL SIGNAL 7 (version basedon-v0.10.2)
2022-11-03T16:52:36.989Z **BROKEN** gossipd: FATAL SIGNAL (version basedon-v0.10.2)
2022-11-03T16:52:36.989Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: FATAL SIGNAL
2022-11-03T16:52:36.990Z UNUSUAL gossipd: Status closed, but waitpid 38 says No child processes
lightningd: gossipd failed (signal 127), exiting.
Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.

Where do I start to look for what could be wrong? thanks for your help

flowolf commented 1 year ago

just realizing that BTCpayserver came with the 0.10.2 version - which is currently 12 month behind a recent release.

so this might be fixed already.

flowolf commented 1 year ago

I tried running 0.12.1

ran into non-announcable addr issues, replaced my hostname with the IP, but got FATAL SIGNAL 7 again 😟

Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.network=bitcoin added in /root/.lightning/config
Waiting /root/.nbxplorer/btc_fully_synched to be created...
The chain is fully synched
Waiting /var/lib/tor/hidden_services/c-lightning/hostname to be created by tor...
announce-addr=[xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx].onion:9735 added to /root/.lightning/config
rpc-file=/root/.lightning/lightning-rpc added to /root/.lightning/config
Installing bundled plugins
C-Lightning starting, listening on port 9735
2022-11-03T20:48:11.270Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
plugin-invoicewithdescriptionhash  initialized plugin v1.2
2022-11-03T20:48:11.383Z INFO    lightningd: --------------------------------------------------
2022-11-03T20:48:11.383Z INFO    lightningd: Server started with public key [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx], alias XXXXXXXXXX (color #XXXXXX) and lightningd basedon-v0.12.1
2022-11-03T20:48:13.198Z INFO    [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx]-chan#1: Peer transient failure in CHANNELD_NORMAL: Disconnected
lightning_gossipd: FATAL SIGNAL 7 (version basedon-v0.12.1)
lightning_gossipd: FATAL SIGNAL (version basedon-v0.12.1)
2022-11-03T20:48:13.216Z **BROKEN** gossipd: FATAL SIGNAL 7 (version basedon-v0.12.1)
2022-11-03T20:48:13.216Z **BROKEN** gossipd: FATAL SIGNAL (version basedon-v0.12.1)
2022-11-03T20:48:13.216Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: FATAL SIGNAL
lightningd: gossipd failed (exit status 242), exiting.
Lost connection to the RPC socket.

I've tried deleting the gossip_store file which was a byte in size - but that did not help 🤷

rustyrussell commented 1 year ago

I haven't seen a SIGBUS in a long time! And there's no backtrace, making it hard to know what's going on. And it's in Docker so how do we run that under the debugger, or ask it for a core file.

IOW, I don't know how to debug this, nor have I seen it before. Maybe it's a BtcPayserver thing?

NicolasDorier commented 1 year ago

@flowolf our 0.12.1 isn't currently tested. But even for 0.10.2 it's the first time I see this.

Is there a way to activate stacktrace? We have also developer build images to activate more diagnostic stuff. I can guide flowolf on how to activate logs and all. Which kind of logs would you need @rustyrussell ?

Given this error, I guess there is some corrupt file used by gossipd (which we could probably delete)

whitslack commented 1 year ago

In my experience, SIGBUS on Linux usually happens when there's an I/O error while page faulting a file-backed VMA. Whereas a read syscall can return EIO to indicate the error, a simple memory fetch has no choice but to deliver a fatal signal whenever the file backing the requested memory address cannot be read.

Suggestion: check the kernel log.

flowolf commented 1 year ago

so I updated my kernel to Linux pi 5.10.103-v8+ #1529 SMP PREEMPT Tue Mar 8 12:26:46 GMT 2022 aarch64 GNU/Linux

and the error persists:

% docker logs -f btcpayserver_clightning_bitcoin
network=bitcoin added in /root/.lightning/config
Waiting /root/.nbxplorer/btc_fully_synched to be created...
The chain is fully synched
Waiting /var/lib/tor/hidden_services/c-lightning/hostname to be created by tor...
announce-addr=asdfasdfasdfxxxxxxxxxxxxxxxxxxxxxxx.onion:9735 added to /root/.lightning/config
rpc-file=/root/.lightning/lightning-rpc added to /root/.lightning/config
Installing bundled plugins
C-Lightning starting, listening on port 9735
2022-11-05T08:32:39.661Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
plugin-invoicewithdescriptionhash  initialized plugin v1.2
2022-11-05T08:32:43.641Z INFO    lightningd: --------------------------------------------------
2022-11-05T08:32:43.641Z INFO    lightningd: Server started with public key 02bxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, alias XXXXXXXXXX (color #xxyyzz) and lightningd basedon-v0.12.1
2022-11-05T08:32:46.007Z INFO    0372bd00e7c43efd583623d8fa688141b91e00d0ad65b29b630bf7188804da50aa-chan#1: Peer transient failure in CHANNELD_NORMAL: Disconnected
lightning_gossipd: FATAL SIGNAL 7 (version basedon-v0.12.1)
lightning_gossipd: FATAL SIGNAL (version basedon-v0.12.1)
2022-11-05T08:32:46.149Z **BROKEN** gossipd: FATAL SIGNAL 7 (version basedon-v0.12.1)
2022-11-05T08:32:46.149Z **BROKEN** gossipd: FATAL SIGNAL (version basedon-v0.12.1)
2022-11-05T08:32:46.150Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: FATAL SIGNAL
lightningd: gossipd failed (exit status 242), exiting.

the kernel logs at that time report:


% journalctl -k

Nov 05 09:32:44 pi kernel: br-04b3787f47a0: port 2(vethb35cec0) entered blocking state
Nov 05 09:32:44 pi kernel: br-04b3787f47a0: port 2(vethb35cec0) entered forwarding state
Nov 05 09:32:44 pi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb35cec0: link becomes ready
Nov 05 09:32:45 pi kernel: br-04b3787f47a0: port 2(vethb35cec0) entered disabled state
Nov 05 09:32:45 pi kernel: eth0: renamed from veth08a63ea
Nov 05 09:32:45 pi kernel: br-04b3787f47a0: port 2(vethb35cec0) entered blocking state
Nov 05 09:32:45 pi kernel: br-04b3787f47a0: port 2(vethb35cec0) entered forwarding state
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(veth269b1c2) entered disabled state
Nov 05 09:32:46 pi kernel: veth80022b5: renamed from eth0
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(veth269b1c2) entered disabled state
Nov 05 09:32:46 pi kernel: device veth269b1c2 left promiscuous mode
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(veth269b1c2) entered disabled state
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered blocking state
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered disabled state
Nov 05 09:32:46 pi kernel: device vethff2cb16 entered promiscuous mode
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered blocking state
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered forwarding state
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered disabled state
Nov 05 09:32:46 pi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethfb427ea: link becomes ready
Nov 05 09:32:46 pi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethff2cb16: link becomes ready
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered blocking state
Nov 05 09:32:46 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered forwarding state
Nov 05 09:32:47 pi kernel: br-04b3787f47a0: port 10(veth6eff1ba) entered disabled state
Nov 05 09:32:47 pi kernel: veth7eb0914: renamed from eth0
Nov 05 09:32:47 pi kernel: br-04b3787f47a0: port 10(veth6eff1ba) entered disabled state
Nov 05 09:32:47 pi kernel: device veth6eff1ba left promiscuous mode
Nov 05 09:32:47 pi kernel: br-04b3787f47a0: port 10(veth6eff1ba) entered disabled state
Nov 05 09:32:47 pi kernel: br-04b3787f47a0: port 9(vethaf94d9e) entered blocking state
Nov 05 09:32:47 pi kernel: br-04b3787f47a0: port 9(vethaf94d9e) entered disabled state
Nov 05 09:32:47 pi kernel: device vethaf94d9e entered promiscuous mode
Nov 05 09:32:47 pi kernel: br-04b3787f47a0: port 9(vethaf94d9e) entered blocking state
Nov 05 09:32:47 pi kernel: br-04b3787f47a0: port 9(vethaf94d9e) entered forwarding state
Nov 05 09:32:47 pi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethaf94d9e: link becomes ready
Nov 05 09:32:48 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered disabled state
Nov 05 09:32:48 pi kernel: eth0: renamed from vethfb427ea
Nov 05 09:32:48 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered blocking state
Nov 05 09:32:48 pi kernel: br-04b3787f47a0: port 8(vethff2cb16) entered forwarding state
Nov 05 09:32:49 pi kernel: br-04b3787f47a0: port 9(vethaf94d9e) entered disabled state
Nov 05 09:32:49 pi kernel: eth0: renamed from vethf9d7536
Nov 05 09:32:49 pi kernel: br-04b3787f47a0: port 9(vethaf94d9e) entered blocking state
Nov 05 09:32:49 pi kernel: br-04b3787f47a0: port 9(vethaf94d9e) entered forwarding state
Nov 05 09:32:49 pi kernel: br-04b3787f47a0: port 10(veth04d835b) entered blocking state
Nov 05 09:32:49 pi kernel: br-04b3787f47a0: port 10(veth04d835b) entered disabled state
Nov 05 09:32:49 pi kernel: device veth04d835b entered promiscuous mode
Nov 05 09:32:49 pi kernel: br-04b3787f47a0: port 10(veth04d835b) entered blocking state
Nov 05 09:32:49 pi kernel: br-04b3787f47a0: port 10(veth04d835b) entered forwarding state
Nov 05 09:32:49 pi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth04d835b: link becomes ready
Nov 05 09:32:50 pi kernel: br-04b3787f47a0: port 10(veth04d835b) entered disabled state

@NicolasDorier thanks for offering your help. Can I get you any more data?

flowolf commented 1 year ago

I restarted the server again with 0.10.2 from a backup state. It did not immediately crash.

I'm waiting and hoping it stays stable.

vincenzopalazzo commented 1 year ago

Is it fixed this with some magic? @flowolf

flowolf commented 1 year ago

did not occur again after a reset to an old saved state.

vincenzopalazzo commented 1 year ago

did not occur again after a reset to an old saved state.

I do not know what old saved state means but happy that works, I'm closing it but feel free to reopen it