Closed: nakoshi-satamoto closed this issue 5 months ago
My node ID is 038e5677dc7d1ec4e9a82bd004d524b2469fedb0867e85bd1e2f5d3352957e03b7 alias nakoshi-satamoto@posteo.net
The first crashdump log file I am finding from today, crash.log.20240320221217, has a lot of very large blocks of numbers after the jsonrpc#67IO_IN entries:
And then I get this at the end of the log file:
+86.164710347 jsonrpc#67IO_IN: "recover:emergencyrecover#6/cln:emergencyrecover#44/chanbackup:recoverchannel#4"
+86.164748773 lightningdDEBUG: channel 6aee31f3fe0da7ed3c710bc94fb044be91917801a9fdec5eddf85f365d9f0b2b already exists!
+86.164757663 lightningdDEBUG: channel 27664dc53ff4ce83d77fae9809bc8773755da10b02090b54ab627316dff3a65b already exists!
+86.164764958 lightningdDEBUG: channel fbdedc8deebc257b0c9417537ff77c1d60b72029026c28977cd360fc33b8d3fd already exists!
+86.166025133 lightningdBROKEN: Error executing statement: wallet/wallet.c:2594: INSERT INTO channels ( peer_id, first_blocknum, id, revocation_basepoint_local, payment_basepoint_local, htlc_basepoint_local, delayed_payment_basepoint_local, funding_pubkey_local, require_confirm_inputs_remote, require_confirm_inputs_local) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?);: UNIQUE constraint failed: channels.id
+86.166130668 lightningdBROKEN: Error executing statement: wallet/wallet.c:2594: INSERT INTO channels ( peer_id, first_blocknum, id, revocation_basepoint_local, payment_basepoint_local, htlc_basepoint_local, delayed_payment_basepoint_local, funding_pubkey_local, require_confirm_inputs_remote, require_confirm_inputs_local) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?);: UNIQUE constraint failed: channels.id
+86.239775795 lightningdBROKEN: FATAL SIGNAL 6 (version v24.02)
+86.239831244 lightningdBROKEN: backtrace: common/daemon.c:38 (send_backtrace) 0x5646e19cafe6
+86.239847897 lightningdBROKEN: backtrace: common/daemon.c:75 (crashdump) 0x5646e19cb035
+86.239862803 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fee3bf7204f
+86.239878248 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fee3bfc0e2c
+86.239891614 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fee3bf71fb1
+86.239908035 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fee3bf5c471
+86.239922657 lightningdBROKEN: backtrace: lightningd/log.c:1025 (fatal_vfmt) 0x5646e1978c37
+86.239937494 lightningdBROKEN: backtrace: wallet/db.c:1097 (db_error) 0x5646e19aae55
+86.239953172 lightningdBROKEN: backtrace: db/utils.c:316 (db_fatal) 0x5646e1a19efd
+86.239967634 lightningdBROKEN: backtrace: db/utils.c:190 (db_exec_prepared_v2) 0x5646e1a1a2c1
+86.239983621 lightningdBROKEN: backtrace: wallet/wallet.c:2619 (wallet_channel_insert) 0x5646e19b524b
+86.240001919 lightningdBROKEN: backtrace: lightningd/opening_control.c:1581 (json_recoverchannel) 0x5646e197fd17
+86.240019036 lightningdBROKEN: backtrace: lightningd/jsonrpc.c:839 (command_exec) 0x5646e1970e0c
+86.240035240 lightningdBROKEN: backtrace: lightningd/jsonrpc.c:982 (rpc_command_hook_final) 0x5646e197398b
+86.240051331 lightningdBROKEN: backtrace: lightningd/plugin_hook.c:247 (plugin_hook_call_) 0x5646e19a0162
+86.240067735 lightningdBROKEN: backtrace: lightningd/jsonrpc.c:1070 (plugin_hook_call_rpc_command) 0x5646e1973174
+86.240085007 lightningdBROKEN: backtrace: lightningd/jsonrpc.c:1189 (parse_request) 0x5646e1973174
+86.240103772 lightningdBROKEN: backtrace: lightningd/jsonrpc.c:1295 (read_json) 0x5646e197359e
+86.240118341 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5646e1b3b557
+86.240132772 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x5646e1b3ba32
+86.240148980 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x5646e1b3bacf
+86.240164578 lightningdBROKEN: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x5646e1b3d46c
+86.240179043 lightningdBROKEN: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x5646e1970d88
+86.240195353 lightningdBROKEN: backtrace: lightningd/lightningd.c:1425 (main) 0x5646e1976456
+86.240211916 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fee3bf5d249
+86.240225728 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fee3bf5d304
+86.240238765 lightningdBROKEN: backtrace: (null):0 ((null)) 0x5646e194ad64
+86.240251401 lightningdBROKEN: backtrace: (null):0 ((null)) 0xffffffffffffffff
Grepping my log file for the channel that force closed (Boltz) with grep 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29 ln-log.txt, I see this:
2024-03-20T01:46:34.941Z INFO 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2024-03-20T01:49:39.948Z INFO 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2024-03-20T01:53:54.907Z INFO 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2024-03-20T03:42:17.419Z INFO 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (9)
2024-03-20T05:12:46.161Z INFO 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (9)
2024-03-20T22:12:15.330Z UNUSUAL 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Peer permanent failure in CHANNELD_NORMAL: channeld: sent Awaiting unilateral close
2024-03-20T22:12:15.330Z **BROKEN** 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Cannot broadcast our commitment tx: they have a future one
2024-03-20T22:12:15.330Z INFO 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL
2024-03-20T22:12:15.508Z **BROKEN** 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: gossipd rejected our channel update: Bad signature for [REDACTED THE DATA HERE FROM THIS LOG FILE IN CASE IF THIS WAS SENSITIVE INFO]
2024-03-20T22:15:19.894Z UNUSUAL 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: gossipd lost track of announced channel: re-announcing!
2024-03-20T22:15:19.968Z **BROKEN** 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Cannot broadcast our commitment tx: they have a future one
2024-03-20T22:18:03.199Z UNUSUAL 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: gossipd lost track of announced channel: re-announcing!
2024-03-20T22:18:03.279Z DEBUG 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Adding funding_spend_watch
2024-03-20T22:18:03.279Z **BROKEN** 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Cannot broadcast our commitment tx: they have a future one
2024-03-20T22:18:03.280Z DEBUG 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Watching for funding txid: 86ef7d15b6117e07537a7cadf26dd17d48e79c4c5cbd1ca4a0944d3566a25355
2024-03-20T22:19:46.909Z UNUSUAL 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: gossipd lost track of announced channel: re-announcing!
2024-03-20T22:19:46.981Z DEBUG 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Adding funding_spend_watch
2024-03-20T22:19:46.981Z **BROKEN** 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Cannot broadcast our commitment tx: they have a future one
2024-03-20T22:19:46.982Z DEBUG 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Watching for funding txid: 86ef7d15b6117e07537a7cadf26dd17d48e79c4c5cbd1ca4a0944d3566a25355
2024-03-20T22:20:52.864Z DEBUG 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Adding funding_spend_watch
2024-03-20T22:20:52.864Z **BROKEN** 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Cannot broadcast our commitment tx: they have a future one
2024-03-20T22:20:52.865Z DEBUG 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Watching for funding txid: 86ef7d15b6117e07537a7cadf26dd17d48e79c4c5cbd1ca4a0944d3566a25355
2024-03-20T22:28:05.525Z DEBUG 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2-chan#29: Adding funding_spend_watch
Gossipd reporting a channel with a lost state is what stood out to me. I also don't know yet what caused the force close. Maybe this is why lightningd won't start up?
Did I lose my funds?
Ping @adi2011: this appears to be an issue with the recovery plugin, which is attempting to recover a channel as if it were fully forgotten, when the CLN node "just" lost state.
Generally speaking: if a node does not start, DO NOT tinker with the database unless told to. This includes upgrades and downgrades.
Don't worry, the funds should be safe, and we just need to get the node running again. I'll let @adi2011 guide you through the steps, but I'll note that we can get you unstuck by adding --disable-plugin=recovery to the command line ;-)
If you could take a snapshot of the DB (not the hsm_secret file), we can get you unstuck while still retaining the ability to debug offline and fix it for other users too.
As for things currently in flight, only channel 29 is at risk of closure, as it is the only one with an HTLC attached.
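For reference, here is a minimal sketch of applying that suggestion, either as a one-off flag or persistently via the config file; the config path is an assumption for a default setup, and (as the next comment notes) the plugin on this install turned out to be registered as recover rather than recovery:

    # one-off, on the command line:
    lightningd --disable-plugin=recover

    # or persistently, by adding a line to the config file
    # (default ~/.lightning/config location assumed):
    echo "disable-plugin=recover" >> ~/.lightning/config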
Thank you cdecker. High respect. I am at the computer for the rest of the day. I have made a snapshot of my .lightning/bitcoin directory, and I started it up with the command ./lightningd --disable-plugin=recover (not recovery).
After I get this resolved, I might continue operating a clightning node, having peace of mind knowing what to do next time this happens.
For that force-closed chan 831793x2527x0 I see this now:
{
  "timestamp": "2024-03-20T22:12:15.335Z",
  "old_state": "CHANNELD_NORMAL",
  "new_state": "AWAITING_UNILATERAL",
  "cause": "local",
  "message": "channeld: sent Awaiting unilateral close"
}
and then an HTLC with direction "in" and state "RCVD_REMOVE_REVOCATION".
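In case it helps others reproduce this view, that snippet looks like it comes from the per-channel state_changes array; a hedged one-liner to dump it, assuming jq is installed and that this CLN version still reports channels under listpeers:

    lightning-cli listpeers | jq '.peers[].channels[]?.state_changes[]?'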
Just let me know what I should check or try out now if you could. I'll follow your instructions. Thank you very much.
I stopped my lightning node now to keep it offline, to await further instruction @cdecker @adi2011
For that force-closed chan I do see cause set to "local"; does that indicate that it was my node that initiated the force close? So strange. Anyway, I am able to start up lightningd, but only when the recover plugin is disabled. I have also since been using the --offline flag as a precaution.
I think that already means we're past the problem; it was just the recover plugin crashing due to a misidentified missing channel. You should be able to just run the node normally now.
What do you suggest? When I attempt to start it up normally again, it crashes again like it did earlier. I can only start it up when the recover plugin is disabled.
I do have a backup of accounts.sqlite3 and emergency.recover from before these issues, if that is helpful at all.
email from boltz:
We force closed you:
2024-03-20 22:12:13.495 [ERR] HSWC: ChannelLink([REDACTED INFO]:0): failing link: unable to synchronize channel states: possible remote commitment state data loss with error: sync error
2024-03-20 22:12:13.495 [WRN] HSWC: ChannelLink([REDACTED INFO]:0): error when syncing channel states: possible remote commitment state data loss
2024-03-20 22:12:13.495 [ERR] LNWL: ChannelPoint([REDACTED INFO]:0): sync failed: remote's next commit height is 65391, while we believe it is 65392!
When looking at the close transaction (https://mempool.space/tx/e6df69daa0eefd910d871601bbf0e78e838b6396c79601a28dbbf31bb8312591) and our node's view, we'll get ~3.2m and you ~8.3m sats. It's a force close, not a penalty, so you are all good. We'll just need to wait ~112 blocks and the funds will be back in our wallet.
Let me know if the CLN team needs our logs for that channel to debug your crash.
I replaced parts with [REDACTED INFO]
When I run ./lightning-cli listpeers I see "lost_state": true only for that channel that force closed. The rest show false for lost_state.
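For anyone wanting to make the same check quickly, a hedged one-liner (assumes jq is installed and that this CLN version still nests channels under each peer in listpeers output):

    lightning-cli listpeers | jq '.peers[].channels[]? | select(.lost_state == true)'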
Am I able to recover my funds from this unilateral close, even with this lost channel state and the inability to start up lightningd unless I disable the recover plugin?
After I recover the funds, any ideas on how I can restore functionality? I think it is this one channel that is preventing the recover plugin from operating normally.
Update
TL;DR summary: my lightning node eventually forgot this channel that was force closed, after which I was able to start up lightningd normally again. I still need help recovering funds from the unilateral close transaction.
I started up lightningd with the recover plugin disabled
I then reconnected to the peer that forced closed the chan (boltz).
I then saw my 8.3 million sats from the forced channel closure listed as "onchain". But I cannot access these funds because I think lightning needs to claim them. I did load my node's private keys into Sparrow wallet with the proper derivation path and I do not see funds in there. From my experience with unilateral closures, lightning needs to see the unilateral close transaction and then execute some script to sweep the funds to one of its onchain addresses.
I restarted lightningd again with the recover plugin disabled and let it catch up with the blockchain.
After catching up with the blockchain I restarted lightningd again with the recover plugin disabled, and I then saw that the problem channel (the one that force closed) was forgotten by lightning.
I restarted lightningd again normally (not disabling the recover plugin). That problem channel is indeed forgotten by my node, thus enabling lightningd to start up normally again.
However, what remains is that I need to recover my sats from that forced channel closure. I see this when I type ./lightning-cli listfunds:
{ "txid": "e6df69daa0eefd910d871601bbf0e78e838b6396c79601a28dbbf31bb8312591", "output": 1, "amount_msat": 8387232000, "scriptpubkey": "[REDACTED INFO]", "address": "bc1q8s5ay4093kwx9ln0yntcckuapxnp3ngmr5l6s3", "status": "confirmed", "blockheight": 835578, "reserved": false }
I replaced the scriptpubkey with REDACTED INFO in case that is sensitive information.
How do I claim these from that unilateral close, though? I do not see these funds in my wallet. Will lightning sweep these funds after the output's delay? Or are these funds lost because lightning lost the state of this channel? Is the delay the peer's delay of, I think, 144 blocks, or my delay of what I think was around 1400 blocks?
I was able to withdraw all funds including the ones from the forced channel closure by using the command ./lightning-cli withdraw [destination] all urgent
Not sure why the funds were not showing up in Sparrow wallet, but they are returned. Core Lightning remains solid!
If anyone experiences an issue like I did, the solution is:
1. Start up the node with the recover plugin disabled.
2. Use the listpeers command to look at the lost_state field to see which channel is causing the problems, and reconnect to that node if it is disconnected.
3. Restart lightningd again with the recover plugin disabled and let it catch up with the blockchain.
4. Restart lightningd again with the recover plugin disabled to see if that problem channel is corrected or forgotten.
5. Once the channel state is corrected or forgotten, you should then be able to restart lightningd normally without having to disable the recover plugin.
All of these steps might not be necessary; I am just documenting the steps I took to resolve it.
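Sketched as commands, under the same assumptions as earlier in this thread (plugin name recover, jq available; the peer id is the Boltz node id quoted above):

    # 1. start with the recover plugin disabled (optionally offline, as a precaution)
    lightningd --disable-plugin=recover --offline
    # 2. identify the channel reporting lost state
    lightning-cli listpeers | jq '.peers[].channels[]? | select(.lost_state == true)'
    # 3. reconnect to that peer if it is disconnected
    lightning-cli connect 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
    # 4. restart with the plugin still disabled, let it catch up with the blockchain,
    #    then restart again and check whether the problem channel is corrected or forgotten
    # 5. once it is, restart lightningd normally, without --disable-plugin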
Now, I don't know why this chan force closed. Maybe it was because my computer crashed and something goofed up somewhere. You can close this out, or if someone wants to keep this open to investigate further, I will cooperate in investigating what went wrong if needed. Thanks
Am I able to recover my funds from this unilateral close, even with this lost channel state and the inability to start up lightningd unless I disable the recover plugin?
Yes, CLN supports option_upfront_shutdown, which specifies a refund address from your wallet. The spec used to mandate rotating keys, but we have since removed that requirement, exactly due to the loss potential here.
We also support option_dataloss_protection, which will ask the peer to close the channel if we detect that we lost data (which sounds like your scenario).
After I recover the funds, any ideas on how I can restore functionality? I think it is this one channel that is preventing the recover plugin from operating normally.
You can always start without the recover plugin until the underlying crash issue has been addressed. After a couple of days your node should forget the channel, and you can re-enable the recover plugin without risking triggering it again, afaik.
I rephrased the issue now that the funds are back, and we want to address the underlying issue.
So the issue is that we were trying to insert a channel (in the db) at an 'id' which is already occupied, and all of this happens while running the recoverchannel RPC.
We have a check inside recoverchannel to see if the channel already exists in the DB, using find_channel_by_id(peer, &cid); when this returns true, it logs "channel <ChannelId> already exists!", which it did for 3 channels but not this one.
Hence the possible reasons for this could be:
1. The database had a different channel at the same index.
2. For some reason find_channel_by_id(peer, &cid) could not detect the channel because it was not loaded inside the peer struct.
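One way to check the second theory offline is to look directly at the database snapshot. This is only a sketch: it assumes the default lightningd.sqlite3 filename, uses only columns that appear in the failing INSERT above, and should be run against a copy, never the live DB:

    # list every row still present in the channels table of the snapshot;
    # if a "forgotten" channel still has a row here, the recoverchannel
    # insert would collide on channels.id (the UNIQUE constraint above)
    sqlite3 /path/to/snapshot/lightningd.sqlite3 \
      "SELECT id, peer_id, first_blocknum FROM channels;"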
@adi2011 I have not done anything with the emergency.recover file.
Also, this happened again today. Both last time and this time, it happened when my computer turned off from a system crash or froze up, forcing me to do a hard reboot. I have noticed that my computer only has these crashing issues when I run elementsqt, which is the GUI for Liquid, the Bitcoin sidechain; elementsd (non-GUI) works fine.
Today I found my computer turned off, probably from a system crash. When I started it up again, I let Bitcoin Core fully sync to the latest block, then I started up lightningd and these same issues occurred. I also noticed that this caused another channel to unilaterally close on me. Both last time and this time, when my computer crashed it caused a channel to unilaterally close and lightningd to not start up unless I disable the recover plugin. This is the second time in a few days this has happened, with my node having a channel die as a result and only starting with the recover plugin disabled. Last time I was able to start it normally after the killed channel was forgotten.
It might also be helpful to know that these issues started with version 24.02: the first case documented in this issue was on version 24.02, and the second incident, the one from today, is on version 24.02.1. This is also the first time I've seen channels unilaterally close when my system crashes. In the past, before version 24, when my system crashed I was able to start up lightningd normally and retain my channels.
Steps taken:
I plan to see if lightningd starts up again normally later once that chan is forgotten.
In summary: for some reason, a system crash is causing a channel to unilaterally close, and when I start up lightningd after a unilateral close the recover plugin freaks out and lightningd crashes. Some questions:
Adding to my last post, I checked the sha256 hash of my emergency.recover file and it is the same as the one in my backup from before issues, so that verifies that the emergency.recover file was not changed since my last backup.
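For anyone wanting to do the same check, this is roughly what that verification looks like; the paths are assumptions for a default setup and your own backup location:

    sha256sum ~/.lightning/bitcoin/emergency.recover /path/to/backup/emergency.recover
    # identical hashes => the file has not changed since the backup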
After the channel was forgotten by lightning, I was able to start it up normally without needing to disable the recover plugin.
Hi @nakoshi-satamoto, I think the issue is that peer->channels doesn't hold closed channels, and because of this the recover plugin is trying to stub a channel into the db at an already occupied spot, which leads to the crash.
Answering your questions:
Is it safe for me to run lightningd with the recover plugin disabled? => In your case it might be a good idea to disable the recover plugin, otherwise it'd try to stub channels at already closed channel positions.
Does lightningd resync with recent blocks every time it is started up? If so, maybe it is during this period that the recover plugin sees things not up to date and crashes before it has a chance for blocks to catch up. => The problem (from what I could figure out looking at the logs) is that your node entered recovery mode and it tries to recover the node through the emergency.recover file.
Theory: maybe the recover plugin is also crashing until the problem channel is forgotten? I'll test this later on today and can report back. => Once the channel is forgotten, it gets removed and everything works fine.
BTW, is it still throwing **BROKEN** lightningd: Error executing statement: wallet/wallet.c:2594: INSERT INTO channels ( peer_id, first_blocknum, id, revocation_basepoint_local, payment_basepoint_local, htlc_basepoint_local, delayed_payment_basepoint_local, funding_pubkey_local, require_confirm_inputs_remote, require_confirm_inputs_local) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?);: UNIQUE constraint failed: channels.id while crashing?
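A quick way to answer that from the node's own files, assuming the crash dumps follow the crash.log.<timestamp> naming seen earlier in this thread and sit in the lightning directory alongside the main log:

    # list any crash dumps (and the main log) that contain the constraint failure
    grep -l "UNIQUE constraint failed: channels.id" ~/.lightning/bitcoin/crash.log.* ln-log.txt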
I was travelling, pardon my delayed response. I have not had that issue ever since I did the steps I described above. It starts up normally now, without needing to disable the recover plugin.
That means your DB forgot the channel which was hitting the unique ID constraint. Do you have a snapshot of the DB from the point when it was throwing the error?
This should be resolved now... @nakoshi-satamoto Please do let me know if the DB error still persists.
New description
The recover plugin appears to crash if we had dataloss, but the database still contains a trace of the channel, causing the insert to fail during the recovery.
Original issue
Issue and Steps to Reproduce
My system froze up and was non-responsive. I had to do a forced reboot of the computer. lightningd won't even start up now. I'm on version 24.02; I even tried reverting back to the old version 23.11.2 and still had no luck starting up this node. I even re-extracted clightning-v24.02-Ubuntu-22.04.tar.xz and tried running v24.02 again, and yet no luck. I cannot run the newer version 24.02.1 because that is not signed with any valid known signature. So I am on 24.02 as the latest version. Yes, I was mirroring the sqlite database, but that is pointless if the mirror gets corrupted also, if this is a case of database corruption.
Below is the output I get when running in debug mode.
While repeatedly trying to start up core lightning, I noticed one of my channels with a lot of funds got force closed. The chan that got force closed was 831793x2527x0 (Boltz), with the closure transaction ID being e6df69daa0eefd910d871601bbf0e78e838b6396c79601a28dbbf31bb8312591
I am scared that these funds have been lost forever, if this was a penalty closure, or if this was a force closure and my lightning node lost the information needed to claim the funds after the wait time.
At this point, I'm done with lightning. Too many issues over my time with running core lightning. I just want to recover my funds. I'm okay if I have to force close all my chans if it means I get my funds back.
1. How can I get core lightning working again, or recover my funds?
2. Assuming the database is corrupted and that chan was indeed forced closed, am I able to claim the funds from the forced close script after the wait time?
Output from lightningd in debug mode