Closed leilerg closed 3 years ago
Is there a `gossip_store` that is valid?
Your filesystem might be broken. Could you run `fsck` on the filesystem holding your `lightningdir`?
There is a `gossip_store`, but I don't know if it's valid... how can I tell? For me it's a ~1 kB file that looks empty: `cat gossip_store` returns nothing. Its timestamp updates whenever I try running c-lightning.
There may be a problem with the filesystem... but I'm stuck in my attempts to resolve it. I'm running the RasPi with `/` on an external HDD. Both `bitcoind` and `lightningd` are on the same disk, so I cannot unmount it to run `fsck`. Just running `sudo fsck` gives me:
e2fsck 1.43.3 (04-Sep-2016)
/dev/mmcblk0p2: clean, 36796/913920 files, 303311/3877760 blocks
e2fsck 1.43.3 (04-Sep-2016)
/dev/sda1 is mounted.
e2fsck: Cannot continue, aborting.
fsck.fat 3.0.27 (2014-11-12)
0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
1) Remove dirty bit
2) No action
?
If I press 1, I get:
Leaving filesystem unchanged.
/dev/mmcblk0p1: 146 files, 42839/83951 cluste
If I choose 2 instead, I get:
There are differences between boot sector and its backup.
This is mostly harmless. Differences: (offset:original/backup)
65:01/00
1) Copy original to backup
2) Copy backup to original
3) No action
No matter what I choose here, I get the same message as above (the one shown after option 1, remove dirty bit).
Tried running `fsck` on boot via `sudo touch /forcefsck`, and I think it runs (not sure because I'm headless...) but it doesn't fix anything.
Running just `sudo fsck /dev/sda1` throws a complaint since the disk is mounted, and I cannot unmount it.
I guess the question now is how to force a filesystem check and fix on boot. I googled, but the suggestions are basically the ones I already tried above. Any other ideas?
If it is truly empty, `xxd gossip_store` will print nothing; otherwise it will hexdump the actual data in it. If the `gossip_store` is, for example, filled with `0x00` bytes, `cat` will appear to print nothing at all, but `xxd` will reveal the truth. (Incidentally, this is also useful when tools mysteriously complain that an input file is bad even though it looks correctly formatted in a text editor. I once hit a weird case where a compiler kept complaining about invalid characters in a source file, and no amount of staring at the code in a text editor revealed any problem. It turned out a Unicode byte-order mark had been inserted at the start of the file; the text editor did not display it because it understands Unicode, but the compiler was absolutely flummoxed by it.)
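For anyone without `xxd` at hand, the same check can be sketched in a few lines of Python. This is only an illustration of the idea above, not part of c-lightning; the function names `describe_bytes` and `hexdump` are made up for this example, and the output format is only loosely modelled on `xxd`.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"  # the byte-order mark mentioned in the anecdote


def describe_bytes(data):
    """Summarise raw bytes: empty, all zeros, BOM-prefixed, or other."""
    if not data:
        return "empty"
    if not any(data):
        # Every byte is 0x00: `cat` shows nothing, but the file is not empty.
        return f"length {len(data)}, all 0x00"
    if data.startswith(UTF8_BOM):
        return f"length {len(data)}, starts with a UTF-8 byte-order mark"
    return f"length {len(data)}, first byte 0x{data[0]:02x}"


def hexdump(data, width=16):
    """Return xxd-like lines: offset, hex bytes, then printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as '.', as hexdump tools usually do.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:07x}: {hex_part}  {text}")
    return lines


def describe_file(path):
    """Convenience wrapper: read a file and summarise its contents."""
    return describe_bytes(Path(path).read_bytes())
```

Calling `describe_file("gossip_store")` from inside the lightning directory would then tell apart a truly empty file from one that merely prints nothing.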
You could try `rm gossip_store`, or `mv gossip_store gossip_store.back.20190923`, then rerun. If the problem persists, you may need to do more drastic filesystem checks with the filesystem unmounted, which means booting from an alternate boot device, or extracting your boot device from the computer and mounting it on a reliable computer. `fsck -ck` would scan for bad sectors, for example.
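The "move it aside instead of deleting it" variant can be sketched as a small Python helper. This is a hedged illustration only: `backup_name` and `move_aside` are hypothetical names, and the `.back.YYYYMMDD` suffix simply mirrors the `mv` example above.

```python
import datetime
from pathlib import Path


def backup_name(name, today=None):
    """Build a date-stamped backup name such as gossip_store.back.20190923."""
    today = today or datetime.date.today()
    return f"{name}.back.{today:%Y%m%d}"


def move_aside(path, today=None):
    """Rename a file to its backup name and return the new path.

    Refuses to overwrite an existing backup so an earlier copy is never lost.
    """
    path = Path(path)
    target = path.with_name(backup_name(path.name, today))
    if target.exists():
        raise FileExistsError(f"backup already exists: {target}")
    path.rename(target)
    return target
```

Stop `lightningd` before moving the file, then restart it so it can rebuild the store.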
Learning new stuff with every comment... super appreciated! `xxd gossip_store` gives me
0000000: 07 .
including the odd `.` at the end.
In the debug message above it was saying `gossip store version 0 not 7`, but now it seems to be 7 to me. Maybe that's why I'm not getting the same debug info anymore?
I suspect you're right, I will need to `fsck` with the disk attached to my laptop... hopefully that does something!
This actually seems to be a valid `gossip_store`, though maybe @rustyrussell can confirm. Is it still crashing afterwards? Maybe startup is just slow? It has to recover the `gossip_store` by redownloading gossip from the network after all, so it may only be a slow startup.
OK, looks like my drive is fairly corrupt overall. It suffers from some major problems, not just bad blocks, possibly even with the control board: it doesn't seem to power up properly. You can hear the platters trying to spin up, but struggling. Not sure why...
I did manage to power it on and save the contents of the `.lightning` directory of my clightning user. Is this enough to restore the node on a different machine? If I just copy everything and rebuild/set up a new node, will this be enough?
You only really need the db file `lightningd.sqlite3` and the private keys `hsm_secret`, though do note that if your drive is in really bad shape both may already be corrupted.
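One way to get a first impression of whether a rescued `lightningd.sqlite3` is corrupted is SQLite's built-in `PRAGMA integrity_check`. The sketch below is an assumption-laden illustration, not an official recovery procedure: the helper names are hypothetical, it should only ever be run against a copy of the file, and it says nothing about `hsm_secret`, which is not an SQLite file.

```python
import sqlite3
from pathlib import Path


def open_read_only(path):
    """Open an SQLite file read-only so the check cannot modify it."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def integrity_problems(conn):
    """Return [] if SQLite reports the database as sound, else the problems."""
    try:
        rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the check even runs.
        return [str(exc)]
    # A healthy database yields the single row "ok".
    return [] if rows == ["ok"] else rows
```

For example, `integrity_problems(open_read_only("backup/lightningd.sqlite3"))` (the `backup/` path is a placeholder) returns an empty list when SQLite finds no structural damage. A clean result only means the file is structurally intact; it does not prove the channel state inside it is current.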
Hi again, so I finally rebuilt my node and got clightning running as well. I tried to import (i.e. copy/paste) the `hsm_secret` and `lightningd.sqlite3` files into my `.lightning` directory, but it didn't work. The error I got was
wallet_blocks_rollback: FOREIGN KEY constraint failed
Note, I did get clightning to run before doing it, I connected to another node successfully.
I then thought it may be a problem with the files (corrupted?), so I tried doing the same with two new files, the ones which were generated automatically by clightning. (I backed them up before deleting them.) Got the same message. I also deleted everything from `.lightning`, keeping just the two files, and still got the same error.
Question now is, how do I restore my LN node with just the two files above?
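To see which rows trip a `FOREIGN KEY constraint failed` error, SQLite offers `PRAGMA foreign_key_check`, which lists violating rows without changing anything. The sketch below is purely diagnostic and assumes nothing about c-lightning's actual schema; `foreign_key_violations` is a hypothetical helper name, and it should be pointed at a copy of the database, never the live file.

```python
import sqlite3


def foreign_key_violations(conn):
    """List rows that break a foreign key constraint.

    Each result is (table, rowid, referenced_table, fk_index), as
    documented for SQLite's `PRAGMA foreign_key_check`.
    """
    return conn.execute("PRAGMA foreign_key_check").fetchall()
```

An empty list means SQLite found no dangling references; any rows returned name the table and rowid to inspect. This does not repair the database or make it safe to run a node from.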
How much did you lose?
Hi @leilerg, were you able to recover your database ? If not, have your channels been closed (funding transaction spent) ?
I did manage to set up another node, and tried to run it by simply copying `lightningd.sqlite3` and `hsm_secret` over, but that didn't seem to do the trick. It then dawned on me that just doing that could trigger a penalty channel close, so I didn't insist. Now I'm waiting for my peers to close the channels since the node is inactive, and has been inactive for ~1 month+ now. But it doesn't seem like people are monitoring channels too actively...
Any suggestions on how I can go about closing those channels? I found this to retrieve my funds, but the channels must be closed first, no? @ZmnSCPxj
That's exactly why the "static channel backup" approach used elsewhere is not OK. Your peer could remain inactive forever and you would never recover your funds (because you don't have the latest state).
Well, at this point I don't really have much of an option, so I'll wait. I didn't "put 4BTC in LN", either, so it's not a big deal for me to wait.
I'm monitoring my node using 1ml.com (any other ways?), and some channels have been closed. I guess people will eventually close them.
It could be that a newer version of c-lightning can recover the database (we added a couple of DB connection startup options that should ensure foreign key constraints don't break). If that's the case I'd suggest starting the node with the `--offline` option, which will allow you to monitor the state of the channels. The node will not accept incoming connections or open outgoing ones, so it'll just make sure to extract the funds from the closes and track them until they are buried. At that point you can decommission the node after withdrawing all the funds on the node :-)
I had a similar catastrophic node failure involving a power outage and simultaneous RAID 1 failure.
I spent some time researching the process for recovering funds from a c-lightning node when all is lost but the `hsm_secret`, and found it fairly time-consuming to piece the information together. So I wrote a quick guide that draws on lots of great resources and docs written by the many legends of lightning active on here. My hope is that it might save someone a bit of time/research in the future (and also act as a reference for myself in case something similar happens again and I forget something). Hopefully it may be of use to some people that land here.
I have found that in general there is quite a lot less information/resources/discussion (besides the Lightning Docs) on subjects like this for c-lightning in comparison to LND.
It is primarily intended as a pragmatic guide to recovery in practice, as opposed to a complete theoretical/technical treatment of the recovery process.
Awesome, thanks @mandelbit for taking the time to write this up. You're right that there are fewer resources than LND, but then again we also have a smaller user-base :-) It's people like you going the extra mile that allows us to improve the situation. So just wanted to thank you for this ^^
Not at all, the least I can do after benefiting from all of your work. Huge thanks to you guys for creating such a great piece of software. Very excited about harnessing this stack, and contributing wherever I can.
Hi, yesterday I suffered a power outage on my bitcoin/lightning node, and after that was resolved clightning could not start again. I was using `v0.7.2` on a RasPi3. As such, reproducing may be hard...
Tried rebooting, restarting clightning separately, uninstalling/re-installing clightning, and building `v0.7.1` to downgrade; none of it worked. Something is preventing the node from starting, so my config tries to restart it indefinitely.
The debug info (`log-level=info`) is pretty obscure to me:
This is all I get, over and over. Or rather, what I used to get. Eventually it stopped producing all three lines and is now only logging the last one, `We seem to be missing gossip messages`.
`hsm_secret` is in place, with a timestamp from when I first set the node up, about a year ago.
`lightningd.sqlite3` is also in place, but its timestamp is from whenever I last tried running the node, so recent.
Any ideas what I could do to restore the node? Should I try removing (with a local backup) the two files above and restarting? I'm not too sure what is crucial and what isn't for avoiding a loss of funds, so I haven't touched any of that so far. If that doesn't work, how could I just recover the funds? (Though the problem remains, I still want to run lightning.)