Open btweenthebars opened 1 year ago
There was a change in the ln spec a while ago, in which Eclair asked us not to force close automatically if we detect they're lagging behind (allowing for a last ditch attempt at maybe restoring a more up to date backup).
So I'm tempted to say that we're correct and that we should close only when asked further.
Found the LN spec issue here: https://github.com/lightning/bolts/issues/934 and the local tracking issue here https://github.com/ElementsProject/lightning/issues/4927
This suggests that lnd should be sending an error in reply to our warning so we know we should be force closing.
Yep. I've been contacted several times now (thrice, I think) by node operators who have restored a static channel backup and don't understand why my node is not force-closing when they connect to it and present an outdated channel_reestablish
. I've been pointing them to lightning/bolts#934 and telling them to configure their nodes to send an error
message to tell my node to force-close — or to file a bug report if LND provides them no way to do that.
Kudos for implementing the spec! Maybe LND will too one of these days. 😉
Hey @cdecker I checked out that link, but is isn't actually a concrete spec change, instead it's an issue outlining an alternative interpretation and amendment to the spec. When it was brought up in the spec meeting, I gave pause as I envisioned scenarios like the ones our users are running into today, wherein they attempt to recover their funds, but end up in a stuck situation (potentially losing to suspended/lost funds).
Rolling out the change proposed change also wouldn't be super easy as it would be new clients modifying their behavior from the PoV of old clients that rely on such behavior. If we want to consider amending the spec, then I think we should do so via a new feature bit, something like lax_channel_reestablish
or w/e. That would allow new nodes to properly gate their behavior, instead of modifying behavior w/o any sort of externally visible signal.
Ok, so I was wrong above: there was a spec change to stop closing as soon as you get a bad channel_reestablish
message, but peers should still continue to close when an error is received. Today it appears that because CLN will tear down the connection, the sent error
never gets to the processing logic, so the channel is never force closed.
We dug a bit into the changes in the past release and found this: https://github.com/ElementsProject/lightning/commit/b698a5a5ef4acb8d90aec0f6dccc52fb6c37afb2#diff-b1299b24697159a06e7b426c5086c36df8e48db15739a0e301d53a9a21149722R3005-R3006. From my reading, before CLN would send an error, then force close. With the above change, CLN will now send an a warning
, then close the connection. lnd
nodes have always sent an error
during the recovery flow when we know we can't continue at all, which worked before since other peers would then force close either off that bad chan reest message, or the error itself.
IIUC, the impact of the above change is that:
error
message that prompts the peer to closelnd
will reconnect, send the chan reest message, then the error again, which begins the loop all over again I think if behavior is changed to send the warning, but not disconnect (so the error
can be sent), then things would work out again.
Yep. I've been contacted several times now (thrice, I think) by node operators who have restored a static channel backup and don't understand why my node is not force-closing when they connect to it and present an outdated
channel_reestablish
. I've been pointing them to lightning/bolts#934 and telling them to configure their nodes to send anerror
message to tell my node to force-close — or to file a bug report if LND provides them no way to do that.Kudos for implementing the spec! Maybe LND will too one of these days. 😉
Hi, if you get in touch with LND peers having the issue. Can you have them try this: https://github.com/lightningnetwork/lnd/issues/7301#issuecomment-1416029023 ? LND dev wants to know if it work.
Hi, if you get in touch with LND peers having the issue. Can you have them try this: lightningnetwork/lnd#7301 (comment) ? LND dev wants to know if it work.
Uhh, okay, I will, but damn, that's way too complicated for 99% of users, who are not developers.
Uhh, okay, I will, but damn, that's way too complicated for 99% of users, who are not developers.
Given that the solution linked to above might be too complicated and a problem for users running Umbrel and other node packaging solutions where running a custom version of lnd
is not easy, I created a specific new command in chantools
to solve this. Tested locally against CLN v22.11.1
.
Steps (alternative to running the custom version of lnd
mentioned here ):
chantools
(v0.10.7
) (can be any machine, use the darwin
release for MacOS).chantools triggerforceclose --peer 039d9c531877266e190af2d51795b6104a1e91a188dad4b9ccf3ae4154075dd8e3@localhost:9747 --channel_point 8e3efb6f5537d964c1c02c0587f75fc6995888675083400d4b10c02f47ad3592:0
lnd
linked above on a separate machine just to trigger the force close.lnd
(e.g. showing up in the list of lncli pendingchannels
), then just wait until the force close transaction is confirmed and the channel should automatically update the status there.chantools sweepremoteclosed
command to sweep the funds.Indeed, it looks like we issue warning then we take our toys and go home.
Let's see if I can reproduce, then fix. We need to keep listening, in case there's an error...
Issue and Steps to Reproduce
I have described the incident here on LND repo
It looks to me that it's half our fault not force closing the channel when we know that the revocation numbers don't match. This is shown on the listpeers:
I was able to force close it manually
getinfo
output"version": "v22.11.1"