ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance

Feature request: Graceful shutdown #4842

Status: Open. Opened by whitslack 3 years ago

whitslack commented 3 years ago

@ZmnSCPxj explained:

> HTLCs being in-flight are a normal part of operation and that state can persist for days; we cannot delay daemon shutdown until there are no in-flight HTLCs when in-flight HTLCs can persist for days at a time.

It is certainly true that daemon shutdown cannot be deferred until all HTLCs are cleared, but a significant improvement in Lightning UX could be achieved by implementing a graceful shutdown that would block the addition of new HTLCs and wait for up to a specified timeout for all HTLCs to clear.
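
To make the request concrete, here is a minimal sketch of the intended shutdown sequence. The stop_offering_new_htlcs() and pending_htlc_count() stubs are hypothetical and correspond to no real lightningd interface; the point is only the shape of a timed drain.

```python
import time

def stop_offering_new_htlcs() -> None:
    """Hypothetical: tell the node to stop adding/forwarding new HTLCs."""

def pending_htlc_count() -> int:
    """Hypothetical: number of HTLCs still in flight on any channel."""
    return 0

def graceful_shutdown(timeout_s: float = 300.0, poll_s: float = 5.0) -> None:
    stop_offering_new_htlcs()
    deadline = time.monotonic() + timeout_s
    while pending_htlc_count() > 0:
        if time.monotonic() >= deadline:
            break          # stuck HTLCs can persist for days; stop waiting
        time.sleep(poll_s)
    # proceed with the normal daemon shutdown from here
```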

Feature Request

Implementing this feature would reduce the occurrence of slow payment attempts for users of the Lightning Network. Rebooting a C-Lightning server can take several minutes, during which time any users with in-flight HTLCs must wait. This is bad UX and is not helping Lightning adoption. We can't easily fix HTLCs that go out to lunch and never return, but we can avoid dropping fresh HTLCs on the floor while we go out to lunch.

whitslack commented 3 years ago

N.B.: @ajpwahqgbi has attempted to write a C-Lightning graceful shutdown plugin, but it doesn't work correctly, as it apparently blocks the daemon's event loop and prevents any HTLCs from clearing out at all, thus actually exacerbating the problem it's trying to remedy. Also, it lacks a timeout.
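
For comparison, a plugin can refuse new forwards without blocking the daemon's event loop by answering the htlc_accepted hook immediately. The pyln-client sketch below (treat the hook fields and the "2002" failure_message as assumptions to check against the current plugin docs) adds a hypothetical startdrain RPC method that switches the node into drain mode; already-committed HTLCs keep resolving as usual, and, as discussed further down the thread, a new incoming HTLC is still committed before it can be failed this way. A real version would also want the timeout described above.

```python
from pyln.client import Plugin

plugin = Plugin()
state = {"draining": False}

@plugin.method("startdrain")
def startdrain(plugin):
    """Stop accepting new forwards; in-flight HTLCs keep resolving."""
    state["draining"] = True
    return {"draining": True}

@plugin.hook("htlc_accepted")
def on_htlc_accepted(plugin, **kwargs):
    # Return immediately so lightningd's event loop is never blocked.
    if state["draining"]:
        # 0x2002 = temporary_node_failure (BOLT #4)
        return {"result": "fail", "failure_message": "2002"}
    return {"result": "continue"}

plugin.run()
```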

ZmnSCPxj commented 3 years ago

This may not be (perfectly) possible in terms of the BOLT protocol.

In BOLT#2, there is an implicit ordering requirement. Suppose you are doing (or want to do, ha) a graceful shutdown, and I am your peer. There is an HTLC you already offered to me. Now, suppose I send the following messages to you, in the specified order:

1. update_add_htlc, offering you a new HTLC.
2. update_fulfill_htlc, fulfilling (and thus removing) the HTLC you already offered to me.

You can either respond:

1. to neither message;
2. to only the second message; or
3. to both messages.

However, note that the first message adds an HTLC, which is further from your goal of reducing the number of HTLCs in-flight before you finalize shutdown.

You cannot respond to just the second message and remove an HTLC without adding the first message HTLC.

The first option (respond to neither message) is equivalent to just shutting down with a SIGKILL immediately then and there instead of trying for a timed-out graceful shutdown. The third option is not a marked improvement: you remove one HTLC but gain another anyway.

I think that properly speaking we may need a spec change, where a node tells its peers "no more update_add_htlcs until I reconnect" but will still accept fulfilling and failing HTLCs to reduce in-flight HTLCs.

--

However, we can (maybe?) offer a partial solution.

We can have channeld accept a graceful_shutdown message, which puts it in graceful_shutdown mode. It should also probably get that flag in its init message too.

In this mode, channeld will handle update_fulfill_htlc and update_fail_htlc as normal. However, if it receives update_add_htlc it will immediately exit, which automatically closes the peer connection and appears to the peer as a sudden shutdown of our node.

This at least lets us have some chance to reduce the number of HTLCs without gaining more HTLCs, without having to modify the spec (which would take longer).
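
A rough sketch of that dispatch, purely illustrative rather than actual channeld code (resolve_htlc and add_htlc are hypothetical placeholders):

```python
import sys

def resolve_htlc(payload: dict) -> None:
    """Hypothetical placeholder: clear an in-flight HTLC (fulfill or fail)."""

def add_htlc(payload: dict) -> None:
    """Hypothetical placeholder: normal handling of a new incoming HTLC."""

def handle_peer_msg(msg_type: str, payload: dict, graceful_shutdown: bool) -> None:
    if msg_type in ("update_fulfill_htlc", "update_fail_htlc"):
        resolve_htlc(payload)            # removals still processed as usual
    elif msg_type == "update_add_htlc":
        if graceful_shutdown:
            # Exiting drops the peer connection; to the peer this looks like an
            # abrupt shutdown, which the protocol is already designed to survive.
            sys.exit(0)
        add_htlc(payload)
```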

whitslack commented 3 years ago

@ZmnSCPxj: In your example, why not reply to the update_add_htlc with a temporary failure 0x2002 (temporary_node_failure) and process the update_fulfill_htlc normally? We should endeavor to keep the channeld running if there are any HTLCs in flight. The channeld processes for channels with no HTLCs in flight can be terminated immediately (i.e., sooner than lightningd times out and terminates).
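
For reference, that code is built from the BOLT #4 failure flag bits; 0x2002 decodes as NODE|2:

```python
# BOLT #4 failure-code flag bits
BADONION = 0x8000
PERM     = 0x4000
NODE     = 0x2000
UPDATE   = 0x1000

TEMPORARY_NODE_FAILURE = NODE | 2
assert TEMPORARY_NODE_FAILURE == 0x2002
```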

ZmnSCPxj commented 3 years ago

You cannot reply to update_add_htlc with a routing failure --- that is done with a later update_fail_htlc. Unfortunately, that means accepting the update_add_htlc now, and only after the HTLC has been irrevocably committed (i.e., we have exchanged signatures and added the new HTLC as in-flight) can we update_fail_htlc it with temporary_node_failure. The counterparty might not reply immediately (we have no control over their code) and our graceful shutdown timer might time out (or your systemd gets bored and decides to SIGKILL you) before they reply, meaning we "gracefully" shut down with one more HTLC in flight than when the grace period started.

So no, your only recourse is to drop the connection so that neither you nor the counterparty ever instantiate the new incoming HTLC. There is still the chance they reconnect within the grace period and then decide not to update_add_htlc but still send the update_fulfill_htlc for the existing HTLC, depending on how their implementation is coded (maybe?).

(In fact I believe we do this now: if we have a just-arrived HTLC that is to be forwarded, but we get shut down before we have actually sent out the update_add_htlc to the next hop, we will temporary_node_failure the HTLC; though @rustyrussell probably knows the HTLC state machine much better than I do.)

whitslack commented 3 years ago

Damn. Who the hell designed this protocol? It has so many novice mistakes. Anyway, thanks for your explanations.

ZmnSCPxj commented 3 years ago

N00b LN devs 4->5 years ago (all of us were n00bs, LOL).

Mostly the concern then was to ensure that we could always have a fallback safely on-disk before doing potentially funds-losing operations, which is why the protocol was designed that way. Graceful shutdowns were not a consideration, but sudden unexpected I-tripped-over-the-power-cord-sorry shutdowns were, and the protocol was designed so that sudden disconnections are perfectly fine (which is why I suggest just disconnecting: the protocol is designed to survive that, and that code path is well-tested in all implementations). It is much easier to implement and test your implementation if you always process messages in the order you receive them, rather than supporting "I can respond to that message but ignore this other one", especially if you need to save stuff on disk and then re-establish later with the peer from your saved on-disk state. Graceful shutdowns were not a concern; ungraceful kick-the-cord shutdowns were.

whitslack commented 3 years ago

For what it's worth, I don't mean that responding to protocol messages in order is a mistake. I only meant that having no mechanism for rejecting protocol messages (other than disconnecting) seems like an oversight. Similarly with the whole thing where you can't send an "error" to your channel peer without triggering them to force-close your channel.

ZmnSCPxj commented 3 years ago

Well, I suppose I now have to detail it exactly, and the rationale. You can try reading through BOLT#2 too.

Basically, I lied when I said your options are to respond to none of the updates, one of them, or both. In reality, multiple update_* messages are effectively batched into a single Poon-Dryja update (i.e. signature followed by revocation). And it is only the entire batch you can ignore or reject, because it is the signing and revocation which are "the real" events that commit everyone to a new state.

So your only real options, if the counterparty gave you update_add_htlc followed by update_fail_htlc, are:

1. accept the entire batch: reply to their commitment_signed with your revoke_and_ack (and your own commitment_signed), taking the added HTLC along with the removal; or
2. reject the entire batch.

You ignore the batch by ignoring their commitment_signed and not giving back a revoke_and_ack, but in practice most implementations will stall waiting on your revoke_and_ack, and that is no better than just disconnecting to reject the entire batch (and disconnecting is just plain cleaner and less likely to trigger edge-case bugs).
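
A toy model of that batching (purely illustrative; the Update and Channel names are made up and correspond to no real implementation): update_* messages only stage proposals, and the single real decision point is whether to take the whole staged batch when commitment_signed arrives.

```python
from dataclasses import dataclass

class Disconnect(Exception):
    """Rejecting the batch in practice means dropping the connection."""

@dataclass
class Update:
    kind: str       # "add", "fulfill" or "fail"
    htlc_id: int

class Channel:
    def __init__(self) -> None:
        self.committed: set[int] = set()   # HTLC ids in the current commitment
        self.staged: list[Update] = []     # proposals, not yet binding

    def on_update(self, update: Update) -> None:
        # update_add/fulfill/fail only stage a proposal; nothing commits yet.
        self.staged.append(update)

    def on_commitment_signed(self, accept_batch: bool) -> None:
        if not accept_batch:
            # There is no way to cherry-pick individual updates out of the batch.
            raise Disconnect("rejecting the whole batch")
        for u in self.staged:
            if u.kind == "add":
                self.committed.add(u.htlc_id)
            else:                          # "fulfill" or "fail" removes it
                self.committed.discard(u.htlc_id)
        self.staged = []
        # ...then send revoke_and_ack followed by our own commitment_signed...
```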

The reason to batch is that some of the low-level networking protocols are much more amenable to having as few turnarounds as possible, meaning you would prefer to have one side do multiple message sends before the other side replies. For example, it might use a shared medium (Ethernet, WiFi), or the protocol might have high latency but high bandwidth (Tor, long-distance cable). For my example it would look like:

You                         Me
 |                          |
 |<-----update_add_htlc-----|
 |<---update_fulfill_htlc---|
 |<----commitment_signed----|
 |                          |
 |------revoke_and_ack----->|
 |-----commitment_signed--->|
 |                          |
 |<-----revoke_and_ack------|
 |                          |

Now, what you want to do would be something like this:

You                         Me
 |                          |
 |<-----update_add_htlc-----|
 |                          |
 |-------accept_update----->|
 |                          |
 |<---update_fulfill_htlc---|
 |                          |
 |-------accept_update----->| 
 |                          |
 |<----commitment_signed----|
 |                          |
 |------revoke_and_ack----->|
 |-----commitment_signed--->|
 |                          |
 |<-----revoke_and_ack------|
 |                          |

Because of the need to turn around and accept each individual update_ message, the latency is worsened; there are now multiple turnarounds, which is bad for shared-media network transports and for transports with high latency. We could do some amount of batching negotiation, i.e. there could be a "okay that is the batch of changes I want to add" which should be replied with "hmm I only accept these particular changes" before signing and revoking, but that still adds one more turnaround.

And a good part of the slowness in forwarding is due precisely to these turnarounds or roundtrips. Already with the current design (which is already optimized as well as we can figure out, barring stuff like my weird Fast Forwards idea that you can read about on the ML) you get the 1.5 roundtrips, and that is repeated at each hop in a payment route. Even with a "I only accept these particular changes in that batch" you still add 1 more roundtrip, meaning that is 2.5 roundtrips per hop, on the forwarding of a payment.

It gets worse when a failure happens at a later stage. Failures need to wait for the outgoing HTLC to be removed before we can consider asking the incoming HTLC to be removed, so the 1.5 roundtrips is again repeated at each hop going back. If we allow for filtering which messages we accept that rises to 2.5 roundtrips.
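
As a back-of-the-envelope illustration of why those roundtrips dominate (the 100 ms one-way delay and the 5-hop route below are made-up numbers, not measurements):

```python
ONE_WAY_MS = 100   # assumed per-hop one-way latency (over Tor it can be much worse)
HOPS = 5           # assumed route length

def commit_latency_ms(roundtrips_per_hop: float) -> float:
    # One roundtrip is two one-way trips, repeated at every hop.
    return roundtrips_per_hop * 2 * ONE_WAY_MS * HOPS

for rt in (1.5, 2.5):
    fwd = commit_latency_ms(rt)
    print(f"{rt} roundtrips/hop: forward ~{fwd:.0f} ms, "
          f"forward plus returned failure ~{2 * fwd:.0f} ms")
```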

And forwarding payments and returning failures happen a lot more often than controlled shutdown of forwarding nodes does (because forwarding nodes want to be online as much as possible), so it makes sense to optimize the protocol for that, at the expense of controlled shutdown. The 1.5 roundtrips are the minimum safe possible if we want to remain robust against uncontrolled shutdown, too.

So it seems to me that you want some kind of protocol message to say "okay, I am in graceful shutdown mode and will reject batches of updates that increase HTLCs". The counterparty might already have sent out some updates that increase the number of HTLCs before it gets that message, because of latency, so we need a code path to handle that somehow. Then the counterparty has to ensure that it does not make a batch of updates that increases the number of HTLCs, and the easiest way to implement that, with fewer edge cases for bugs in implementation, is to simply stop all updates, which is not much different from, well, disconnecting.

whitslack commented 3 years ago

What I'm suggesting is this:

You                         Me
 |                          |
 |<-----update_add_htlc-----|
 |<---update_fulfill_htlc---|
 |<----commitment_signed----|
 |                          |
 |------refuse_update------>|
 |------accept_update------>|
 |----commitment_signed---->|
 |                          |
 |<-----revoke_and_ack------|
 |<----commitment_signed----|
 |                          |
 |------revoke_and_ack----->|
 |                          |

Yes, it's an additional one-way trip but only in this exceptional case. The common case would still be 1.5 round trips.

ZmnSCPxj commented 3 years ago

Well, you would probably not want to send your own commitment_signed if you refuse_update in response to my commitment_signed. In general we do not apply updates from the counterparty to the counterparty's commitment unless they have been applied to our own commitment first (which is what the commitment_signed from me achieves, but you are rejecting it).

Also note that since I already gave the commitment_signed before you send refuse_update, your commitment with all my proposed updates is potentially valid and can be confirmed onchain, and your assertion that you are "refusing" it is only an assertion by you --- you could be lying to me, claiming to refuse, but in reality fully intending to use my signature after I accept your refusal and have deleted db information on my side, which might screw me over later on. Thus, the proposed protocol may hide CVE-level bugs if not implemented correctly, and probably needs a lot more thinking about to ensure that does not happen.

So your refuse_update needs to also include a revocation of the commitment I signed, and since commitments have a monotonic counter (and implementations might rely on that counter for other uses), this may be problematic, since the next commitment might no longer be n+1 where n is the current commitment being held.

In the case of disconnection, we have a channel_reestablish protocol, where we re-exchange the last signatures we sent, so it is safer to disconnect than to try to build yet another commitment at n+2 (or n+3, if you refuse_update yet again) while n is valid but n+1 has been revoked.

In short, your proposal hides a fair amount of complexity in the update state machine. I would suggest splitting this issue into two parts:

1. The part that needs no spec change: while shutting down, stop offering new HTLCs ourselves, keep processing update_fulfill_htlc and update_fail_htlc, and disconnect from any peer that sends update_add_htlc.
2. The part that needs a spec change: a protocol message asking peers to stop sending update_add_htlc (or something like your refuse_update), which needs more design discussion.

whitslack commented 3 years ago

Thank you for all the explanation. I was aware that you'd be committing to HTLCs that I'd subsequently be refusing, but my expectation is that I'd revoke that commitment after you send me a new one that includes only the HTLC that I accepted. That does imply that I'd be revoking two commitments at once, and you would have to assume that either of them could be used against you until I give you that revocation.

> while n is valid but n+1 has been revoked.

Is that even possible? I thought the way SHA chains work is that a revocation of a particular state gives you the ability to derive revocations of all preceding states.

Anyway, yes, splitting this into two requests is clearly the way forward. The first half seems trivial to implement correctly, and it would go a long way toward reducing the adverse effects of taking down a popular routing node for maintenance.

ZmnSCPxj commented 3 years ago

> while n is valid but n+1 has been revoked.
>
> Is that even possible? I thought the way SHA chains work is that a revocation of a particular state gives you the ability to derive revocations of all preceding states.

Right right, that is a brain fart on my end, sorry.
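
For the curious, the per-commitment secrets come from the BOLT #3 shachain construction. A simplified rendering of the generation step (a sketch of the spec algorithm, not library code):

```python
import hashlib

def flip_bit(data: bytes, bit: int) -> bytes:
    out = bytearray(data)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)

def generate_from_seed(seed: bytes, index: int) -> bytes:
    # BOLT #3: walk the 48 index bits from high to low; each set bit flips the
    # corresponding bit of the running value and hashes it.
    p = seed
    for b in range(47, -1, -1):
        if (index >> b) & 1:
            p = hashlib.sha256(flip_bit(p, b)).digest()
    return p
```

Because the index assigned to each commitment counts down from 2^48 - 1 as the channel advances, and a secret whose index ends in zero bits can derive every secret sharing its upper bits, the receiver needs to store only around 49 secrets to reproduce every secret revealed so far, which is the property referred to above.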

Your proposal seems a workable one, but probably needs more eyes and brains on it. I would personally prefer an i_am_going_to_graceful_shutdown_soon and an acknowledging go_ahead_and_die (maybe not needed?), with a sort of "soft" request that the peer avoid update_add_htlc: the peer could respond with go_ahead_and_die and still keep sending update_add_htlc, and we would just shrug and say "oh well" (maybe with an UNUSUAL-level log), but a proper implementation would avoid sending update_add_htlc. That probably causes fewer problems and fewer changes to existing code, especially considering multiple implementations.

As a soft request, i_am_going_to_graceful_shutdown_soon might even be implemented as an odd message without needing feature-bit negotiation; we would just optimistically send it to all channeled peers and hope they get the message.
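
A minimal sketch of that optimistic send, assuming the sendcustommsg RPC and a made-up odd message type 0xff87 (under BOLT #1's "it's OK to be odd" rule, a peer that does not understand an odd-typed message simply ignores it):

```python
from pyln.client import LightningRpc

I_AM_GOING_TO_GRACEFUL_SHUTDOWN_SOON = 0xFF87   # hypothetical odd message type

rpc = LightningRpc("/path/to/lightning-rpc")    # adjust to your rpc socket path
msg = I_AM_GOING_TO_GRACEFUL_SHUTDOWN_SOON.to_bytes(2, "big").hex()  # type only, empty body

for peer in rpc.listpeers()["peers"]:
    if peer.get("connected"):
        try:
            rpc.call("sendcustommsg", {"node_id": peer["id"], "msg": msg})
        except Exception:
            pass   # best effort only: we just hope they get the message
```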

whitslack commented 3 years ago

@ZmnSCPxj: Would the proposed Quiescence Protocol be of use here?

Hmm, maybe not, since that protocol, as proposed at least, prevents removing HTLCs as well as adding them. But maybe the proposal could be amended so "add" updates and "remove" updates are independently toggleable?

ZmnSCPxj commented 3 years ago

> Hmm, maybe not, since that protocol, as proposed at least, prevents removing HTLCs as well as adding them. But maybe the proposal could be amended so "add" updates and "remove" updates are independently toggleable?

Correct, not directly useful.

The quiescence protocol currently does not specify how the quiescence period ends, and it is intended for changing the commitment protocol (i.e. switching from the current Poon-Dryja to a different variant, or adding an offchain "conversion transaction" to switch to e.g. Decker-Russell-Osuntokun, or maybe just allowing implementations on both sides to safely mutate their databases for an "allow PTLCs" flag, with as little chance as possible of unexpected stuff happening). But maybe bits of its implementation can be used.