Closed NicolasDorier closed 1 year ago
Analyzing the situation, it seems the HTLCs fail because we are mining a bunch of blocks during a payment.
So my questions are:
`pay` shouldn't be pending.
We can't terminate the `pay` call, as that is meant to be synchronous, and despite the channel going down, the HTLC may still succeed (completing the HTLC success path on-chain) or fail (completing on-chain via the timeout), hence `pay` cannot decide whether to report a success or a failure. Notice that in an MPP payment, `pay` would still report success or failure as long as one non-stuck HTLC reports an `MPP_TIMEOUT` (failure) or returns the `preimage` (success).
This is a common issue for payments, where an on-chain resolution can take considerable time, and we can't determine whether it is going to succeed or fail, so we can't report it back to the user.
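To make the reasoning above concrete, here is a minimal sketch of the decision it describes. It is a simplified model, not the actual `pay` plugin logic: the `HtlcResult` enum and `payment_outcome` function are hypothetical names, and failure codes other than `MPP_TIMEOUT` are ignored.

```python
from enum import Enum


class HtlcResult(Enum):
    """Hypothetical per-HTLC states for one part of a payment."""
    PENDING = "pending"          # still in flight, possibly resolving on-chain
    PREIMAGE = "preimage"        # the recipient revealed the preimage
    MPP_TIMEOUT = "mpp_timeout"  # the recipient gave up waiting for all parts


def payment_outcome(parts):
    """Return 'complete', 'failed', or 'pending' for a list of HtlcResult."""
    # One returned preimage is enough to report success.
    if any(p is HtlcResult.PREIMAGE for p in parts):
        return "complete"
    # One non-stuck part reporting MPP_TIMEOUT is enough to report failure.
    if any(p is HtlcResult.MPP_TIMEOUT for p in parts):
        return "failed"
    # Every part is still unresolved: the caller has to keep waiting.
    return "pending"
```

With a single stuck HTLC, `payment_outcome([HtlcResult.PENDING])` stays `"pending"`, which matches the blocking behavior discussed in this thread.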
@cdecker I agree, maybe, for the payment currently being sent. But should it be the case for future payments?
Say I make a first payment (success), then a second one (pending, channel transitions to unilateral close). The third payment shouldn't try to use the channel that is transitioning to unilateral close and thus get stuck like the second payment.
It looks as if closing channels are still considered for routing future payments?
Also, about the channel dropping: I tried setting `watchtime-blocks=100`, expecting it would make the HTLC deadline 100 blocks, but that didn't seem to work. Is there another solution?
That detail is giving me some trouble: if the channel transitions away from `CHANNELD_NORMAL` into any other state, we will not consider it for a payment that is started after that transition happened. Even if we did, there is no way for us to add an HTLC to a channel that isn't in `CHANNELD_NORMAL`, so it can't happen that the HTLC gets stuck on an already closing channel (not least because `lightningd` will simply report that the first hop is unavailable). So I don't see how we could get stuck on a closing channel, after that channel has started closing.
Do you have any logs showing how `pay` after the state transition causes the HTLC to be added to that channel and then get stuck?
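The rule described above can be sketched as a simple filter. This is only an illustration: the `usable_first_hops` helper, the dictionary layout, and the short channel ids are made up for the example; only the state names `CHANNELD_NORMAL` and `AWAITING_UNILATERAL` come from this thread.

```python
def usable_first_hops(channels):
    """Keep only the channels that could accept a new HTLC."""
    # Any state other than CHANNELD_NORMAL is excluded from new payments.
    return [c for c in channels if c["state"] == "CHANNELD_NORMAL"]


# Hypothetical channel list; the ids are placeholders.
channels = [
    {"short_channel_id": "103x1x0", "state": "AWAITING_UNILATERAL"},
    {"short_channel_id": "110x2x1", "state": "CHANNELD_NORMAL"},
]

candidates = usable_first_hops(channels)
```

Here `candidates` contains only the second channel, so a payment started after the transition would never be offered the closing one.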
@cdecker I am silly, I think the issue is that I was just calling `pay` a second time with the same BOLT11 that was already pending.
It seems to me c-lightning is the only implementation doing the right thing, and others should have crashed before as they should have been in the same condition.
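A test could guard against the mistake described above by checking for an in-flight attempt before calling `pay` again. The sketch below assumes a list of entries shaped like the `pays` array returned by `listpays`, each with `bolt11` and `status` fields; treat those field names as assumptions to verify against your version. The helper name `has_pending_attempt` and the sample invoice strings are hypothetical.

```python
def has_pending_attempt(pays, bolt11):
    """Return True if an earlier attempt on this invoice is still pending."""
    return any(
        p.get("bolt11") == bolt11 and p.get("status") == "pending"
        for p in pays
    )


# Placeholder entries; real invoice strings are much longer.
pays = [
    {"bolt11": "lnbcrt-invoice-a", "status": "complete"},
    {"bolt11": "lnbcrt-invoice-b", "status": "pending"},
]
```

If `has_pending_attempt(pays, invoice)` is true, a second `pay` on the same invoice can be expected to block just like the first.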
I am closing this one:
`pay` is indeed normal, as even if the channel is `AWAITING_UNILATERAL`, the HTLC could still succeed.
My second call to `pay` was on the same BOLT11 that was pending, so it is normal that it also blocks.
Good catch on identifying that another call to `pay` after the channel initiated a close would share the same fate as the `pay` before the close.
As for why blocks are being found during the payment: `bcli` polls the `bitcoind` process at regular intervals (30s without `--dev-bitcoind-poll`), so if you generate blocks and then call `pay` before `lightningd` has had a chance to poll, it will have fallen behind and will use its own, stale blockheight. This is usually fine because we tend to generate a couple of blocks at a time, and we can wait for sync if needed, but if you generate dozens or hundreds of blocks at a time, the timeouts can get triggered. I'd suggest waiting for sync if you want to stabilize tests. Notice also that 30s << 10m, so in normal circumstances this cannot happen on mainnet.
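One way to wait for sync in a test harness is to poll until the node's blockheight catches up with the chain tip. The sketch below is deliberately generic: `wait_for_sync` is a hypothetical helper, and the two callables are assumptions that you would wire up yourself, for example to the `blockheight` field of `getinfo` on the lightning node and to `getblockcount` on `bitcoind`.

```python
import time


def wait_for_sync(node_height, chain_height, timeout=60.0, interval=0.5):
    """Poll until node_height() >= chain_height(), or until timeout expires.

    Both arguments are zero-argument callables returning an int height.
    Returns True once the node has caught up, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if node_height() >= chain_height():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling this after generating blocks and before `pay` keeps the node from building HTLCs against a stale blockheight.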
@cdecker I use `dev-bitcoind-poll=1`, and normally my tests wait, using `getinfo`, for c-lightning to sync with my new blocks before doing anything else. I am digging into what is happening right now.
I believe this issue was related to the fact I may have missed a corner case where I wasn't waiting for the sync. The fact c-lightning is polling may have mattered on the timing. Sorry for that! :)
EDIT: After posting this issue, I understood why the channel gets closed. I think it is understandable behavior and not a bug (we were mining blocks during an HTLC). What is not normal is `pay` blocking rather than failing fast.
Version 22.11.1 dev.
On our regtest environment, sometimes lightningd is VERY flaky.
I have only two nodes, Alice and Bob and Alice wants to send money to Bob.
The channel is constructed, then after a while the channel suddenly gets stuck in an `AWAITING_UNILATERAL` state. Worse: when we attempt to pay an invoice after this happens, the call to `pay` blocks rather than returning an error code telling us there is no route.
Analyzing the logs I see two issues:
`pay` gets stuck in `pending` rather than `failed`.
Alice:
Bob:
Config Alice
Config Bob
Stuck payment
Alice when the payment seems to get stuck
Bob