Open Lagrang3 opened 3 weeks ago
I think the problem is with the parallel channels. On `l2`, `channeld` is only seeing one channel with its peer `l1`.
I can see that `lightningd`'s peer structure has a list of channels (defined in `lightningd/peer_control.h`), but it seems that `channeld`'s peer structure has only one channel (defined in `channeld/channeld.c`). I'll keep looking...
UPD: There's one channeld instance for every channel. So maybe lightningd is sending the HTLC request to the wrong subdaemon?
As a matter of fact in the logs:
035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#2: Adding HTLC 0 amount=501000000msat cltv=119 gave CHANNEL_ERR_ADD_OK
...
035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#2: Adding HTLC 1 amount=500000000msat cltv=119 gave CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED
Both HTLC requests go to the same `channeld-chan#2`; one should have gone to `channeld-chan#2` and the other to `channeld-chan#4`.
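To make the log lines above concrete, here is a toy model of per-channel accounting. It is only a sketch, not `channeld`'s code: the `ToyChannel` class, the `forward_all` helper, and the 600,000,000 msat spendable balance are all hypothetical. Only the two HTLC amounts and the two result strings are taken from the logs.

```python
from dataclasses import dataclass

# Result strings as they appear in the l2 logs.
ADD_OK = "CHANNEL_ERR_ADD_OK"
CAPACITY_EXCEEDED = "CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED"


@dataclass
class ToyChannel:
    """Hypothetical stand-in for one channel; not a real lightningd type."""
    name: str
    spendable_msat: int  # assumed balance, not taken from the issue

    def add_htlc(self, amount_msat: int) -> str:
        # Each channel only knows about its own balance.
        if amount_msat > self.spendable_msat:
            return CAPACITY_EXCEEDED
        self.spendable_msat -= amount_msat
        return ADD_OK


def forward_all(assignments):
    """Offer each (channel, amount_msat) pair in order and collect the results."""
    return [chan.add_htlc(amount) for chan, amount in assignments]


def same_channel_case():
    # Both HTLCs are handed to chan#2, as seen in the logs.
    chan2 = ToyChannel("chan#2", 600_000_000)
    return forward_all([(chan2, 501_000_000), (chan2, 500_000_000)])


def split_case():
    # One HTLC per parallel channel, as the sender requested.
    chan2 = ToyChannel("chan#2", 600_000_000)
    chan4 = ToyChannel("chan#4", 600_000_000)
    return forward_all([(chan2, 501_000_000), (chan4, 500_000_000)])
```

Under these assumed balances, `same_channel_case()` rejects the second HTLC, while `split_case()` accepts both.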
There is a channel selection in `lightningd` before forwarding: https://github.com/ElementsProject/lightning/blob/9d88ce3b592ca42a01104758313b9b2806d40230/lightningd/peer_htlcs.c#L1215
Is that a good idea? Why wouldn't we try to forward on exactly the channels that were requested?
Clearly, in this case that function is not selecting the best channel.
Though non-strict forwarding is allowed: https://github.com/lightning/bolts/blob/master/04-onion-routing.md#non-strict-forwarding
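For discussion, here is a rough sketch of the two policies being compared: strict forwarding (use only the requested channel) and one possible non-strict policy (prefer the requested channel, otherwise fall back to a parallel channel to the same peer). This is not what `lightningd` does at the linked line. The function names, the dict-based channel table, and the balances are assumptions made up for illustration.

```python
def pick_strict(spendable, requested, amount_msat):
    """Use the requested channel or fail; never substitute another one."""
    if spendable.get(requested, 0) >= amount_msat:
        return requested
    return None


def pick_non_strict(spendable, requested, amount_msat):
    """Prefer the requested channel, else the parallel channel with most room."""
    if spendable.get(requested, 0) >= amount_msat:
        return requested
    candidates = [c for c, msat in spendable.items() if msat >= amount_msat]
    if not candidates:
        return None
    return max(candidates, key=lambda c: spendable[c])


def forward(spendable, requests, pick):
    """Apply a policy to (requested_channel, amount_msat) pairs in order.

    The balance is reduced as soon as a channel is chosen, so a later
    selection sees the HTLCs that are already in flight.
    """
    remaining = dict(spendable)  # leave the caller's table untouched
    chosen_channels = []
    for requested, amount_msat in requests:
        chosen = pick(remaining, requested, amount_msat)
        if chosen is not None:
            remaining[chosen] -= amount_msat
        chosen_channels.append(chosen)
    return chosen_channels


# Hypothetical balances for the two parallel l2->l3 channels.
balances = {"chan#2": 600_000_000, "chan#4": 600_000_000}
as_requested = [("chan#2", 501_000_000), ("chan#4", 500_000_000)]
both_on_chan2 = [("chan#2", 501_000_000), ("chan#2", 500_000_000)]
```

In this model, honoring the requested channels works under either policy. A fallback only helps if the selection accounts for HTLCs already assigned to a channel, which is what `forward` does by updating `remaining` immediately.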
As a follow-up to issue #7563, I have tested `renepay` on the same topology that @daywalker90 proposed for testing `getroutes`. It turned out the payment failed with `failed to find a feasible flow` due to a sequence of HTLC failures on a remote channel that should have had enough liquidity.

To reproduce the problem I tried the following test using only `sendpay`. The payment flow is `l1 -> l2 -> l3`, where there is more than one channel connecting `l1->l2` and `l2->l3`.

When trying to send a 4-part payment with the following routes:
- `l1->l2->l3` over a couple of channels with capacity 400k,
- `l1->l2->l3` over a couple of channels with capacity 300k,
- `l1->l2->l3` over a couple of channels with capacity 200k,
- `l1->l2->l3` over a couple of channels with capacity 100k,

one or more payment parts always fail at the `l2->l3` hop, with `CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED` seen in the `l2` logs.

Another simpler case that fails as well is:
- `l1->l2->l3` over a couple of channels with capacity 400k,
- `l1->l2->l3` over a couple of channels with capacity 300k.

If instead I try making a single-part payment:
- `l1->l2->l3` over a couple of channels with capacity 400k,

the attempt succeeds.