Closed rustyrussell closed 4 years ago
Bit of a lengthy writeup of my explorations, but I thought I'd share them in case anyone has an idea of what it might be.
I tampered with my pay plugin to make sure I don't accidentally pay the invoice you specified above (it'll presplit, but stop after starting the first sub-payment, which is insufficient to settle the invoice). This allowed me to test this a couple of times from my node, which happens to also be a direct peer of 03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f, which appears to be the node we are giving some wrong instructions.
I noticed that the issue appears mainly when we only have a 2-hop route, whereas any longer route seems to work fine. I'll need to verify this against some different destinations, but so far it's working like clockwork.
Testing the same route manually using getroute and sendpay however works as expected:
$ echo $ROUTE
[
{
"id": "03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f",
"channel": "636733x2469x0",
"direction": 1,
"msatoshi": 300000,
"amount_msat": "300000msat",
"delay": 153,
"style": "tlv"
},
{
"id": "03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898",
"channel": "576389x1922x0",
"direction": 0,
"msatoshi": 300000,
"amount_msat": "300000msat",
"delay": 9,
"style": "tlv"
}
]
$ lcli sendpay "$ROUTE" e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898 && lcli waitsendpay e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898
{
"message": "Monitor status with listpays or waitsendpay",
"id": 272489,
"payment_hash": "e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898",
"destination": "03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898",
"msatoshi": 300000,
"amount_msat": "300000msat",
"msatoshi_sent": 300000,
"amount_sent_msat": "300000msat",
"created_at": 1595341535,
"status": "pending"
}
{
"code": 203,
"message": "failed: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS (reply from remote)",
"data": {
"id": 272489,
"payment_hash": "e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898",
"destination": "03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898",
"msatoshi": 300000,
"amount_msat": "300000msat",
"msatoshi_sent": 300000,
"amount_sent_msat": "300000msat",
"created_at": 1595341535,
"status": "pending",
"erring_index": 2,
"failcode": 16399,
"failcodename": "WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS",
"erring_node": "03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898",
"erring_channel": "576389x1922x0",
"erring_direction": 0,
"raw_message": "400f00000000000493e00009c4a0"
}
}
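The raw_message in the failure above can be unpacked by hand to double-check what the remote node reported. The following is a minimal sketch, assuming the BOLT 4 layout for incorrect_or_unknown_payment_details (a 2-byte failure code followed by a u64 htlc_msat and a u32 height); the function name decode_failure is made up for illustration and is not part of c-lightning.

```python
import struct

# BOLT 4 failure-code flag bit and the code seen in the output above.
PERM = 0x4000
INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS = PERM | 15  # 0x400f == 16399


def decode_failure(raw_hex: str) -> dict:
    """Decode an incorrect_or_unknown_payment_details failure message.

    Hypothetical helper for illustration only.
    """
    raw = bytes.fromhex(raw_hex)
    (failcode,) = struct.unpack_from(">H", raw, 0)
    if failcode != INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS:
        raise ValueError(f"unexpected failcode {failcode:#06x}")
    # Body: u64 htlc_msat followed by u32 height, both big-endian.
    htlc_msat, height = struct.unpack_from(">QI", raw, 2)
    return {"failcode": failcode, "htlc_msat": htlc_msat, "height": height}


failure = decode_failure("400f00000000000493e00009c4a0")
print(failure)
```

The decoded htlc_msat matches the 300000msat that was sent, and the height is the block height the erring node saw when it failed the HTLC.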
This uses the usual probe trick of using a payment_hash that isn't known to the destination. Notice that it doesn't yield a CLTV error, so there must be something we're messing up. The following are the options I can think of at the moment:
- getroute request
- createonion / sendonion correctly (basically the only thing that might happen is that we botch the first_hop since the rest is covered by HMACs)
- getroute result to onion payloads

Writing out the getroute result yields the following:
[
{
"id": "03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f",
"channel": "636733x2469x0",
"direction": 1,
"msatoshi": 299879,
"amount_msat": "299879msat",
"delay": 184,
"style": "tlv"
},
{
"id": "03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898",
"channel": "576389x1922x0",
"direction": 0,
"msatoshi": 299879,
"amount_msat": "299879msat",
"delay": 40,
"style": "tlv"
}
]
Interpreted by c-lightning as this (dumping the elements of p->route):
route[0] = 03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f, cltv=184, scid=636733x2469x0/1
route[1] = 03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898, cltv=40, scid=576389x1922x0/0
And encoded as onion payloads:
onion_hop[0]: 2
field[2]=049367
field[4]=09c4c8
field[6]=08cb850007820000
serialized=0203049367040309c4c8060808cb850007820000
onion_hop[1]: 2
field[2]=049367
field[4]=09c4c8
field[8]=54732ca34d13d2696030b513277af284844fca08b376c6f545d0cc57b81903bc0f4240
serialized=0203049367040309c4c8082354732ca34d13d2696030b513277af284844fca08b376c6f545d0cc57b81903bc0f4240
The CLTV entries (field[4]) match up since the destination checks the incoming CLTV for tampering.
This all seems correct: the CLTV fields from the penultimate hop are copied over to the final hop so it can verify them, and the values appear to be encoded correctly.
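To check the encoding independently, the serialized payloads can be decoded back into their fields. This is a minimal sketch, assuming the BOLT 4 TLV payload types (2 = amt_to_forward, 4 = outgoing_cltv_value, 6 = short_channel_id, 8 = payment_data) and BigSize-encoded types and lengths; the helper names are hypothetical and not taken from c-lightning.

```python
import io


def read_bigsize(stream: io.BytesIO) -> int:
    """Read one BigSize integer (1, 3, 5 or 9 bytes, big-endian)."""
    first = stream.read(1)[0]
    if first < 0xFD:
        return first
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(stream.read(width), "big")


def decode_tlv_payload(hex_payload: str) -> dict:
    """Split a serialized TLV stream into {type: raw value bytes}."""
    data = bytes.fromhex(hex_payload)
    stream = io.BytesIO(data)
    fields = {}
    while stream.tell() < len(data):
        tlv_type = read_bigsize(stream)
        length = read_bigsize(stream)
        value = stream.read(length)
        if len(value) != length:
            raise ValueError("truncated TLV record")
        fields[tlv_type] = value
    return fields


def format_scid(raw: bytes) -> str:
    """Render an 8-byte short_channel_id as BLOCKxTXxOUT."""
    block = int.from_bytes(raw[0:3], "big")
    txindex = int.from_bytes(raw[3:6], "big")
    outnum = int.from_bytes(raw[6:8], "big")
    return f"{block}x{txindex}x{outnum}"


# The two "serialized=" values from the dump above.
hop0 = decode_tlv_payload("0203049367040309c4c8060808cb850007820000")
hop1 = decode_tlv_payload(
    "0203049367040309c4c8"
    "082354732ca34d13d2696030b513277af284844fca08b376c6f545d0cc57b81903bc0f4240"
)

amount_msat = int.from_bytes(hop0[2], "big")
outgoing_cltv = int.from_bytes(hop0[4], "big")
next_scid = format_scid(hop0[6])
# payment_data is a 32-byte payment_secret followed by total_msat.
total_msat = int.from_bytes(hop1[8][32:], "big")

print(amount_msat, outgoing_cltv, next_scid, total_msat)
```

The decoded amount agrees with the 299879msat in the getroute result, the channel in field[6] of the first hop is the second hop's 576389x1922x0, and both hops carry the same value in field[4].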
pay vs sendpay
Notice the following are new attempts, so the values will change slightly from above!
Let's try to compare the createonion call in sphinx.c to see if there are significant changes between going through pay or sendpay. The following is the dump from pay:
XXX createonionpacket 2 hops
XXX sphinx_hop_payload[0]=14 02 03 019a36 04 03 09c4c8 06 08 08cb850007820000, nodeid=03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f
XXX sphinx_hop_payload[1]=2f 02 03 019a36 04 03 09c4c8 08 23 54732ca34d13d2696030b513277af284844fca08b376c6f545d0cc57b81903bc0f4240, nodeid=03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898
And the following is through sendpay:
XXX createonionpacket 2 hops
XXX sphinx_hop_payload[0]=14 02 03 0493e0 04 03 09c4ad 06 08 08cb850007820000, nodeid=03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f
XXX sphinx_hop_payload[1]=0a 02 03 0493e0 04 03 09c4ad, nodeid=03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898
Again all of this appears to be correct, and the values mostly match up: pay fuzzes the CLTV values and the amount values, but the important thing is that the second to last node and the last one match up.
Debugging the first_hop, first from sendpay:
first_hop: 03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f 636733x2469x0 300000msat 153
And then from pay:
first_hop: 03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f 636733x2469x0 51102msat 184
Notice that the discrepancy in the final CLTV comes from the invoice, which specified a final CLTV delta of at least 40, whereas the default without an invoice is 9. The difference between the two first_hop delays, 184 - 153 = 31, is exactly 40 - 9.
After delving into the wire messages resulting from calling pay and a perfectly matching sendpay call (same route, fixed session_key for the onion, same blockheight, and adding some instrumentation) I found that the onions are identical except for a single bit:
The following is from pay:
XXX getroute [{"id":"03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f","channel":"636733x2469x0","direction":1,"msatoshi":30000,"amount_msat":"30000msat","delay":184,"style":"tlv"},{"id":"03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898","channel":"576389x1922x0","direction":0,"msatoshi":30000,"amount_msat":"30000msat","delay":40,"style":"tlv"}]
XXX session_key 4141414141414141414141414141414141414141414141414141414141414141
XXX hop[0]: pubkey=03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f payload=1302027530040309c4e0060808cb850007820000
XXX hop[1]: pubkey=03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898 payload=2e02027530040309c4e0082354732ca34d13d2696030b513277af284844fca08b376c6f545d0cc57b81903bc0f4240
and this one is from sendpay:
XXX session_key 4141414141414141414141414141414141414141414141414141414141414141
XXX hop[0]: pubkey=03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f payload=1302027530040309c4e1060808cb850007820000
XXX hop[1]: pubkey=03e1210c8d4b236a53191bb172701d76ec06dfa869a1afffcfd8f4e07d9129d898 payload=2e02027530040309c4e1082354732ca34d13d2696030b513277af284844fca08b376c6f545d0cc57b81903bc0f4240
Notice the difference in the last byte of the 04 TLV field (040309c4e0 vs 040309c4e1). This is an off-by-one on the CLTVs in the onion payloads. This in turn comes from this line here:
If I add the same courtesy +1 to the absolute CLTV computed by the pay plugin we get absolutely identical onions, which is what I was hoping to see.
This leads me to the conclusion that they were expecting a +1 on the final CLTV delta: sendpay adds that to the base_expiry, whereas I forgot about it in the onion payload creation.
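The off-by-one can be confirmed directly from the two hop[0] payloads in the dumps above. This is a small sketch that counts the differing bits and extracts the CLTV from each payload; the helper names are hypothetical, and the fixed offsets in cltv_from_hop are an assumption that only holds for these particular payloads (1-byte length prefix, then a 4-byte amount record, then the 04 03 record).

```python
# hop[0] payloads copied from the pay and sendpay dumps.
PAY_HOP0 = "1302027530040309c4e0060808cb850007820000"
SENDPAY_HOP0 = "1302027530040309c4e1060808cb850007820000"


def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count the bits that differ between two equal-length hex strings."""
    a, b = bytes.fromhex(a_hex), bytes.fromhex(b_hex)
    if len(a) != len(b):
        raise ValueError("payloads differ in length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def cltv_from_hop(payload_hex: str) -> int:
    """Read the 3-byte outgoing_cltv_value at its offset in these payloads."""
    raw = bytes.fromhex(payload_hex)
    # Bytes 5..6 must be the record header: type 0x04, length 0x03.
    if raw[5:7] != b"\x04\x03":
        raise ValueError("unexpected payload layout")
    return int.from_bytes(raw[7:10], "big")


bits = differing_bits(PAY_HOP0, SENDPAY_HOP0)
pay_cltv = cltv_from_hop(PAY_HOP0)
sendpay_cltv = cltv_from_hop(SENDPAY_HOP0)

print(bits, pay_cltv, sendpay_cltv, sendpay_cltv - pay_cltv)
```

The payloads differ in exactly one bit, and the sendpay CLTV is one block higher than the one pay computed, which is consistent with the courtesy +1 described above.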
See https://github.com/ElementsProject/lightning/commit/f5b6120517a431398c75278555478b000da7d6c4 for the really trivial fix (if that really fixes it...)
The bigger problem is that the recipient is wrong: it should not send an incorrect cltv expiry error, first because we are using its desired expiry, and second because that +1 is only a courtesy to allow for a block being found in the meantime. So either ACINQ or ZeusLN.app are wrong, but how to tell?
I was unable to test that my solution works since the channel was closed just before I could test it. So if anybody finds an endpoint with the same behavior let me know :-)
Since I couldn't replicate this issue in a test (presumably because the erring node is not a c-lightning node), I did the next best thing and probed the network both with and without the +1 to the CLTV. The result makes me very confident that the issue is resolved by the change: after 4000+ probes with the +1 change there was not a single failed attempt due to an incorrect CLTV, whereas probing without the +1 produced dozens after only 100 probed nodes.
I think it's safe to say we have solved this by working around it, but we might want to investigate which implementation is causing this quirk.
pay plugin logs: