ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance

Unable to pay extreme multipart payment in tests #7605

Open Lagrang3 opened 3 weeks ago

Lagrang3 commented 3 weeks ago

As a follow-up to issue #7563, I have tested renepay on the same topology that @daywalker90 proposed for testing getroutes. It turned out the payment failed with "failed to find a feasible flow", due to a sequence of HTLC failures on a remote channel that should have had enough liquidity.

To reproduce the problem I tried the following test using only sendpay. The payment flow is l1 -> l2 -> l3, where there is more than one channel connecting l1->l2 and more than one connecting l2->l3.

When trying to send a 4-part payment with the following routes:

One or more payment parts always fail at the l2->l3 hop, with CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED seen in l2's logs.

Another, simpler case that also fails is:

If instead I try making a single-part payment:

# Imports assume Core Lightning's pytest harness (pyln-testing).
from fixtures import *  # noqa: F401,F403 -- provides the node_factory fixture
from pyln.client import Millisatoshi
from pyln.testing.utils import only_one, sync_blockheight, wait_for


def get_local_channel_by_id(node, chanid):
    # Return the node's own view of the channel with this channel_id, or None.
    for c in node.rpc.listpeerchannels()['channels']:
        if c['channel_id'] == chanid:
            return c
    return None

def start_channels(connections):
    nodes = list()
    for src, dst, fundamount in connections:
        nodes.append(src)
        nodes.append(dst)
        src.rpc.connect(dst.info["id"], "localhost", dst.port)

    bitcoind = nodes[0].bitcoin
    # If we got here, we want to fund channels
    for src, dst, fundamount in connections:
        addr = src.rpc.newaddr()["bech32"]
        bitcoind.rpc.sendtoaddress(addr, (fundamount + 1000000) / 10**8)

    bitcoind.generate_block(1)
    sync_blockheight(bitcoind, nodes)
    txids = []
    chan_ids = []
    for src, dst, fundamount in connections:
        reply = src.rpc.fundchannel(dst.info["id"], fundamount, announce=True, minconf=0)
        txids.append(reply["txid"])
        chan_ids.append(reply["channel_id"])

    # Confirm all channels and wait for them to become usable
    bitcoind.generate_block(1, wait_for_mempool=txids)
    scids = []
    for con, mychan_id in zip(connections, chan_ids):
        src = con[0]
        wait_for(lambda: get_local_channel_by_id(src, mychan_id)['state'] == "CHANNELD_NORMAL")
        scids.append(get_local_channel_by_id(src, mychan_id)['short_channel_id'])

    # Make sure they have all seen block so they don't complain about
    # the coming gossip messages
    sync_blockheight(bitcoind, nodes)
    bitcoind.generate_block(5)

    # Make sure everyone sees all channels, all other nodes
    for n in nodes:
        for scid in scids:
            n.wait_channel_active(scid)

    # Make sure we have all node announcements, too
    for n in nodes:
        for n2 in nodes:
            wait_for(
                lambda: "alias" in only_one(n.rpc.listnodes(n2.info["id"])["nodes"])
            )
    return chan_ids

def test_channel_reserve(node_factory):
    def direction(node1, node2):
        return 0 if node1.info['id']<node2.info['id'] else 1

    opts = {"disable-mpp": None, "fee-base": 0, "fee-per-satoshi": 0, "cltv-delta": 6}
    l1, l2, l3 = node_factory.get_nodes(3, opts=opts)

    chan_ids = start_channels(
        [
            (l1, l2, 400_000),
            (l2, l3, 400_000),
            (l1, l2, 300_000),
            (l2, l3, 300_000),
            (l1, l2, 200_000),
            (l2, l3, 200_000),
            (l1, l2, 100_000),
            (l2, l3, 100_000),
        ]
    )

    # we should be able to send these four parts:
    # nparts = 4
    # route_amounts = ["350000sat", "250000sat", "150000sat", "50000sat"]
    # instead the test fails even with these:
    nparts = 2
    route_amounts = ["201000sat", "200000sat", "150000sat", "50000sat"]
    total_msat = sum([Millisatoshi(a) for a in route_amounts[:nparts]])

    # Test succeeds if we are able to pay this invoice
    inv = l3.rpc.call(
        "invoice",
        {"amount_msat": total_msat, "label": "inv", "description": "inv", "cltv": 10},
    )
    inv_decode = l1.rpc.decode(inv["bolt11"])

    # Shared data by every route we will construct: l1->l2->l3
    route = [{"id": l2.info["id"],
        "direction": direction(l1,l2),
        "delay": 16,
        "style": "tlv"},
        {"id": l3.info["id"],
        "direction": direction(l2,l3),
        "delay": 10,
        "style": "tlv"}]

    # Send every part with sendpay
    for part in range(nparts):
        this_part_msat = Millisatoshi(route_amounts[part])
        chan1 = get_local_channel_by_id(l1, chan_ids[part*2])
        chan2 = get_local_channel_by_id(l2, chan_ids[part*2+1])

        route[0]["channel"] = chan1["short_channel_id"]
        route[1]["channel"] = chan2["short_channel_id"]
        route[0]["amount_msat"] = route[1]["amount_msat"] = this_part_msat

        assert chan1["spendable_msat"] >= this_part_msat
        assert chan2["spendable_msat"] >= this_part_msat

        l1.rpc.call(
            "sendpay",
            {"route": route,
             "payment_hash": inv["payment_hash"],
             "payment_secret": inv["payment_secret"],
             "amount_msat": total_msat,
             "groupid": 1,
             "partid": part+1,},
        )
    l1.wait_for_htlcs()

    # Are we happy?
    waitsendpay_response = l1.rpc.call(
        "waitsendpay",
        {"payment_hash": inv["payment_hash"], "partid": 1, "groupid": 1},
    )
    receipt = only_one(l3.rpc.listinvoices("inv")["invoices"])
    assert receipt["status"] == "paid"
    assert receipt["amount_received_msat"] == total_msat
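
For reference: the test above assumes the standard Core Lightning pytest harness; dropped into a file under tests/, it can be run with "pytest -k test_channel_reserve".
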
Lagrang3 commented 3 weeks ago

I think the problem is with the parallel channels. On l2, channeld only sees one channel with its peer l1.

I can see that lightningd's peer structure has a list of channels (defined in lightningd/peer_control.h), but it seems that channeld's peer structure has only one channel (defined in channeld/channeld.c). Keep looking...

UPD: There's one channeld instance for every channel. So maybe lightningd is sending the HTLC request to the wrong subdaemon?

As a matter of fact, in the logs:

035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#2: Adding HTLC 0 amount=501000000msat cltv=119 gave CHANNEL_ERR_ADD_OK
...
035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#2: Adding HTLC 1 amount=500000000msat cltv=119 gave CHANNEL_ERR_CHANNEL_CAPACITY_EXCEEDED

Both HTLC requests go to the same channeld-chan#2; one should have gone to channeld-chan#2 and the other to channeld-chan#4.
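
To double-check this from the RPC side, a quick helper along these lines (hypothetical, not part of the test above) can be called on l2 between the sendpay loop and wait_for_htlcs(); if both HTLCs show up under one short_channel_id, lightningd really did pick the same outgoing channel for both parts:

def htlcs_per_channel(node):
    # Count the HTLCs currently pending on each of the node's channels,
    # keyed by short_channel_id (listpeerchannels exposes an 'htlcs' list).
    counts = {}
    for c in node.rpc.listpeerchannels()['channels']:
        counts[c.get('short_channel_id')] = len(c.get('htlcs', []))
    return counts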

Lagrang3 commented 3 weeks ago

There is a channel selection in lightningd before forwarding: https://github.com/ElementsProject/lightning/blob/9d88ce3b592ca42a01104758313b9b2806d40230/lightningd/peer_htlcs.c#L1215

Is that a good idea? Why wouldn't we just forward on the exact channels we have been requested to use?

Clearly in this case that function is not selecting the best channel.
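
As a rough Python model of what the logs suggest (an assumption about the behaviour, not lightningd's actual code): if the outgoing channel is chosen per peer, say always the largest channel towards that peer, and the requested short_channel_id is ignored, then two concurrent parts land on the same channel and the second add exceeds its capacity:

def select_outgoing_channel(channels_to_peer, requested_scid, amount_msat):
    # Suspected behaviour: requested_scid and amount_msat are deliberately
    # ignored, and the biggest channel towards the next peer is always picked.
    return max(channels_to_peer, key=lambda c: c['total_msat'])

With parallel l2->l3 channels of 400k and 300k sat and parts of 201k and 200k sat, both parts would be added to the 400k sat channel and the second add would fail, which matches the error seen above.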

Lagrang3 commented 3 weeks ago

Though non-strict forwarding is allowed: https://github.com/lightning/bolts/blob/master/04-onion-routing.md#non-strict-forwarding
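
Non-strict forwarding only says the forwarding node may substitute another channel to the same next peer, though. A capacity-aware version of that (again just a sketch of the idea, not actual lightningd code) would prefer the channel named in the onion and only fall back to siblings that can actually carry the HTLC:

def pick_forward_channel(channels_to_peer, requested_scid, amount_msat):
    # Try the requested channel first, then any other usable channel to the
    # same peer; return None (fail the HTLC back) rather than exceed capacity.
    def usable(c):
        return (c['state'] == 'CHANNELD_NORMAL'
                and c['spendable_msat'] >= amount_msat)

    ordered = sorted(channels_to_peer,
                     key=lambda c: c.get('short_channel_id') != requested_scid)
    for c in ordered:
        if usable(c):
            return c
    return None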