ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.83k stars 897 forks source link

pay plugin fails to route around depleted low fee channels #7177

Closed JssDWt closed 6 months ago

JssDWt commented 7 months ago

Issue and Steps to Reproduce

I suspect the issue is a race somewhere in the pay plugin, which causes retry attempts to not find a route. Here's a test of a specific scenario we encountered recently. This test sometimes fails, sometimes succeeds. This is a setup similar of the architecture at Breez.

The sender connected to a router, the router is connected to the destination. This is the path we'd like to find. The router, however, is also connected to a random other node, which is connected to the destination. This path has low fees, so will be initially preferred. The payment will fail to find the second route.

 sender - router --------- dest
             \              /
              - randomnode -
@pytest.mark.slow_test
def test_pay_avoid_low_fee_node(node_factory, bitcoind, executor, chainparams):
    """Make sure we're able to succeed a payment with retries if a low fee channel is depleted """

    # Setup:
    # sender - router --------- dest
    #             \              /
    #              - randomnode -
    # router is connected to the destination
    # randomnode is also connected to router and the destination, with a low fee path. The channel however, is depleted.
    sender, router, randomnode, dest = node_factory.get_nodes(4, opts={'feerates': (7500, 7500, 7500, 7500)})
    sender.rpc.connect(router.info['id'], 'localhost', router.port)
    sender.fundchannel(router, 200000, wait_for_active=True)
    router.rpc.connect(dest.info['id'], 'localhost', dest.port)
    router_dest_scid, _ = router.fundchannel(dest, 10**6, wait_for_active=True)
    randomnode.rpc.connect(dest.info['id'], 'localhost', dest.port)
    randomnode_dest_scid, _ = randomnode.fundchannel(dest, 10**6, wait_for_active=True)

    # router has a depleted channel to randomnode
    # mimic this by opening the channel the other way around
    randomnode.rpc.connect(router.info['id'], 'localhost', router.port)
    scid_router_random, _ = randomnode.fundchannel(router, 10**6, wait_for_active=True)

    # Set relevant fees
    # High fee from router to dest
    # Low fee from router to randomnode and randomnode to dest
    router.rpc.setchannel(router_dest_scid, feebase=0, feeppm=2000, htlcmin=1)
    router.rpc.setchannel(scid_router_random, feebase=0, feeppm=1, htlcmin=1)
    randomnode.rpc.setchannel(randomnode_dest_scid, feebase=0, feeppm=1, htlcmin=1)

    def has_gossip():
        channels = sender.rpc.listchannels()['channels']
        if sum(1 for c in channels if c['fee_per_millionth'] == 1) != 2:
            return False

        if sum(1 for c in channels if c['fee_per_millionth'] == 2000) != 1:
            return False

        return True

    mine_funding_to_announce(bitcoind, [sender, router, randomnode, dest])
    wait_for(has_gossip)

    def listpays_nofail(b11):
        while True:
            pays = sender.rpc.listpays(b11)['pays']
            if len(pays) != 0:
                if only_one(pays)['status'] == 'complete':
                    return
                assert only_one(pays)['status'] != 'failed'

    inv = dest.rpc.invoice(100000000, 'test_pay_avoid_low_fee_node', 'test_pay_avoid_low_fee_node')

    # Make sure listpays doesn't transiently show failure while pay
    # is retrying.
    fut = executor.submit(listpays_nofail, inv['bolt11'])

    # Pay sender->dest should succeed via non-depleted channel
    sender.dev_pay(inv['bolt11'], dev_use_shadow=False)

    # But it doesn't
    fut.result()

log output

See here that after the first failed attempt, the second part immediately fails to find a route. The payment is then split and split, but the second try really should have been able to find a route over one of the routing nodes

lightningd-1 2024-03-28T07:52:46.818Z DEBUG   plugin-pay: cmd 34 partid 0: Intermediate node 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d reported 1007 (WIRE_TEMPORARY_CHANNEL_FAILURE) at 109x1x0 on route 103x1x0 -> 105x1x0 -> 109x1x0 -> 111x1x0
lightningd-1 2024-03-28T07:52:46.818Z DEBUG   plugin-pay: cmd 34 partid 0: Added a channel hint for 109x1x0/0: enabled true, estimated capacity 100000099msat
lightningd-1 2024-03-28T07:52:46.818Z INFO    plugin-pay: cmd 34 partid 0: failed: WIRE_TEMPORARY_CHANNEL_FAILURE (reply from remote)
lightningd-1 2024-03-28T07:52:46.818Z DEBUG   plugin-pay: cmd 34 partid 0: Retrying 65c0e1ddfc8e5798b18ccc3eb50b09aae84da1038f4f375dbc95f1984028a80f/0 (100000000msat), new partid 1. 9 attempts left\n
lightningd-1 2024-03-28T07:52:46.819Z DEBUG   plugin-pay: cmd 34 partid 1: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.819Z INFO    plugin-pay: cmd 34 partid 1: No path found
lightningd-1 2024-03-28T07:52:46.819Z DEBUG   plugin-pay: cmd 34 partid 1: Retrying 65c0e1ddfc8e5798b18ccc3eb50b09aae84da1038f4f375dbc95f1984028a80f/1 (100000000msat), new partid 2. 8 attempts left\n
lightningd-1 2024-03-28T07:52:46.819Z DEBUG   plugin-pay: cmd 34 partid 2: Not using a routehint
lightningd-1 2024-03-28T07:52:46.820Z INFO    plugin-pay: cmd 34 partid 2: No path found
lightningd-1 2024-03-28T07:52:46.820Z DEBUG   plugin-pay: cmd 34 partid 2: Adaptively split into 2 sub-payments: new partid 3 (45643354msat), new partid 4 (54356646msat)
lightningd-1 2024-03-28T07:52:46.820Z DEBUG   plugin-pay: cmd 34 partid 3: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.821Z DEBUG   plugin-pay: cmd 34 partid 3: Created outgoing onion for route: 103x1x0 -> 105x1x0 -> 109x1x0 -> 111x1x0
lightningd-1 2024-03-28T07:52:46.821Z DEBUG   plugin-pay: cmd 34 partid 4: Not using a routehint
lightningd-1 2024-03-28T07:52:46.821Z INFO    plugin-pay: cmd 34 partid 4: No path found
lightningd-1 2024-03-28T07:52:46.821Z DEBUG   plugin-pay: cmd 34 partid 4: Retrying 65c0e1ddfc8e5798b18ccc3eb50b09aae84da1038f4f375dbc95f1984028a80f/4 (54356646msat), new partid 5. 9 attempts left\n
lightningd-1 2024-03-28T07:52:46.821Z DEBUG   lightningd: Payment: 100000000msat FAILED
lightningd-1 2024-03-28T07:52:46.821Z DEBUG   lightningd: Selected channel 103x1x0 (200000000msat) for selector 103x1x0 (45643354msat)
lightningd-1 2024-03-28T07:52:46.821Z INFO    lightningd: Sending 45643901msat in onion to deliver 45643354msat
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   plugin-pay: cmd 34 partid 5: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.822Z INFO    plugin-pay: cmd 34 partid 5: No path found
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   plugin-pay: cmd 34 partid 5: Adaptively split into 2 sub-payments: new partid 6 (26870176msat), new partid 7 (27486470msat)
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   lightningd: Payment part 3/3/1 status 0
lightningd-2 2024-03-28T07:52:46.822Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 1 = SENT_ADD_HTLC/RCVD_ADD_HTLC 
lightningd-2 2024-03-28T07:52:46.822Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: NEW:: HTLC REMOTE 1 = RCVD_ADD_HTLC/SENT_ADD_HTLC 
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 1 amount=45643901msat cltv=140 gave CHANNEL_ERR_ADD_OK
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   plugin-pay: cmd 34 partid 6: Not using a routehint
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
lightningd-1 2024-03-28T07:52:46.822Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
lightningd-1 2024-03-28T07:52:46.823Z DEBUG   plugin-pay: cmd 34 partid 6: Created outgoing onion for route: 103x1x0 -> 105x1x0 -> 109x1x0 -> 111x1x0
lightningd-1 2024-03-28T07:52:46.823Z DEBUG   plugin-pay: cmd 34 partid 7: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.823Z INFO    plugin-pay: cmd 34 partid 7: No path found
lightningd-1 2024-03-28T07:52:46.823Z DEBUG   plugin-pay: cmd 34 partid 7: Retrying 65c0e1ddfc8e5798b18ccc3eb50b09aae84da1038f4f375dbc95f1984028a80f/7 (27486470msat), new partid 8. 9 attempts left\n
lightningd-1 2024-03-28T07:52:46.824Z DEBUG   lightningd: Payment: 100000000msat FAILED
lightningd-1 2024-03-28T07:52:46.824Z DEBUG   lightningd: Payment: 45643354msat PENDING
lightningd-1 2024-03-28T07:52:46.824Z DEBUG   lightningd: Selected channel 103x1x0 (200000000msat) for selector 103x1x0 (26870176msat)
lightningd-1 2024-03-28T07:52:46.824Z INFO    lightningd: Sending 26870497msat in onion to deliver 26870176msat
lightningd-1 2024-03-28T07:52:46.824Z DEBUG   plugin-pay: cmd 34 partid 8: Not using a routehint
lightningd-1 2024-03-28T07:52:46.824Z INFO    plugin-pay: cmd 34 partid 8: No path found
lightningd-1 2024-03-28T07:52:46.824Z DEBUG   plugin-pay: cmd 34 partid 8: Adaptively split into 2 sub-payments: new partid 9 (12529144msat), new partid 10 (14957326msat)
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Trying commit
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   lightningd: Payment part 6/6/1 status 0
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: htlc 1: SENT_ADD_HTLC->SENT_ADD_COMMIT
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: htlc added REMOTE: local 200000000 remote 0
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   plugin-pay: cmd 34 partid 9: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-hsmd: Got WIRE_HSMD_SIGN_REMOTE_COMMITMENT_TX
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: -> local 154356099 remote 0
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   hsmd: Client: Received message 19 from client
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: sending_commit: HTLC LOCAL 1 = SENT_ADD_COMMIT/RCVD_ADD_COMMIT 
lightningd-1 2024-03-28T07:52:46.825Z INFO    plugin-pay: cmd 34 partid 9: No path found
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   plugin-pay: cmd 34 partid 9: Retrying 65c0e1ddfc8e5798b18ccc3eb50b09aae84da1038f4f375dbc95f1984028a80f/9 (12529144msat), new partid 11. 9 attempts left\n
lightningd-1 2024-03-28T07:52:46.825Z DEBUG   plugin-pay: cmd 34 partid 10: Not using a routehint
lightningd-1 2024-03-28T07:52:46.825Z INFO    plugin-pay: cmd 34 partid 10: No path found

.. etc

lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 108: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 109: Not using a routehint
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 109: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 110: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 110: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 111: Not using a routehint
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 111: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 112: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 112: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 113: Not using a routehint
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 113: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 114: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 114: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 115: Not using a routehint
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 115: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   plugin-pay: cmd 34 partid 116: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.862Z INFO    plugin-pay: cmd 34 partid 116: No path found
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Sending master 1021
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: got commitsig 4: feerate 7505, blockheight: 0, 0 added, 0 fulfilled, 0 failed, 2 changed. 0 splice commitments.
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: HTLC out 2 RCVD_ADD_REVOCATION->RCVD_ADD_ACK_COMMIT
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: HTLC out 3 RCVD_ADD_REVOCATION->RCVD_ADD_ACK_COMMIT
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: HTLC out 2 RCVD_ADD_ACK_COMMIT->SENT_ADD_ACK_REVOCATION
lightningd-1 2024-03-28T07:52:46.862Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: HTLC out 3 RCVD_ADD_ACK_COMMIT->SENT_ADD_ACK_REVOCATION
lightningd-1 2024-03-28T07:52:46.863Z DEBUG   plugin-pay: cmd 34 partid 117: Not using a routehint
lightningd-1 2024-03-28T07:52:46.863Z INFO    plugin-pay: cmd 34 partid 117: No path found
lightningd-1 2024-03-28T07:52:46.863Z DEBUG   plugin-pay: cmd 34 partid 118: Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6
lightningd-1 2024-03-28T07:52:46.863Z INFO    plugin-pay: cmd 34 partid 118: No path found
lightningd-1 2024-03-28T07:52:46.863Z DEBUG   plugin-pay: cmd 34 partid 119: Not using a routehint
lightningd-1 2024-03-28T07:52:46.863Z INFO    plugin-pay: cmd 34 partid 119: No path found
JssDWt commented 7 months ago

Updated the test to remove some unnecessary noise.

This test appears to fail when the payment size is above half the available channel capacity. So probably the issue here is that the htlc from the first attempt is not yet removed from the channel. When the second attempt is tried, we won't find a route because we don't have the available balance for the payment ourselves. The payment is then split, but split combinations lead to the same balance issues (maybe amplify them?)

JssDWt commented 6 months ago

I haven't yet found the cause, but this stood out to me.

Some logs say Not using a routehint Some logs say Using routehint 0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de199 (111x1x0) cltv_delta=6

Indicating that some parts are not using the the same flow as other parts.

I think the trickiness here is somewhere in the payment modifiers. Trying to get my head around that.

Lagrang3 commented 6 months ago

Would it make a difference if dest opens a channel with randomnode instead of the other way around and then depleting it?

JssDWt commented 6 months ago

Would it make a difference if dest opens a channel with randomnode instead of the other way around and then depleting it?

That makes sense. Made that channel the other way around, and also got rid of the LSP. So it's a 4 nodes setup now. I made the sender wait for the proper gossip.

So it doesn't make a difference, the test still fails.

JssDWt commented 6 months ago

On the first payment attempt the estimated capacity for the sender's local channel is 175320000msat On the second payment attempt the estimated capacity is 75319800msat (less than the amount we're trying to send)

This makes sense, because the local_channel_hints payment modifier will call listpeerchannels and update the local channel hints here https://github.com/ElementsProject/lightning/blob/4b8d2617bbbc678a67376b32332804b66a691034/plugins/libplugin-pay.c#L2468-L2470.

The htlc from the first payment attempt is not yet removed from the channel, so that balance is unusable. This also explains why this is a race, sometimes the test succeeds and sometimes it doesn't. If the htlc is removed from the channel, the payment will succeed. If not, the payment will fail due to insufficient balance.

My suggestion would be to wait a little bit after receiving a payment failure, to make sure the htlc is completely removed from the channel. And only then initiate new payment attempts. Note that if the local balance is an issue, it would definitely make sense to either fail fast and not try any split payments afterwards, or wait for the htlc to be removed from the channel.

Note that this problem may be amplified on nodes that have slow signers, like on greenlight.

JssDWt commented 6 months ago

Insight:

The estimated capacity is actually added back to the channel hint in time here https://github.com/ElementsProject/lightning/blob/4b8d2617bbbc678a67376b32332804b66a691034/plugins/libplugin-pay.c#L586-L589

When the estimated capacity is checked here later however, it sees the old value https://github.com/ElementsProject/lightning/blob/4b8d2617bbbc678a67376b32332804b66a691034/plugins/libplugin-pay.c#L697

I checked the pointers to the channel_hints array for the root payment on both places. When the amount is substracted from the channel hint I'm seeing pointer A When the amount is added back to the channel hint I'm seeing pointer B When we're doing the pathfinding I'm seeing pointer A again.

So the amount was added back to a copy of the channel_hints array?