ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.87k stars 907 forks source link

pay: most payments fail #6793

Closed kilrau closed 5 months ago

kilrau commented 1 year ago

Issue and Steps to Reproduce

Most payments (as small as 10000 sats) we are trying to send via CLN's pay gRPC are failing with Reason: Ran out of routes to try after 18 attempts: see paystatus which is why we are not using CLN for sending payments in production right now, only receiving. Only opening this issue now because we wanted to make sure it's not sth incredibly stupid on our side. Couldn't find anything obvious.

Observations:

Reason: Ran out of routes to try after 48 attempts: see paystatus

PaymentPending PaymentFailed Reason: Ran out of routes to try after 61 attempts: see paystatus

PaymentPending PaymentFailed Reason: Ran out of routes to try after 18 attempts: see paystatus

- Smaller manual payments via `lightning-cli` sometimes work fine, whereas they mostly fail on gRPC (just a hunch)
- Not all gRPC payments fail, some go through, but it's sth like >90% failure rate
- Routing and all other things work fine
- Our CLN node definitely is large and well connected enough by now to reliably send the payments we are trying to send
- Nothing helpful in the logs

We are absolutely clueless on what's going on and if it's simply the amount of mpp shards that make things fail. Any hints appreciated.

### `getinfo` output

{ "id": "02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018", "alias": "Boltz|CLN", "color": "ff9800", "num_peers": 88, "num_pending_channels": 0, "num_active_channels": 80, "num_inactive_channels": 1, "address": [ { "type": "ipv4", "address": "45.86.229.190", "port": 9736 }, { "type": "ipv6", "address": "2a10:1fc0:3::270:a9dc", "port": 9736 }, { "type": "torv3", "address": "oo5tkbbpgnqjopdjxepyfavx3yemtylgzul67s7zzzxfeeqpde6yr7yd.onion", "port": 9736 } ], "binding": [ { "type": "ipv4", "address": "0.0.0.0", "port": 9736 }, { "type": "ipv6", "address": "::1", "port": 9736 } ], "version": "v23.08.1", "blockheight": 812772, "network": "bitcoin", "fees_collected_msat": 80795360, "lightning-dir": "/home/bitcoin/.lightning/bitcoin", "our_features": { "init": "08a0000a8a69a2", "node": "88a0000a8a69a2", "channel": "", "invoice": "02000002024100" } }


https://amboss.space/node/02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018
kilrau commented 1 year ago

Adding questions @michael1011 previously asked on discord:

kingonly commented 1 year ago

Are you sending to nodes with private channels?

kilrau commented 1 year ago

It's pretty big nodes we tried sending to so far so high probability that these have private channels, yes.

kingonly commented 1 year ago

Does the invoice has routing hints is basically what I'm asking.

kilrau commented 1 year ago

Yes almost all do. Will do some test using invoices without routing hints.

kilrau commented 1 year ago

Update: no difference when paying 100k sats invoices without routing hints. image

Again the destination is a big node and we are definitely well connected enough to pay 100k sats invoices.

kilrau commented 1 year ago

So this is very interesting:

Btw for these tests I am always using (for my standards) a very high routing fee limit of 0.5% / 5000ppm to make sure this is not causing any failures and obviously always the same destination.

kilrau commented 1 year ago

Another test that simply confirms that there is no difference between lightning-cli and grpc (as expected):

Again same destination, same 5000 ppm routing fee limit for both.

kilrau commented 1 year ago

Test if routing hints play a role: they don't

Same behavior with invoices that contain routing hints:

Again same destination, same 5000 ppm routing fee limit for both.

kilrau commented 1 year ago

Ran similar tests today and .. it doesn't reproduce as reliably as before. Sometimes the first 100k payment or even larger goes through :/

Same destination as before, same 5000 ppm routing fee limit as before.

Any ideas what else I could try?

vincenzopalazzo commented 1 year ago

Any ideas what else I could try?

When the payment fails it is possible that we choose the wrong path and we keep choosing the wrong one over and over again.

Or is it possible that some channels as disabled? e.g: On chain fee disagreement?

From your messages, I am not able to track down what is happening, but I think it is also difficult from the pay command to track what is going for you due that we hide the complexity under the hood

kilrau commented 1 year ago

choose the wrong path and we keep choosing the wrong one over and over again

I would love to follow exactly which routes are being tried and see with which errors they fail but I simply couldn't find how. How can I?

kilrau commented 1 year ago

How one should be able to follow the routes pay is trying, quoting @cdecker from https://discord.com/channels/899980449231814676/899989708065632276/1167497746538840154:

The pay plugin should be logging the path (sequence of scids) before attempting it. Grep for the partid and you can follow the payment relatively easily ( and then you can dice into the adjacent lines when you see something that you'd like to investigate further)

But we decided that for us pay doesn't behave close enough to how we'd expect it to and started mpay, our own payment plugin that simplifies stuff quite a bit (doesn't split payments for now) and remembers routes that worked well for past payments (similar to lnd's mission control). I can see other's struggling with pay too, but I'm happy to close here since I won't be debugging this any further.

kilrau commented 11 months ago

Just to leave this final comment for anyone from the CLN team picking this up: pay in its current form is absolutely unreliable, either give this a proper look or switch it out for something else that ships by default with CLN. Happy to help and share experiences we made with mpay :v:

vincenzopalazzo commented 11 months ago

@kilrau maybe the reny-pay will be a good target for this, maybe @Lagrang3 is the correct person to explore this path

kilrau commented 11 months ago

Sure, happy to help and test but we will not be leading this effort. We'll work on our own mpay. The comment was just meant to get attention from the team that pay has serious issues. We even switched our regtest away from pay since it doesn't reliably use a direct channel :/

Lagrang3 commented 11 months ago

@kilrau maybe the reny-pay will be a good target for this, maybe @Lagrang3 is the correct person to explore this path

Thanks for bringing this up. @kilrau, I am working on renepay a new payment plugin that doesn't produce arbitrary MPPs but instead uses @renepickhardt's algorithm; there is a short description of how this works in the documentation and the original PR #6376. Unfortunately, the plugin right now is somewhat unstable. But I am working on it to fix the most evident bugs before release 24.02.

s373nZ commented 8 months ago

I just installed Boltz Client to test it out today and ran into this. Here is some information in case it's helpful. It is worth noting that my node has a channel open to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2. @kilrau has informed on Discord that this target node is LND and is usually setup to handle swaps > 1m sats, while there is a CLN node that handles swaps for smaller amounts. My swap is for 100k sats, so it should target the smaller node.

Observations

  1. In the attached paystatus JSON log, the erring_node is this LND node in the last 3/4 attempts
  2. erring_channel is different each time
  3. Only four attempts where made, although the strategy message on the last attempt indicates "Still have 7 attempts left"

Let me know if I can provide more information to help diagnose.

boltz@minibolt:/data/boltz$ boltzcli createreverseswap --currency LBTC 100000
You will receive the withdrawal to the specified onchain address
The fees for this service are:
  - Service fee: 0.2%
  - Boltz lockup fee: 276 satoshis
  - Claim fee: 143 satoshis

? Do you want to continue? Yes
Swap Status: swap.created
Swap ID: 7TSb3cusMVrs

Swap Status: swap.created
Error: rpc error: code = Unknown desc = Error calling method Pay: RpcError { code: Some(210), message: "Ran out of routes to try after 4 attempts: see `paystatus`", data: None }

boltz-paystatus-failed.json

cdecker commented 8 months ago

Thanks for keeping the information flowing. Debugging these is always hard because you wont hear back once you fixed it, so that's a big help.

Did you guys have a chance to test again now that #7188 is merged in master to see if it (partially) addresses the issue. @JssDWt found a really well hidden bug that my accounting code had that would result in the channel capacity estimates getting lower and lower as attempts go on.

vincenzopalazzo commented 8 months ago

The patch from @JssDWt is included inside the point release? https://github.com/ElementsProject/lightning/releases/tag/v24.02.2

I did not check the diff from the previous one, but if it is included we should advertize inside the release description, right?

s373nZ commented 8 months ago

@cdecker FWIW I upgraded to CLN v24.02.2 and boltzd v2.0.2-0b4c946 then ran the exact same command as my previous example, but with the same result. Still an issue for me, if indeed my use case applies to this ticket. :(

urza commented 6 months ago

Some more anecdotal evidence from me. Default pay failing lately a lot for me. Installed mpay (after fighting with python a bit:) and can confirm mpay can pay what the default pay can't.

urza commented 5 months ago

mpay is awesome, can we have it for bolt12 invoices as well? 😅

https://github.com/BoltzExchange/boltz-backend/issues/565

cdecker commented 5 months ago

Closing in favor of #7180