Blockstream / greenlight

Build apps using self-custodial lightning nodes in the cloud
https://blockstream.github.io/greenlight/getting-started/
MIT License
109 stars 27 forks source link

Slow payments using Breez SDK #502

Open afterburn opened 3 weeks ago

afterburn commented 3 weeks ago

We're using Breez SDK and while testing paying lightning invoices we experience anywhere from 30s to multiple minutes of delay before a payment completes fully where the latter is an unacceptable UX to us. I've tried to get to the bottom of this with the Breez team but they've told me to open an issue here, so here it is. Attached you'll find two logs, one for a relatively fast payment and one for what we experience to be a slow payment.

breez-payment-fast.log breez-payment-slow.log

Any help/guidance as to what might be causing this and how to resolve this would be greatly appreciated.

kingonly commented 3 weeks ago

There's 5 minutes log in the slow one...

What's the node pubkey? What the payment timestamp? What the payment destination?

afterburn commented 3 weeks ago

@kingonly Okay, attaching a new log file. This time I took note of the details you requested.

breez.log

Node Pubkey/Id

02d3cbfaf44dbb8636bd12354f042b19015ecd65162ea5bd1d4aeb99bb16c7ad72

Payment Timestamp

20:10:45

Payment Destination

lnbc1u1pnvfhqhpp5qfz0se9lfqakv74wpalty00n85ezmv45tnve33r029zep477ch9qcqpjsp5gp5dc08p92hmrn3h9r2dx4n8g96cn5rmrvhjjtcwquvnce23jw2q9q7sqqqqqqqqqqqqqqqqqqqsqqqqqysgqdqqmqz9gxqyjw5qrzjqwryaup9lh50kkranzgcdnn2fgvx390wgj5jd07rwr3vxeje0glcll6r2vqt00symuqqqqlgqqqqqeqqjq7ngrt8ftkweytv34dnfpt55nlyj6la2m26swerrguzxvpkk8pmgkxqlmdvm9ed0dxcart8lxvfc8g99z3f6cv6dpaerymxdlgh422lgpxkpxr9
kingonly commented 3 weeks ago

This is a payment to Phoenix. Are you testing it on the same device? Can you please send to another device? Sending between two non custodial wallets on the same device can explain the latency.

afterburn commented 3 weeks ago

@kingonly I used two separate devices, a physical Samsung phone running Phoenix on android and the other is an iOS simulator running our app. They were on the same network.

kingonly commented 3 weeks ago

@cdecker could you help me debug this payment? I can pay Phoenix quickly in my environment fwiw

afterburn commented 3 weeks ago

breez.log

FWIW here is another log where we pay an invoice to WoS from our app, two separate devices as before. Same pubkey/node id. Payment timestamp 22:33:51. Payment destination lnbc10u1pnvflvqpp5quwkpcuaw5vm98gpgqn0tgxksrne4rspq4689suctwha5xcmyxasdqqcqzzsxqyz5vqsp59stlvtzazvx58u99rygxezqsqjdwjlcc747xxhc0ztwtr3jj89xq9qxpqysgqp24a25mhgvhu3mvrzq5w9pzxyxuf0yn77rf4n3td6petg2p00k9n4cum3sfyahqqjqxas6fckfcv2qljygwxkyyzhsl0h3z5n3yyegcq5a80ay.

cdecker commented 3 weeks ago

For future reference the sessions are:

Looking at the slow payments only here

7231591378063360

Timeline

The first successful payment

The second payment:

So as you can see we get unlucky and try a channel 836841x2426x1, which has insufficient capacity. This attempt alone cost 19s to execute. The problem here is twofold:

The remaining payments in the logs are aggregated as this:

ParserContext(
│   payments=[
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857836', amount_msat='1644164', htlc_id='169')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='170'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='171'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858273',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='172'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='3',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '843228x418x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='843228x418x0',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858273',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='173'
│   │   │   │   )
│   │   │   ]
│   │   ),
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857837', amount_msat='1643164', htlc_id='174')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='175'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858209',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='176'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858273',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='177'
│   │   │   │   )
│   │   │   ]
│   │   ),
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857838', amount_msat='1501150', htlc_id='178')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='179'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='180'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='181'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='3',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '843228x418x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='843228x418x0',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='182'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='4',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '838710x1842x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838710x1842x0',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='183'
│   │   │   │   )
│   │   │   ]
│   │   ),
│   │   Payment(
│   │   │   parts=[Part(partid='0', route=Route(hops=['844981x3824x0', '840936x1057x0', '855917x1467x1', '16728915x47038x1247']), error=None, code=None, failchan=None, failnode=None, cltv='857838', amount_msat='1645164', htlc_id='184')]
│   │   ),
│   │   Payment(
│   │   │   parts=[
│   │   │   │   Part(
│   │   │   │   │   partid='0',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '836841x2426x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2426x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='185'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='1',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2648x0', '836841x2419x1', '762578x410x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='836841x2419x1',
│   │   │   │   │   failnode='039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec465019c3',
│   │   │   │   │   cltv='858210',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='186'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='2',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '838707x2928x1', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error='WIRE_TEMPORARY_CHANNEL_FAILURE',
│   │   │   │   │   code='1007',
│   │   │   │   │   failchan='838707x2928x1',
│   │   │   │   │   failnode='02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924',
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='187'
│   │   │   │   ),
│   │   │   │   Part(
│   │   │   │   │   partid='3',
│   │   │   │   │   route=Route(hops=['844981x3824x0', '840936x1057x0', '856647x1703x0', '845136x1684x1', '854798x1031x6', '848375x2725x0', '843228x418x0', '850207x295x1', '852303x2834x0']),
│   │   │   │   │   error=None,
│   │   │   │   │   code=None,
│   │   │   │   │   failchan=None,
│   │   │   │   │   failnode=None,
│   │   │   │   │   cltv='858274',
│   │   │   │   │   amount_msat='6012',
│   │   │   │   │   htlc_id='188'
│   │   │   │   )
│   │   │   ]
│   │   )
│   ]
)

(This is the output of a log slicer that I'm building, since analyzing these logs is getting tricky, and rather time-consuming, apologies for the lack of timestamps, which would be useful here, but I pulled apart the first payment to give an idea on timings)

As you can see we're losing a lot of time talking to 02a98e8c590a1b5602049d6b21d8f4c8861970aa310762f42eae1b2be88372e924, a.k.a., that first failing hop in the second payment, costing us dozens of seconds. A correct scoring and remembering prior information would de-emphasize that node, and boost other, more successful one.

If desired I can analyse the other logs too, but I think this highlights the issue rather nicely.

kingonly commented 3 weeks ago

@yaslama please take a look at our channels with this node. Perhaps we should close our channels, although it seems like large routing node.

Meanwhile, we've opened direct channels to Acinq to mitigate this issue. Please report if this problem persists.

afterburn commented 2 weeks ago

@cdecker @kingonly is there any ETA on the change @cdecker is suggesting?

yaslama commented 2 weeks ago

We have new channels to Acinq and it should be sufficient to mitigate the issue. But i also closed the channel to 02a98e8c

afterburn commented 2 weeks ago

We are still experiencing times well above 15s, anywhere from 30s - 1m30s both to non-custodial and custodial recipients. What else can be done to get this resolved? Happy to assist wherever possible.

kingonly commented 2 weeks ago

Provide the details of a specific payment and we'll take a look

dallasirushing commented 2 weeks ago

Hey guys

Summary of the problem:

I'm concerned that this discussion has gotten to a point where the resolution was to fix a route with Phoenix, but I can't see how this solves the "general slowness" that we're facing.

kingonly commented 2 weeks ago

The problem is that we're not aware of a general issue. I don't see any other way than debugging specific payments. Please share more info in the issue re specific payments so we could follow up with the GL team.

We're also working on a trampoline payments implementation that might mitigate this issue but we're still a few weeks away from delivering it.

cdecker commented 2 weeks ago

Apologies for not providing the requested ETA for general enhancements of payments. We were heads down implementing it, and we're getting close with CLN#7494, but sadly we missed the release window for v24.08. As soon as we've finalized the PR we will start back porting it onto v24.02 which corresponds to the latest Breez SDK, and from there on we can fine tune in the servers quickly. So a couple of weeks at most for general improvements. In the meantime the trampoline progress is looking promising, and that'll alleviate the issue maybe even a bit quicker.