ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.81k stars 888 forks source link

MPP Should Have Single Global Budget #4104

Open ZmnSCPxj opened 3 years ago

ZmnSCPxj commented 3 years ago

Thinking about #3863, and experimenting with a badly-connected node, suggests to me that we badly mismanage the given pay budget.

I made some large payments of about a millibitcoin or so (large relative to the badly-connected node, it has barely 10 millibitcoins locked up to nodes that turn out to themselves be fairly badly connected when I do some manual 2-hop sendpay probes from the badly-connected node) and when giving maxfeepercent of 1% or 2% it fails, but afterwards with doing 5% it succeeds.

Yet when I look at the listpays report it claims to have sent out only around ~0.6% more than the requested amount, i.e. the feepercent is only 0.6%. In principle the payment solution it found at maxfeepercent=5 should have been discoverable at maxfeepercent=1 or 2. Either listpays is wrong, or we are badly mismanging our fee budget.

I think we should instead make a single budget for a tree of payments, rather than the "budget-stealing" idea suggested in #3863.

Basically:

The above should basically allocate budgets to each sub-payment as "tightly" as possible, and means that even on my badly-connected node, it should be able to acquire payment solutions of 0.6% with just maxfeepercent=2 or even maxfeepercent=1.

To be clear, I am not working on this for now, so anyone who is willing to poke into plugins/libplugin-pay.c can go try to implement the above. Not sure if this is a good first issue, but hey, just try, mkay?

cdecker commented 3 years ago

While I see the rationale behind the proposal I don't think it actually improves our success rate, and may in fact be detrimental, leading to payments not completing due to a one-sided assignment of fees to a partial payment, leaving other parts with insufficient budget.

I'm convinced that we need to make sure that each attempt/part is assigned the minimal fee budget possible. If we do not do this, we may end up with an early part using the majority of the fee budget, leaving little to no fees for other parts. The payment as a whole would be doomed, since the fee-using part will only get revoked by the MPP timeout, and outstanding parts don't stand a chance, because of insufficient budget left over.

It is for this reason that I proposed the fee-stealing solution, in which we have a heuristic that tells us if an attempt reached the destination, and therefore its remaining fee budget can be redistributed to the other attempts to boost their chances of success. Think of it as the counterproposal to this, in that we start from the most constrained version, loosening the constraints as we learn about the network. This ensures minimal interference between parts of a payment.

Your proposal on the other hand starts with the loosest constraints, tightening them as we go along, and potentially leading us into a dead-end that we can't escape from. This in turn is a direct result of parts sharing a fee budget.

ZmnSCPxj commented 3 years ago

My mental model is that earlier sub-parts of the payment are more likely to try more direct routes due to having less countermanding information that the direct routes are not passable. Direct routes are those that have lower fees relative to the size of the subpayment. Thus, it should also have the behavior that they start out with lower fees in general, then move on to higher and higher fee-routes, because that is how getroute works, it always gives the shortest route available for the given exclusions, and there are fewer exclusions earlier than later.

In particular, from my observation from my badly-connected node: most of the time the budget allocation to each sub-payment you fire out is almost entirely not used. Many of the sub-payments that reach the destination ultimately take up a tiny amount of their allocated budget used in fees, while later, smaller sub-payments are strapped for fees and are failed due to lack of fees anyway.

With a single global budget, the earlier sub-payments that do reach the destination deduct only a small fee, because they tend to be on the shorter and cheaper routes, but later payments have to contend with a network that already has funds locked up on those routes. So this just dynamically gives the earlier payments with more direct routes less fee budget and the later routes (that now have to live in a network with some payments locked up by your earlier sub-payments) get more fee budget.

Thus, while your theory on budget constraints may seem compelling, I am unconvinced that it actually holds up to how the network works. Indeed, the necessity of stealing budget suggests that the initial fee allocation is just wrong --- if your initial allocation of the fee budget had been correct in the first place, stealing budget would not even be necessary.

Your approach is like allocating memory directly to each application in proportion to the size of its executable file, instead of just allocating memory dynamically when the application actually runs. Then you have to fix it up later when one of the applications turns out to take up a lot of memory relative to its code, and steal some unused memory allocated for another application that ended up not using it. I am unconvinced that is the better way to allocate resources.

ZmnSCPxj commented 3 years ago

So let us simulate.

Suppose we have a network where there are three viable routes. One costs 1 msat, the next costs 2 msat, the third costs 3 msat. However, each of the routes has limited capacity, and let us suppose that it can take only exactly one sub-payment.

Now, let us suppose that our presplitter splits the payment into three equal payments, and we have a fee budget of exactly 6 msat.

Under your strategy of "split the budget and steal unused budget later". the three sub-payments will each get 2 msat of budget. The first one tried goes through the cheapest 1msat route fine, with 1 msat of leftover fee budget. The second one tries the cheapest route again, but fails because its capacity was already consumed by the first sub-payment. It tries the second-cheapest route of 2msat, and gets it with exactly the fee budget. The third sub-payment tries the earlier routes as well, but again fails due to capacity limits. Its remaining viable path takes 3 msat, which is 1 msat higher than its preallocated 2msat budget. So it has to go steal budget. It steals the 1 msat budget from the first sub-payment, and pushes through.

Under my strategy of "global budget", the three sub-payments initially get sent over the chepaest 1msat route, and get allocated 1 msat each, leaving 3 msat in the global budget. However, only one of the sub-payments will succeed due to capacity constraints on the network, so the other two fail and return their 1msat budgets back to the global budget, bringing it up to 5 msat. The sub-payments then get tried on the longer 2-msat route, getting 2 msat budget each, bringing the global budget to a measly 1 msat. They are fired on the 2-msat route, but one of them returns due to insufficient capacity. That one returns the 2msat budget back to the global budget which goes back up to 3 msat, then it tries 3-msat route, getting the 3 msat left in the global budget. Then the entire payment pushes through.

Now, the key insight you have to have is this:

Then we can make this optimization:

This is because budget-stealing may need to steal from more than one previous sub-payment as well.

However, having a single budget that represents "the amount of budget all previous fired sub-payments are not using" is simpler.

Basically --- with budget-stealing, every sub-payment has to maintain a separate field for "this is the pat of the allocated budget I am not using". Instead of having lots of little pools that we have to handle, just make one big pool meaning "this is the part of the total allocated budget that none of us are currently using". The sum of the individual budget-stealable amounts, and the single global budget I am proposing, will be exactly the same anyway, so why complicate the solution unnecessarily? Just do not split up the unused part of the budget into tiny pools that you have to virtually re-merge anyway when you steal, then split back again, like, why would you make more work for yourself?

So yes, I think this is just substantially the same as your budget-stealing idea, except more efficient due to being simpler.

ZmnSCPxj commented 3 years ago

Even with path randomization, where you might speculatively avoid the shortest path in order to mislead evil surveillors, does not require per-part budgets. For every part under consideration, you could select any path at random whose fee is below global_budget * sub_payment_amount / total_amount, then before firing the payment along that path, deduct its fee from the global budget. It will still behave approximately the same (esepcially if you bias towards selecting shorter routes, which you want to do as a general policy to improve your chances of keeping within the total fee budget), but with a much simpler single budget than managing lots of tiny budgets that you have to periodically re-merge when you inevitably have to steal anyway, and then re-split back after stealing.