
Feature Engineering in renepay seems a bit unstructured and arbitrary #6852

Open renepickhardt opened 8 months ago

renepickhardt commented 8 months ago

Generally, renepay needs to define a linear unit_cost for each arc. This can consist of one or several features (which, in the best case, are themselves already linear unit_costs). The unit_costs can be seen as slopes.
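In other words, for every arc the linearized cost is assumed to grow proportionally with the amount x routed through it, and the unit_cost is the proportionality constant (slope):

$$\text{cost}_{\text{arc}}(x) = \text{unit\_cost} \cdot x$$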

The documentation and API of the code suggest that currently there are 2 features in renepay:

  1. Uncertainty cost (named: s64 pcost = linear_network->arc_prob_cost[arc.idx])
  2. Routing cost (named: fcost = linear_network->arc_fee_cost[arc.idx])

The naming was taken from the following function, where the two features are combined with some parameter MU: https://github.com/ElementsProject/lightning/blob/02ca226f88bf91c1fc3f654d8faa5aa178f75b96/plugins/renepay/mcf.c#L532

While the linearized uncertainty_unit_cost seems to be computed as one would expect (neglecting the open issue of the uncertainty cost often being rounded down to 0 during integer casting), it turns out that the fee cost is computed rather strangely:

https://github.com/ElementsProject/lightning/blob/02ca226f88bf91c1fc3f654d8faa5aa178f75b96/plugins/renepay/flow.c#L866

We see that the fee_unit_cost consists of the ppm (or fee_rate) and some penalty for the base fee (while I am not sure this is the best way to approximate channels with a base fee, I certainly think one can do it this way). However, the third term in the sum is related to the CLTV of the involved channels: delay*delay_feefactor.

It is my understanding that historically the delay was coded into the cost function (weights) of Dijkstra because people would rather send funds across channels with a slightly higher fee but a lower total CLTV, so that in case of onchain failures the funds would not be locked up for too long. While that kind of reasoning makes some sense to me, I believe that nowadays, with all those pinning attacks circling around, the trend is to prefer channels with higher CLTV deltas? In that case the feature should actually change to something like 1/delay or max_delay - delay to achieve one's goals. In any case it may also indicate that adding the delay into the cost function is not the best idea?

In any case, the code discusses at length in comments how to price uncertainty in sats, but here the cost for the CLTV is just dropped in. This also leads to the following final unit_cost (or slope) of the linear cost function:

unit_cost = slope = mu*(pfee + bfee*base_fee_penalty + delay*delay_feefactor) + (MU_MAX-1-mu)*pcost

Despite combine_cost_function suggesting we have only 2 features, we do in fact have 4 features, three of which are already combined into the fcost (see the sketch after the list):

  1. pfee (corresponding to the ppm of a channel)
  2. bfee (corresponding to the base_fee of a channel)
  3. delay (corresponding to the cltv_delta of a channel)
  4. pcost (corresponding to the uncertainty cost of the arc)
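To make the combination concrete, here is a minimal sketch that simply restates the slope formula above in code form; the variable names are illustrative and this is not the actual mcf.c/flow.c code:

```c
/* Illustrative only: restates the combined slope formula from above
 * with hypothetical variable names; not the actual renepay code. */
typedef long long s64;

static s64 combined_slope(s64 pfee, s64 bfee, s64 delay, s64 pcost,
			  s64 mu, s64 mu_max,
			  double base_fee_penalty, double delay_feefactor)
{
	/* fcost already bundles three of the four features... */
	s64 fcost = pfee
		+ (s64)(bfee * base_fee_penalty)
		+ (s64)(delay * delay_feefactor);

	/* ...and is then mixed with the uncertainty cost via mu. */
	return mu * fcost + (mu_max - 1 - mu) * pcost;
}
```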

Problems / Question

The four identified features are combined in a linear combination with weights. Some of the weights are learnt during plugin execution, while others are exposed to the user as command line parameters. The weight mu is even multiplied with the other weights, making the entire feature engineering process a bit strange and probably yielding unpredictable results.

Proposal / Solution

  1. I think it would be best to have an API in which there exists a list or vector of separate features for each arc (see the sketch after this list).
  2. Add some code to normalize the features.
  3. Add a function that combines the features, for example as a linear combination (a weighted sum).
  4. The weights can either be learnt ad hoc (as is currently done with mu), learnt over the lifespan of the lightning node from statistics, or passed in by users (for example to override what the node thinks is best). In this way future features like the latency of a channel could easily be added.
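A rough sketch of what such an API could look like; all names here are hypothetical and only illustrate the shape of the proposal, none of them exist in the plugin today:

```c
/* Hypothetical per-arc feature vector API (illustrative only). */
#include <stddef.h>

#define NUM_FEATURES 4 /* pfee, bfee, delay, pcost -- extensible, e.g. latency */

struct arc_features {
	double raw[NUM_FEATURES];        /* raw per-arc feature values */
	double normalized[NUM_FEATURES]; /* rescaled to a comparable range */
};

/* One possible normalization: divide each feature by its maximum
 * observed over all arcs, so every feature lands in [0, 1]. */
static void normalize_features(struct arc_features *f, const double *max)
{
	for (size_t i = 0; i < NUM_FEATURES; i++)
		f->normalized[i] = max[i] > 0 ? f->raw[i] / max[i] : 0;
}

/* Combine the normalized features into a single linear unit_cost
 * (slope) as a weighted sum; the weights could be learnt ad hoc,
 * derived from node-lifetime statistics, or supplied by the user. */
static double combine_features(const struct arc_features *f,
			       const double *weights)
{
	double slope = 0;
	for (size_t i = 0; i < NUM_FEATURES; i++)
		slope += weights[i] * f->normalized[i];
	return slope;
}
```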

Removing cltv as a feature

I would argue that cltv should not be part of the cost function in payment planning, as it produces behavior that seems hard to predict / understand. If the cltv is important, I suggest setting an acceptable cltv range and skipping all channels in gossmap that do not satisfy that range while feeding the linear_network and residual_network data structures.
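For illustration, a minimal sketch of that filtering idea; channel_cltv_delta() is a placeholder accessor, not a real gossmap call:

```c
/* Hypothetical sketch: drop channels with an unacceptable CLTV delta
 * while building the linearized network, instead of pricing the delay. */
#include <stdbool.h>

struct cltv_range {
	unsigned int min_delay;
	unsigned int max_delay;
};

static bool cltv_acceptable(unsigned int delay, const struct cltv_range *range)
{
	return delay >= range->min_delay && delay <= range->max_delay;
}

/* While feeding linear_network / residual_network (pseudocode):
 *
 *	if (!cltv_acceptable(channel_cltv_delta(chan, dir), &range))
 *		continue;	-- add no arcs for this channel at all
 */
```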

Lagrang3 commented 8 months ago

I find this feature separation API very interesting. I think we need to make a distinction between the properties of the channel (e.g. base_fee, delay), the quantities that are targeted for optimization (e.g. fee, success probability), and the cost function for each of the targeted quantities (e.g. uncertainty cost; the fee itself is a cost function for the fee). Which one would you call a feature?

I recall that -log of the success probability was a good choice for a cost function because it allowed writing the total cost as a sum over the arcs in the route; this is just a reminder that the target quantity and the cost function are not always the same.
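For completeness, the property being referred to (a standard identity, not something specific to the renepay code): the success probability of a route is the product of the per-arc probabilities, and taking -log turns that product into a sum of per-arc costs:

$$-\log\Big(\prod_{a \in \text{route}} P_a\Big) = \sum_{a \in \text{route}} -\log P_a$$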