Closed SimonVrouwe closed 6 years ago
First of all: excellent bug report, you clearly invested a lot of time into this.
I think all of those peers are lnd nodes (their default color is black), and I'm not exactly sure how they do their fee estimation (maybe @roasbeef can help here). That said, this could happen even if both sides use the same estimator: the estimator may be jumpy, or the two nodes may run the same estimator with slightly different views of the mempool. So it's really a hard thing to fix, since we need to enforce some reasonable range of fees.
We could decrease the impact by switching to a 10x range, but even that could result in closures, but it'd make it a lot less likely:
The above plot shows the max/min fee ratio over 1 hour rolling windows and the 5x and 10x cutoff points. This is using a single estimator (which can swing wildly as well), and is sort of a worst-case scenario, i.e., one node uses the min fee estimation for that hour and the other uses the max fee estimation for that hour.
What do you think @rustyrussell, should we just increase for now until we have a good solution?
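For illustration, here is a minimal sketch of how widening the cutoff from 5x to 10x changes which estimate spikes survive. The `tolerated` helper and the floor value are assumptions for illustration, not code from either implementation:

```python
def tolerated(old_estimate, new_feerate, multiplier):
    """Would a peer still using `old_estimate` accept `new_feerate`
    under a floor-to-(multiplier x estimate) acceptance range?"""
    floor = 253  # illustrative feerate floor
    return floor <= new_feerate <= multiplier * old_estimate

# A sudden 6x jump in the fee estimate:
old, new = 564, 6 * 564
five_x = tolerated(old, new, 5)    # 3384 > 2820: outside range, channel closes
ten_x = tolerated(old, new, 10)    # 3384 <= 5640: spike stays inside the range
```

Under this sketch, a 10x range tolerates any jump the estimator makes between two updates as long as it stays below an order of magnitude, at the cost of accepting wilder feerates from the peer.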
Actually, the error message in the log says received ERROR:
Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel
which means that it was the remote peer that did not agree with the update_fee message sent by my node. That makes more sense, as it was my node that initiated/funded all 6 channels and was therefore responsible for the fee:
The node not responsible for paying the Bitcoin fee:
MUST NOT send update_fee.
So it was my bitcoind that was so fast that all 6 peers couldn't keep up with the new fee estimate. I assume that the last part of the ERROR message, i.e. update_fee 3047 outside range 253-2820, was part of the error message sent by the remote.
Would you care @cdecker to share the source/code of that min/max fee ratio chart?
And isn't estimatesmartfee polled every 30s, which could trigger an update_fee message, instead of every hour?
Sure thing, here it is https://gist.github.com/cdecker/39186d8c041924806773f4bacd0aefdd
I used the 1 hour range mainly to exacerbate the problem a bit (and because statoshi only has 10 minute snapshots). We could probably smooth out the effect by using an exponential moving average, so that we aren't as jittery when we don't have a block for some time.
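The gist linked above does the real work against statoshi data; as a rough sketch of the same idea (the `rolling_max_min_ratio` helper and the sample values are made up for illustration), the rolling max/min ratio can be computed like this:

```python
from collections import deque

def rolling_max_min_ratio(samples, window_s=3600):
    """For each sample, the max/min feerate ratio over the trailing window.

    `samples` is a time-ordered list of (timestamp_seconds, feerate)
    tuples; returns a list of (timestamp, ratio) pairs.
    """
    window = deque()  # samples inside the trailing window
    out = []
    for ts, fee in samples:
        window.append((ts, fee))
        # Drop samples that have fallen out of the window.
        while window and window[0][0] < ts - window_s:
            window.popleft()
        fees = [f for _, f in window]
        out.append((ts, max(fees) / min(fees)))
    return out

# 10-minute snapshots; a sudden 6x jump shows up as a ratio above the 5x cutoff.
samples = [(i * 600, f) for i, f in enumerate([500, 520, 510, 530, 3100, 3000])]
ratios = rolling_max_min_ratio(samples)
```

As in the plot, the worst case is one node sitting at the window minimum while its peer sits at the maximum.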
As for the remote end sending those errors, that seems strange since the error message originates from c-lightning's code:
So either I misattributed your peers as lnd nodes, or the error message was reflected back to us.
Well at least one of the remotes claims to be a c-lightning 02977901c53b5299c7641acf11d49a46f0957e9e3f8191e6171d1222486d872317
Great, that settles the problem I had figuring out where it comes from :-)
We just discussed that we'll raise our acceptable fee range and then reduce our sensitivity by using an exponential moving average to flatten the fee transitions.
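A minimal sketch of that smoothing idea (the `alpha` value and the sample feerates are illustrative assumptions, not what c-lightning ships):

```python
def ema_feerate(estimates, alpha=0.2):
    """Smooth raw feerate estimates with an exponential moving average.

    alpha near 0 reacts slowly (smoother); near 1 tracks the raw
    estimate closely. Returns the smoothed series.
    """
    smoothed = []
    ema = None
    for est in estimates:
        ema = est if ema is None else alpha * est + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed

raw = [564, 564, 580, 3384, 3384, 3384]  # a sudden 6x jump
smooth = ema_feerate(raw)
# The first smoothed value after the jump stays well inside 5x the
# pre-jump estimate, so the transition no longer trips the range check.
```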
Our default color is "#3399FF". Atm, we sample fees each block, and if fees move 10% or more, then we'll update. We're moving soon to a system which decouples synchronous updates across all channels, and instead we'll allow each node to randomly sample on its own schedule.
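A sketch of that 10% rule (the helper name and threshold handling are assumptions for illustration, not lnd's actual code):

```python
def should_update_fee(current, new_estimate, threshold=0.10):
    """Send update_fee only when the per-block estimate has moved by
    at least `threshold` (10%) relative to the current feerate."""
    return abs(new_estimate - current) / current >= threshold

assert should_update_fee(1000, 1100)      # 10% move: send an update
assert not should_update_fee(1000, 1050)  # 5% move: stay quiet
```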
Perhaps it would be beneficial if users could (optionally) set the acceptable fee range.
For example by using --commit-fee-min=<percent> and --commit-fee-max=<percent> to calculate the range.
On what would that percentage be computed? Total value in the channel or the funder's value?
Well, from the help message I guessed/assumed it was a percentage of the fee (estimate), similar to
--commit-fee=<percent> Percentage of fee to request for their commitment (default: 500)
Are you referring to a situation where a party cannot afford the fee rate? What happens then?
Somewhat off-topic, but I was also thinking about the (hypothetical) situation where a sharp rise in fee rate causes a cascade of channel closures: the mass broadcast of high-fee closing transactions would further increase fee rates, causing more channels to close, and so on.
Issue and Steps to Reproduce
I was recklessly testing c-lightning (built May 17) on mainnet and had 6 channels open for about a week; then they were all abruptly closed by my node. No funds were lost.
All 6 channels closed with similar errors: update_fee 3047 outside range 253-2820. I think it was caused by the sudden 6x increase in the feerate estimate, as shown in the chart below taken from statoshi.info. Note that the timestamp in the chart is in timezone EEST (GMT+3) and the timestamp in the log is GMT.
As I understand it, my node polls bitcoind's estimatesmartfee 2 every 30s and then updates the min_feerate and max_feerate for all my channels according to the new estimate. So could it simply be that the p2p update_fee message was received (for all 6 channels) before my node had a chance to update its own fee estimate? I'm also not sure how rare a 6x increase in fee estimate is within such a short (30s) period. I should also mention that the network cable was unplugged for a short period (max 30 min) earlier that day, but I can't remember exactly whether this coincides with the channel closures.
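The suspected race can be sketched as a timeline. The helper code and the 3384 post-spike estimate are illustrative assumptions; 564, 3047, and the 253-2820 range come from the log:

```python
events = [
    (0,  "poll",       564),   # our estimate before the spike
    (12, "update_fee", 3047),  # peer's update arrives between 30s polls
    (30, "poll",       3384),  # our next poll finally sees the 6x jump
]

results = []
our_estimate = None
for ts, kind, value in events:
    if kind == "poll":
        our_estimate = value
    else:
        # An incoming update_fee is checked against our *current* range,
        # here sketched as floor 253 up to 5x our own estimate.
        lo, hi = 253, 5 * our_estimate
        results.append((ts, value, lo <= value <= hi))

# results == [(12, 3047, False)]: rejected as "outside range 253-2820",
# even though the very next poll would have widened the range to 253-16920.
```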
A quick solution I had in mind was simply raising max_fee to a higher percentage with the command line option --commit-fee-max=<percent>. But in the code I found that max_fee is actually hard-coded to be 5x the fee estimate: https://github.com/ElementsProject/lightning/blob/56046d470b24d52c959cc2c558040720c907fd49/lightningd/peer_control.c#L191
That factor of 5x also fits the fee range 253-2820 (253 being the floor, 2820 = 5 x 564) shown in the log. So are these options --commit-fee-min=<percent> and --commit-fee-max=<percent> actually usable, and do they mean what I think they do?
Before shutting down the node I recorded some 'getlog io' output to a file, and I found, for example:
getinfo output with Bitcoin Core RPC client version v0.16.0 on Linux HPLaptop 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux