Closed SimonVrouwe closed 6 years ago
First of all: excellent bug report, you clearly invested a lot of time into this.
I think all of those peers are lnd nodes (their default color is black), and I'm not exactly sure how they do their fee estimation (maybe @roasbeef can help here). That said, this could happen even if both sides use the same estimator: the estimator may be jumpy, or the two nodes may run the same estimator with slightly different views of the mempool. So it's really a hard thing to fix, since we need to enforce some reasonable range of fees.
We could decrease the impact by switching to a 10x range, but even that could result in closures, but it'd make it a lot less likely:
The above plot shows the max/min fee ratio over 1 hour rolling windows and the 5x and 10x cutoff points. This is using a single estimator (which can swing wildly as well), and is sort of a worst-case scenario, i.e., one node uses the min fee estimation for that hour and the other uses the max fee estimation for that hour.
What do you think @rustyrussell, should we just increase for now until we have a good solution?
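For illustration, here is a minimal sketch of how widening the cutoff from 5x to 10x changes which estimate spikes survive. The `tolerated` helper and the floor value are assumptions for illustration, not code from either implementation:

```python
def tolerated(old_estimate, new_feerate, multiplier):
    """Would a peer still using `old_estimate` accept `new_feerate`
    under a floor-to-(multiplier x estimate) acceptance range?"""
    floor = 253  # illustrative feerate floor
    return floor <= new_feerate <= multiplier * old_estimate

# A sudden 6x jump in the fee estimate:
old, new = 564, 6 * 564
five_x = tolerated(old, new, 5)    # 3384 > 2820: outside range, channel closes
ten_x = tolerated(old, new, 10)    # 3384 <= 5640: spike stays inside the range
```

Under this sketch, a 10x range tolerates any jump the estimator makes between two updates as long as it stays below an order of magnitude, at the cost of accepting wilder feerates from the peer.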
Actually, the error message in the log says received ERROR:
Peer permanent failure in CHANNELD_NORMAL: lightning_channeld: received ERROR channel
which means that it was the remote peer that did not agree with the update_fee message sent by my node. That makes more sense, as it was my node that initiated/funded all 6 channels and was therefore responsible for the fee:
The node not responsible for paying the Bitcoin fee:
MUST NOT send update_fee.
So it was my bitcoind that was so fast that all 6 peers couldn't keep up with the new fee estimate. I assume that the last part of the ERROR message, i.e. update_fee 3047 outside range 253-2820, was part of the error message sent by the remote.
Would you care @cdecker to share the source/code of that min/max fee ratio chart?
And isn't estimatesmartfee polled every 30s, which could trigger an update_fee message, instead of every hour?
Sure thing, here it is https://gist.github.com/cdecker/39186d8c041924806773f4bacd0aefdd
I used the 1 hour range mainly to exacerbate the problem a bit (and because statoshi only has 10 minute snapshots). We could probably smooth out the effect by using an exponential moving average, so that we aren't as jittery when we don't have a block for some time.
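The gist linked above does the real work against statoshi data; as a rough sketch of the same idea (the `rolling_max_min_ratio` helper and the sample values are made up for illustration), the rolling max/min ratio can be computed like this:

```python
from collections import deque

def rolling_max_min_ratio(samples, window_s=3600):
    """For each sample, the max/min feerate ratio over the trailing window.

    `samples` is a time-ordered list of (timestamp_seconds, feerate)
    tuples; returns a list of (timestamp, ratio) pairs.
    """
    window = deque()  # samples inside the trailing window
    out = []
    for ts, fee in samples:
        window.append((ts, fee))
        # Drop samples that have fallen out of the window.
        while window and window[0][0] < ts - window_s:
            window.popleft()
        fees = [f for _, f in window]
        out.append((ts, max(fees) / min(fees)))
    return out

# 10-minute snapshots; a sudden 6x jump shows up as a ratio above the 5x cutoff.
samples = [(i * 600, f) for i, f in enumerate([500, 520, 510, 530, 3100, 3000])]
ratios = rolling_max_min_ratio(samples)
```

As in the plot, the worst case is one node sitting at the window minimum while its peer sits at the maximum.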
As for the remote end sending those errors, that seems strange since the error message originates from c-lightning's code:
So either I misattributed your peers as lnd nodes, or the error message was reflected back to us.
Well at least one of the remotes claims to be a c-lightning 02977901c53b5299c7641acf11d49a46f0957e9e3f8191e6171d1222486d872317
Great, that settles the problem I had figuring out where it comes from :-)
We just discussed that we'll raise our acceptable fee range and then reduce our sensitivity by using an exponential moving average to flatten the fee transitions.
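A minimal sketch of that smoothing idea (the `alpha` value and the sample feerates are illustrative assumptions, not what c-lightning ships):

```python
def ema_feerate(estimates, alpha=0.2):
    """Smooth raw feerate estimates with an exponential moving average.

    alpha near 0 reacts slowly (smoother); near 1 tracks the raw
    estimate closely. Returns the smoothed series.
    """
    smoothed = []
    ema = None
    for est in estimates:
        ema = est if ema is None else alpha * est + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed

raw = [564, 564, 580, 3384, 3384, 3384]  # a sudden 6x jump
smooth = ema_feerate(raw)
# The first smoothed value after the jump stays well inside 5x the
# pre-jump estimate, so the transition no longer trips the range check.
```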
Our default color is "#3399FF". Atm, we sample fees each block, and if fees move 10% or more, then we'll update. We're moving soon to a system which decouples synchronous updates across all channels, and instead we'll allow each node to randomly sample on its own schedule.
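A sketch of that 10% rule (the helper name and threshold handling are assumptions for illustration, not lnd's actual code):

```python
def should_update_fee(current, new_estimate, threshold=0.10):
    """Send update_fee only when the per-block estimate has moved by
    at least `threshold` (10%) relative to the current feerate."""
    return abs(new_estimate - current) / current >= threshold

assert should_update_fee(1000, 1100)      # 10% move: send an update
assert not should_update_fee(1000, 1050)  # 5% move: stay quiet
```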
Perhaps it would be beneficial if users could (optionally) set the acceptable fee range.
For example by using --commit-fee-min=<percent> and --commit-fee-max=<percent> to calculate the range.
On what would that percentage be computed? Total value in the channel or the funder's value?
Well, from the help message I guessed/assumed it was a percentage of the fee (estimate), similar to
--commit-fee=<percent> Percentage of fee to request for their commitment (default: 500)
Are you referring to a situation where a party cannot afford the fee rate? What happens then?
Somewhat off-topic, but I was also thinking about the (hypothetical) situation where a sharp rise in fee rate causes a cascade of channel closures: the mass broadcast of high-fee closing transactions would further increase fee rates, causing more channels to close, and so on.
Issue and Steps to Reproduce
I was recklessly testing c-lightning (built May 17) on mainnet and had 6 channels open for about a week; then they were all abruptly closed by my node. No funds were lost.
All 6 channels closed with similar errors: update_fee 3047 outside range 253-2820. I think it was caused by the sudden 6x increase in the feerate estimate, as shown in the chart below taken from statoshi.info. Note that the timestamp in the chart is in timezone EEST (GMT+3) and the timestamp in the log is GMT.
As I understand it, my node polls bitcoind's estimatesmartfee 2 every 30s and then updates the min_feerate and max_feerate for all my channels according to the new estimate. So could it simply be that the p2p update_fee message was received (for all 6 channels) before my node had a chance to update its own fee estimate? I'm also not sure how rare a 6x increase in fee estimate is within such a short (30s) period. I should also mention that the network cable was unplugged for a short period (max 30 min) earlier that day, but I can't remember exactly whether this coincides with the channel closures.
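The suspected race can be sketched as a timeline. The helper code and the 3384 post-spike estimate are illustrative assumptions; 564, 3047, and the 253-2820 range come from the log:

```python
events = [
    (0,  "poll",       564),   # our estimate before the spike
    (12, "update_fee", 3047),  # peer's update arrives between 30s polls
    (30, "poll",       3384),  # our next poll finally sees the 6x jump
]

results = []
our_estimate = None
for ts, kind, value in events:
    if kind == "poll":
        our_estimate = value
    else:
        # An incoming update_fee is checked against our *current* range,
        # here sketched as floor 253 up to 5x our own estimate.
        lo, hi = 253, 5 * our_estimate
        results.append((ts, value, lo <= value <= hi))

# results == [(12, 3047, False)]: rejected as "outside range 253-2820",
# even though the very next poll would have widened the range to 253-16920.
```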
A quick solution I had in mind was simply raising max_fee to a higher percentage with the command line option --commit-fee-max=<percent>. But in the code I found that max_fee is actually hard-coded to be 5x the fee estimate: https://github.com/ElementsProject/lightning/blob/56046d470b24d52c959cc2c558040720c907fd49/lightningd/peer_control.c#L191
That factor of 5x also fits the fee range 253-2820 (253 being the floor, 2820 = 5 x 564) shown in the log. So are these options --commit-fee-min=<percent> and --commit-fee-max=<percent> actually usable, and do they mean what I think they do?
Before shutting down the node I recorded some 'getlog io' output to a file, and I found, for example:
getinfo output with Bitcoin Core RPC client version v0.16.0 on Linux HPLaptop 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux