ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance

cln restart loop caused by fee overflow #6716

Open daywalker90 opened 1 year ago

daywalker90 commented 1 year ago

My mainnet node is in a restart loop because of this:

Sep 24 11:20:26  lightningd[2028800]: DEBUG   0380ef0209ff1b46c38a37cd40f613d1dae3eba481a909459d6c1434a0e56e5d8c-hsmd: Got WIRE_HSMD_CUPDATE_SIG_REQ
Sep 24 11:20:26  lightningd[2028800]: DEBUG   hsmd: Client: Received message 3 from client
Sep 24 11:20:26  lightningd[2028800]: DEBUG   0380ef0209ff1b46c38a37cd40f613d1dae3eba481a909459d6c1434a0e56e5d8c-hsmd: Got WIRE_HSMD_CUPDATE_SIG_REQ
Sep 24 11:20:26  lightningd[2028800]: DEBUG   hsmd: Client: Received message 3 from client
Sep 24 11:20:26  lightningd[2028800]: DEBUG   0380ef0209ff1b46c38a37cd40f613d1dae3eba481a909459d6c1434a0e56e5d8c-hsmd: Got WIRE_HSMD_CUPDATE_SIG_REQ
Sep 24 11:20:26  lightningd[2028800]: DEBUG   hsmd: Client: Received message 3 from client
Sep 24 11:20:26  lightningd[2028800]: DEBUG   0380ef0209ff1b46c38a37cd40f613d1dae3eba481a909459d6c1434a0e56e5d8c-hsmd: Got WIRE_HSMD_CUPDATE_SIG_REQ
Sep 24 11:20:26  lightningd[2028800]: DEBUG   hsmd: Client: Received message 3 from client
Sep 24 11:20:26  lightningd[2028800]: DEBUG   0380ef0209ff1b46c38a37cd40f613d1dae3eba481a909459d6c1434a0e56e5d8c-hsmd: Got WIRE_HSMD_CUPDATE_SIG_REQ
Sep 24 11:20:26  lightningd[2028800]: DEBUG   hsmd: Client: Received message 3 from client
Sep 24 11:20:26  lightningd[2028800]: DEBUG   lightningd: io_break: gossipd_init_done
Sep 24 11:20:26  lightningd[2028800]: DEBUG   lightningd: io_loop: gossip_init
Sep 24 11:20:26  lightningd[2028800]: DEBUG   lightningd: Looking for [autoclean,succeededforwards,num]
Sep 24 11:20:26  lightningd[2028800]: DEBUG   lightningd: Got [autoclean,succeededforwards,num]
Sep 24 11:20:26  lightningd[2028800]: DEBUG   lightningd: Printing
Sep 24 11:20:26  lightningd[2028800]: **BROKEN** lightningd: Adding forward fees 516150999msat + 18446744073193896594msat overflowed
Sep 24 11:20:26  lightningd[2028800]: Adding forward fees 516150999msat + 18446744073193896594msat overflowed
Sep 24 11:20:26  systemd[1]: lightningd.service: Main process exited, code=exited, status=1/FAILURE
Sep 24 11:20:26  lightningd[2028808]: Lost connection to the RPC socket.
Sep 24 11:20:26  lightningd[2028807]: Reading JSON input: Connection reset by peer
Sep 24 11:20:26  lightningd[2028809]: Reading JSON input: Connection reset by peer
Sep 24 11:20:26  lightningd[2028811]: Reading JSON input: Connection reset by peer
Sep 24 11:20:26  lightningd[2028814]: Reading JSON input: Connection reset by peer
Sep 24 11:20:26  lightningd[2028810]: Reading JSON input: Connection reset by peer
Sep 24 11:20:26  systemd[1]: lightningd.service: Failed with result 'exit-code'.

Running CLN 23.08.1 on Debian 12.

daywalker90 commented 1 year ago

To be clear, the restarting is obviously done by systemd, but the node cannot be started until this is fixed :/

daywalker90 commented 1 year ago

I asked ChatGPT to help me understand the C code, and I think the related code is here: https://github.com/ElementsProject/lightning/blob/master/wallet/wallet.c#L4658. Together with my earlier issue here: https://github.com/ElementsProject/lightning/issues/6260, I think this is what happened: I had a fake(?) forward with an enormous fee (I remember my fees_collected starting with 18 and being higher than the 21M BTC limit, so maybe it was close to the max of u64), and now adding the deleted forward fees goes over the max of u64.

I then ran: SELECT CAST(COALESCE(SUM(in_msatoshi - out_msatoshi), 0) AS BIGINT) FROM forwards WHERE state = 1; which returned:

 coalesce
-----------
 516150999
(1 row)

and: SELECT intval FROM vars WHERE name = 'deleted_forward_fees' LIMIT 1; which returned:

   intval
------------
 -515655022
(1 row)
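
The two query results line up with the numbers in the BROKEN log line. A minimal sketch of the arithmetic, assuming the negative `intval` is reinterpreted as an unsigned 64-bit value when it is read back (that is my assumption, not something confirmed from the CLN source; `as_u64` and `add_overflows_u64` are hypothetical helper names used only for illustration):

```python
U64_MAX = 2**64 - 1


def as_u64(signed_value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (two's complement)."""
    return signed_value & U64_MAX


def add_overflows_u64(a: int, b: int) -> bool:
    """Return True if a + b does not fit in an unsigned 64-bit integer."""
    return a + b > U64_MAX


# Values taken from the two SQL queries in this comment.
live_forward_fees = 516150999      # SUM(in_msatoshi - out_msatoshi)
deleted_forward_fees = -515655022  # vars.intval for 'deleted_forward_fees'

deleted_as_u64 = as_u64(deleted_forward_fees)
print(deleted_as_u64)  # 18446744073193896594, the second operand in the log
print(add_overflows_u64(live_forward_fees, deleted_as_u64))  # True
```

Under that assumption, the stored -515655022 becomes 18446744073193896594msat, and adding 516150999msat exceeds the u64 maximum, which matches the "overflowed" message.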

Now, to be practical and get my node back up, can I run: UPDATE vars SET intval = 0 WHERE name = 'deleted_forward_fees'; ?

daywalker90 commented 1 year ago

I went ahead and ran that command to set deleted_forward_fees back to 0. Node is back up :)