Closed vincenzopalazzo closed 1 year ago
OK, it looks like with a wrong (or too large) fee estimate, such as
2023-04-16T10:58:23.544Z DEBUG plugin-folgore_plugin: fee estimated from nakamoto {6: 1240000000, 12: 1350000000, 100: 3610000000, 2: 1270000000}
we end up in some loop in CLN.
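A backend plugin could guard against estimates like the ones in this log line before handing them to lightningd. The following is only a minimal sketch: the function name, the cap value, and the reading of the map as keyed by confirmation target are all assumptions, and the unit of the numbers in the log is not stated in this thread.

```python
def suspicious_estimates(estimates, cap):
    """Return the entries of a {target_blocks: fee} map whose fee exceeds cap.

    Both the cap and the unit of the fee values are assumptions; pick a cap
    that matches whatever unit your backend actually reports.
    """
    return {target: fee for target, fee in estimates.items() if fee > cap}


# Values copied from the folgore log line above.
logged = {6: 1240000000, 12: 1350000000, 100: 3610000000, 2: 1270000000}

# Hypothetical cap, chosen only for illustration.
CAP = 1_000_000

flagged = suspicious_estimates(logged, CAP)
for target in sorted(flagged):
    print(f"target {target}: estimate {flagged[target]} exceeds cap {CAP}")
```

Rejecting or clamping flagged values in the plugin would not fix a loop inside lightningd itself, but it may make it easier to tell a bad estimate apart from a bug in the daemon.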
I cannot reproduce this. See #6204
We just hit this in production and our node is unusable. @vincenzopalazzo is the solution just to upgrade to the latest version?
Maybe. I never triggered the bug again, but if this happens it means that the bcli plugin is going crazy.
I also saw your PR. Did upgrading to the new Bitcoin version solve the problem?
We're still getting infrastructure updated across the board and will find out soon. Though we aren't doing the same thing as you were by running a custom backend that was providing fee rates, so I highly doubt this will fix our issue and there's some other critical bug with the fundchannel call. I'll try to get a stack report when we get to that point.
I see. Keep me updated so we can debug it.
@vincenzopalazzo It's still happening with 23.08:
root@ip-10-20-28-53:/opt/pysetup# tail -f /mnt/shared/cln.log
2023-09-08T14:39:56.052Z DEBUG 0313d313d45ce0b10c406a2d249d7643865451aa8e783923909cb794dbc0b083f0-onchaind-chan#1601: billboard: 1 outputs unresolved: waiting confirmation that we spent DELAYED_OUTPUT_TO_US (ac9f8e209aed564dc4808a2395303a15932ef70cd93aa71443552e5602f30259:0) using OUR_DELAYED_RETURN_TO_WALLET
2023-09-08T14:39:56.526Z DEBUG plugin-spenderp: mfc 34: multiconnect.
2023-09-08T14:39:56.526Z DEBUG plugin-spenderp: mfc 34, dest 0: connect 03789e3087822b0e5d94c6e5f452131a174d4da580c6a355d9434be6e8f7469438.
2023-09-08T14:39:56.527Z DEBUG lightningd: Already connected via 127.0.0.1:40056
2023-09-08T14:39:56.653Z DEBUG 0313d313d45ce0b10c406a2d249d7643865451aa8e783923909cb794dbc0b083f0-onchaind-chan#1601: Got new message WIRE_ONCHAIND_DEPTH
2023-09-08T14:39:56.771Z DEBUG plugin-spenderp: mfc 34, dest 0: connect done.
2023-09-08T14:39:56.771Z DEBUG plugin-spenderp: mfc 34: multiconnect done.
2023-09-08T14:39:56.922Z DEBUG 0313d313d45ce0b10c406a2d249d7643865451aa8e783923909cb794dbc0b083f0-onchaind-chan#1601: FUNDING_TRANSACTION/FUNDING_OUTPUT->OUR_UNILATERAL depth 2848
2023-09-08T14:39:56.922Z DEBUG plugin-spenderp: mfc 34: 'parsefeerate' done
2023-09-08T14:39:56.922Z DEBUG plugin-spenderp: mfc 34: fundpsbt.
2023-09-08T14:45:07.109Z **BROKEN** lightningd: FATAL SIGNAL 11 (version v23.08)
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: common/daemon.c:38 (send_backtrace) 0x56446df10545
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: common/daemon.c:75 (crashdump) 0x56446df106cb
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fae257e513f
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:320 (init_property) 0x56446e14abe7
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:339 (add_notifier_property) 0x56446e14ac5d
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:584 (tal_add_notifier_) 0x56446e14b51d
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: common/daemon.c:141 (add_steal_notifier) 0x56446df10926
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:248 (notify) 0x56446e14a9e8
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:477 (tal_alloc_) 0x56446e14b13a
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:506 (tal_alloc_arr_) 0x56446e14b237
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: db/bindings.c:529 (db_col_arr_) 0x56446df2e8e4
2023-09-08T14:45:07.109Z **BROKEN** lightningd: backtrace: wallet/wallet.c:253 (wallet_stmt2output) 0x56446deebc9a
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: wallet/wallet.c:596 (wallet_find_utxo) 0x56446deec669
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: wallet/reservation.c:547 (json_fundpsbt) 0x56446deff31f
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:658 (command_exec) 0x56446de97d6a
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:786 (rpc_command_hook_final) 0x56446de98359
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/plugin_hook.c:285 (plugin_hook_call_) 0x56446ded6314
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:874 (plugin_hook_call_rpc_command) 0x56446de98752
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:984 (parse_request) 0x56446de98ced
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/jsonrpc.c:1090 (read_json) 0x56446de991ae
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x56446e138efd
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x56446e139aa7
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x56446e139ae5
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x56446e13bcfb
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x56446de96060
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:1332 (main) 0x56446de9cc3e
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7fae255b4d09
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x56446de67419
2023-09-08T14:45:07.110Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0xffffffffffffffff
Further, CLI commands for everything, including getinfo, take longer than a minute with this new release.
We are going to do a chain resync now, but any guidance on how to proceed?
For ease, here's some of the code for each of the parts in the stack trace:
And then it goes into tal_alloc_arr_.
This has not been a problem all year and just surfaced yesterday. The node has had maybe 1k+ channels. When the lock-up happens, it behaves exactly as described in the OP. It eats up RAM slowly until it runs out.
I need to look at the stack trace more closely; I will do that in a couple of hours. Is this a crash caused by an OOM, or just a crash?
Could this be a bug introduced in this release?
@vincenzopalazzo There is no real 'crash'. CLN just completely freezes up and stops responding. Then its memory consumption grows until there is some OOM kill situation.
This was happening on v23.05.1 and v23.08
Thanks for confirming, so this is the same bug. I'm putting it in the point release queue; let's see if we can debug it. Thanks.
I've reproduced it and got a fix at https://github.com/ElementsProject/lightning/pull/6657
I think the temporary workaround is to fund the node with more high-value UTXOs, but we are still doing maintenance on our node, so we have not verified it yet.
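To check whether a wallet is dominated by small UTXOs before trying this workaround, one could summarize the output of `lightning-cli listfunds`. This is a rough sketch, not something verified in this thread: it assumes each entry in `outputs` carries `amount_msat`, `status`, and `reserved` fields, accepts `amount_msat` either as an integer or as a string ending in `msat`, and uses a hypothetical threshold for what counts as "small".

```python
import json


def msat_to_int(value):
    """Accept 1000 or "1000msat" and return the integer number of msat."""
    if isinstance(value, str) and value.endswith("msat"):
        value = value[:-4]
    return int(value)


def summarize_utxos(listfunds, small_sat=10_000):
    """Summarize confirmed, unreserved outputs; small_sat is an assumed cutoff."""
    amounts_sat = [
        msat_to_int(o["amount_msat"]) // 1000
        for o in listfunds.get("outputs", [])
        if o.get("status") == "confirmed" and not o.get("reserved", False)
    ]
    return {
        "count": len(amounts_sat),
        "total_sat": sum(amounts_sat),
        "small": sum(1 for a in amounts_sat if a < small_sat),
        "largest_sat": max(amounts_sat, default=0),
    }


# Made-up sample in the assumed shape of `lightning-cli listfunds` output.
sample = json.loads("""
{
  "outputs": [
    {"amount_msat": 546000, "status": "confirmed", "reserved": false},
    {"amount_msat": "2000000msat", "status": "confirmed", "reserved": false},
    {"amount_msat": 500000000, "status": "confirmed", "reserved": true},
    {"amount_msat": 100000000, "status": "confirmed", "reserved": false},
    {"amount_msat": 7000000, "status": "unconfirmed", "reserved": false}
  ]
}
""")

print(summarize_utxos(sample))
```

With a real node, `sample` would be replaced by the parsed JSON from the command. A high `small` count relative to `count` would be consistent with the situation the workaround targets, though it does not by itself confirm the cause.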
While testing a new backend plugin for Core Lightning, I got stuck on the fundchannel command. Originally I thought this was a problem on my plugin's side (https://github.com/coffee-tools/folgore/issues/21), but then I generated a stack report from Core Lightning with kill -SEGV <PID> and got the following report. Core Lightning is stuck on the following command:
Putting this here because the stack trace in wallet_find_utxo looks worrying to me!