ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance

cln v24.08 crash #7689

Closed: JssDWt closed this issue 1 month ago

JssDWt commented 1 month ago
lightning_gossipd: gossip_store: get delete entry offset 5098973/19323010780 (version v24.08-4-gbc9e4f5-modded)
0x556cf571e570 send_backtrace
        common/daemon.c:33
0x556cf5727c3f status_failed
        common/status.c:221
0x556cf5715ca5 gossip_store_get_with_hdr
        gossipd/gossip_store.c:466
0x556cf57161f7 gossip_store_set_timestamp
        gossipd/gossip_store.c:592
0x556cf571783e process_channel_update
        gossipd/gossmap_manage.c:777
0x556cf5718190 gossmap_manage_channel_update
        gossipd/gossmap_manage.c:901
0x556cf5714a5a handle_recv_gossip
        gossipd/gossipd.c:215
0x556cf5714b45 connectd_req
        gossipd/gossipd.c:307
0x556cf571e85b handle_read
        common/daemon_conn.c:35
0x556cf586cd8c next_plan
        ccan/ccan/io/io.c:60
0x556cf586d217 do_plan
        ccan/ccan/io/io.c:422
0x556cf586d2d0 io_ready
        ccan/ccan/io/io.c:439
0x556cf586ebbc io_loop
        ccan/ccan/io/poll.c:455
0x556cf5714e0c main
        gossipd/gossipd.c:672
0x7f719eee1c89 ???
        ???:0
0x7f719eee1d44 ???
        ???:0
0x556cf5711ae0 ???
        ???:0
0xffffffffffffffff ???
        ???:0

The crash was observed on this branch: https://github.com/breez/lightning/tree/cln-v24.08-breez at commit https://github.com/breez/lightning/commit/bc9e4f56c324216f5f0f15be07f6ad4f9a46e597

The branch contains the following changes on top of v24.08:

https://github.com/ElementsProject/lightning/pull/7628
https://github.com/ElementsProject/lightning/pull/7611
https://github.com/ElementsProject/lightning/pull/7636

I don't think they are related to the crash, though.

One notable thing: the gossip store file was 18GB.

ShahanaFarooqui commented 1 month ago

Another gossip crash report from v24.08.1: https://github.com/ElementsProject/lightning/pull/7685#issuecomment-2379521485

ShahanaFarooqui commented 1 month ago

Reported on Telegram by +steepdawn974:

...
2024-10-02T08:58:33.820Z INFO    plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
2024-10-02T08:58:43.407Z **BROKEN** gossipd: gossip_store: checksum verification failed? 32536bf2 should be 67132a62 (offset 3972). Moving to gossip_store.corrupt and truncating
2024-10-02T08:58:43.407Z UNUSUAL 025651f2193a89a44a80d833f0a82da668a3af8438eff2e9633fabb3f6a3748be6-chan#15523: gossipd lost track of announced channel: re-announcing!
2024-10-02T08:58:43.408Z UNUSUAL 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018-chan#15522: gossipd lost track of announced channel: re-announcing!
2024-10-02T08:58:43.408Z UNUSUAL 024a8228d764091fce2ed67e1a7404f83e38ea3c7cb42030a2789e73cf3b341365-chan#15524: gossipd lost track of announced channel: re-announcing!
2024-10-02T08:58:43.464Z INFO    plugin-clnrest: REST server running at https://127.0.0.1:3010
2024-10-02T08:58:43.548Z INFO    lightningd: --------------------------------------------------
2024-10-02T08:58:43.548Z INFO    lightningd: Server started with public key xxxxx, alias xxxxx (color #0362df) and lightningd v24.08
2024-10-02T08:59:37.638Z UNUSUAL lightningd: Bad gossip order: could not find channel 9999999x475x0 for peer's channel update
2024-10-02T09:02:32.335Z **BROKEN** gossipd: Dying channel 863308x1674x0 already deleted?
2024-10-02T09:02:32.335Z **BROKEN** gossipd: gossip_store: bad checksum offset 451:  (version v24.08)
2024-10-02T09:02:32.335Z **BROKEN** gossipd: backtrace: common/daemon.c:38 (send_backtrace) 0x55793fd3051b
2024-10-02T09:02:32.335Z **BROKEN** gossipd: backtrace: common/status.c:221 (status_failed) 0x55793fd39bac
2024-10-02T09:02:32.335Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:480 (gossip_store_get_with_hdr) 0x55793fd27d90
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:491 (check_msg_type) 0x55793fd27dbe
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:509 (gossip_store_set_flag) 0x55793fd27f41
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossip_store.c:561 (gossip_store_del) 0x55793fd28187
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossmap_manage.c:1216 (gossmap_manage_new_block) 0x55793fd2a82f
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:477 (new_blockheight) 0x55793fd260ff
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:588 (recv_req) 0x55793fd26529
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: common/daemon_conn.c:35 (handle_read) 0x55793fd307c6
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:60 (next_plan) 0x55793fdc0056
2024-10-02T09:02:32.336Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:422 (do_plan) 0x55793fdc04e1
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: ccan/ccan/io/io.c:439 (io_ready) 0x55793fdc059a
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: ccan/ccan/io/poll.c:455 (io_loop) 0x55793fdc1ee7
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: gossipd/gossipd.c:672 (main) 0x55793fd26ead
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x7f4cb5aa1d09
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0x55793fd23d29
2024-10-02T09:02:32.337Z **BROKEN** gossipd: backtrace: (null):0 ((null)) 0xffffffffffffffff
2024-10-02T09:02:32.337Z **BROKEN** gossipd: STATUS_FAIL_INTERNAL_ERROR: gossip_store: bad checksum offset 451:

steepdawn974 commented 1 month ago

https://github.com/ElementsProject/lightning/issues/7689#issuecomment-2389187161

This was on v24.08

rustyrussell commented 1 month ago

Quoting steepdawn974 in #7689 (comment):

"This was on v24.08"

Wow, this is completely broken. Is this some weird OS? You seem to be getting bad checksums all the time...

Also, please can you send me gossip_store.corrupt?

rustyrussell commented 1 month ago

We use 32-bit file offsets, which can only address 4GB, but since we stopped filtering gossip spam the store can grow much larger than that. I suspect this is causing all kinds of weirdness.

The workaround is to restart (which compacts the gossip store), but I'll simply switch to 64-bit offsets for the point release.
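
To illustrate the suspected failure mode, here is a minimal Python sketch of what happens when a file offset beyond 4GiB is squeezed into a 32-bit field. This is not Core Lightning's actual code: the helper name `truncate_to_u32` is hypothetical, and the sketch assumes plain unsigned wrap-around. The two numbers are taken from the crash message at the top of this issue (`offset 5098973/19323010780`).

```python
U32_MAX = 0xFFFFFFFF  # largest value a 32-bit unsigned offset can hold


def truncate_to_u32(offset: int) -> int:
    """Hypothetical helper: mimic storing a file offset in a 32-bit unsigned field."""
    return offset & U32_MAX


# Values from the crash message in this issue.
requested_offset = 5098973    # offset gossipd tried to read
store_size = 19323010780      # reported gossip_store size (about 18GB)

# Small offsets survive unchanged.
print(truncate_to_u32(requested_offset))  # 5098973

# An offset near the end of an 18GB store silently wraps to a different position.
print(truncate_to_u32(store_size))        # 2143141596

# Any record written past this boundary can no longer be addressed correctly.
print(store_size > U32_MAX)               # True
```

Under these assumptions, a record stored past the 4GiB mark would be looked up at an unrelated position in the file, which could plausibly show up as bad checksums or failed reads like the ones reported above.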