ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance

gossmap: ensure chan not null #7685

Closed JssDWt closed 1 month ago

JssDWt commented 1 month ago

Ignore localmods that don't have a corresponding entry in the gossmap.

A crash was observed on this branch: https://github.com/breez/lightning/tree/cln-v24.08-breez with commit https://github.com/breez/lightning/commit/bc9e4f56c324216f5f0f15be07f6ad4f9a46e597

pay: FATAL SIGNAL 11 (version v24.08-4-gbc9e4f5-modded)
0x5584c2da9cbf send_backtrace
        common/daemon.c:33
0x5584c2da9d44 crashdump
        common/daemon.c:75
0x7fc69664858f ???
        ???:0
0x5584c2dc2864 gossmap_remove_localmods
        common/gossmap.c:984
0x5584c2d94b2f put_gossmap
        plugins/libplugin-pay.c:62
0x5584c2d9ac32 routehint_step_cb
        plugins/libplugin-pay.c:3171
0x5584c2d98fda payment_continue
        plugins/libplugin-pay.c:2450
0x5584c2d99928 shadow_route_cb
        plugins/libplugin-pay.c:3529
0x5584c2d98fda payment_continue
        plugins/libplugin-pay.c:2450
0x5584c2d9b585 direct_pay_override
        plugins/libplugin-pay.c:3550
0x5584c2d9b7a8 direct_pay_listpeerchannels
        plugins/libplugin-pay.c:3621
0x5584c2d93713 handle_rpc_reply
        plugins/libplugin.c:1016
0x5584c2d938b7 rpc_read_response_one
        plugins/libplugin.c:1202
0x5584c2d93964 rpc_conn_read_response
        plugins/libplugin.c:1226
0x5584c2ef37cc next_plan
        ccan/ccan/io/io.c:60
0x5584c2ef3c57 do_plan
        ccan/ccan/io/io.c:422
0x5584c2ef3d10 io_ready
        ccan/ccan/io/io.c:439
0x5584c2ef55fc io_loop
        ccan/ccan/io/poll.c:455
0x5584c2d94006 plugin_main
        plugins/libplugin.c:2230
0x5584c2d8f029 main
        plugins/pay.c:1533
0x7fc696632c89 ???
        ???:0
0x7fc696632d44 ???
        ???:0
0x5584c2d8b7b0 ???
        ???:0
0xffffffffffffffff ???
        ???:0

The branch contains changes compared to v24.08, namely

But I don't think they were related to the crash. A simple null check should suffice here?


TRIGEMTECH commented 1 month ago

Encountered similar issue after overnight power surge. Is it safe to delete gossip_store?

Was able to get lightning started, but it appears stuck in an infinite loop, with the following two lines repeating for one particular channel at the same offsets:

lightning_gossipd: gossmap: redundant channel_announce for ...!
lightning_connectd: gossmap: redundant channel_announce for ...!

Eventually the looping stops but resumes after several minutes.

The output occurs on line 471 in gossmap.c:

warnx("gossmap: redundant channel_announce for %s, offsets %u and %zu!",

Added the following lines (at lines 982 and 983), as shown in the single changed file, and recompiled. No change.

        if (chan == NULL)
                continue;

---FINAL UPDATE--- Able to resolve my issue by reading a lot and by:

  1. shutting everything down and rebooting the system
  2. deleting gossip_store and gossip_store.corrupt
  3. restarting lightningd

gossip_store has been re-created and, based on its previous file size, it appears it will take several hours to complete. The answer to the question 'Is it safe to delete gossip_store?' is yes. ---NO FURTHER ACTION REQUIRED---

Terminal output after original attempt at restarting lightningd shown below

lightning_gossipd: gossip_store: get delete entry offset 34356327/14528 (version v24.08.1-modded)
0x58da2a2455f3 send_backtrace
        common/daemon.c:33
0x58da2a24f026 status_failed
        common/status.c:221
0x58da2a23cb8b gossip_store_get_with_hdr
        gossipd/gossip_store.c:466
0x58da2a23cc06 check_msg_type
        gossipd/gossip_store.c:491
0x58da2a23cd99 gossip_store_set_flag
        gossipd/gossip_store.c:509
0x58da2a23cfac gossip_store_del
        gossipd/gossip_store.c:561
0x58da2a23e788 process_channel_update
        gossipd/gossmap_manage.c:793
0x58da2a23f0a0 gossmap_manage_channel_update
        gossipd/gossmap_manage.c:901
0x58da2a23b923 handle_recv_gossip
        gossipd/gossipd.c:215
0x58da2a23ba12 connectd_req
        gossipd/gossipd.c:307
0x58da2a245909 handle_read
        common/daemon_conn.c:35
0x58da2a385a49 next_plan
        ccan/ccan/io/io.c:60
0x58da2a385f1a do_plan
        ccan/ccan/io/io.c:422
0x58da2a385fd7 io_ready
        ccan/ccan/io/io.c:439
0x58da2a387949 io_loop
        ccan/ccan/io/poll.c:455
0x58da2a23bcf1 main
        gossipd/gossipd.c:672
0x704350e2a1c9 __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0x704350e2a28a __libc_start_main_impl
        ../csu/libc-start.c:360
0x58da2a238874 ???
        ???:0
0xffffffffffffffff ???
        ???:0
2024-09-27T02:20:05.267Z BROKEN connectd: STATUS_FAIL_GOSSIP_IO: gossipd exited?
lightningd: connectd failed (exit status 242), exiting.
Lost connection to the RPC socket. [message repeated many times]

rustyrussell commented 1 month ago

Ack!

This is the same as bc1aabb01452cf612c18c4666add464802dfb1f5 which is already in master: @ShahanaFarooqui might want to cherry-pick that for the branch instead?

cdecker commented 1 month ago

This was already deployed as part of https://github.com/ElementsProject/lightning/pull/7707