ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.83k stars 899 forks source link

crash: more gossip troubles on master #2750

Closed m-schmoock closed 5 years ago

m-schmoock commented 5 years ago

Issue and Steps to Reproduce

On latest master (5e3690b3) mainnet, I encountered another gossip related crash after startup. I don't know exactly how it was triggered, by maybe related reading gossip (i.e. by invoking summary plugin) directly after daemon startup.

Log and Stacktrace

gossip_store channel_announce 20421292 replaces 3967960!
-------------------------------------------
started with public key 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41, alias schmoock.net (color #00ddff) and lightningd v0.7.1rc2-6-g5e3690b
gossip_store: Bad channel_announcement (10050000000000fd4bc0) truncating to 20421736
Deleting un-amounted channel_announcement @20421292
Deleted 172 unfinalized node_announcements
Deleted 7855 unupdated channel_announcements
Plugin rebalance.py initialized
Plugin summary.py initialized
30507): We seem to be missing gossip messages
Plugin sendinvoiceless.py initialized
lightning_channeld-0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer connection lost
0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
lightning_channeld-03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer connection lost
03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
lightning_channeld-02d776ab8db0f62572e0435a70a5f3f3b2ee803357cbef9f915720a9d377a71c6b chan #2283: Peer connection lost
02d776ab8db0f62572e0435a70a5f3f3b2ee803357cbef9f915720a9d377a71c6b chan #2283: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
lightning_channeld-02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer connection lost
02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: Peer connection lost
03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
FATAL SIGNAL 6 (version v0.7.1rc2-6-g5e3690b)
backtrace: common/daemon.c:45 (send_backtrace) 0x55aadd369ac1
backtrace: common/daemon.c:53 (crashdump) 0x55aadd369b11
backtrace: (null):0 ((null)) 0x7fd0232ab8af
backtrace: (null):0 ((null)) 0x7fd0232ab82f
backtrace: (null):0 ((null)) 0x7fd023296671
backtrace: (null):0 ((null)) 0x7fd023296547
backtrace: (null):0 ((null)) 0x7fd0232a3db5
backtrace: gossipd/gossipd.c:489 (get_node_announcement) 0x55aadd360e80
backtrace: gossipd/gossipd.c:509 (get_node_announcement_by_id) 0x55aadd360f46
backtrace: gossipd/gossipd.c:522 (node_announcement_redundant) 0x55aadd360f96
backtrace: gossipd/gossipd.c:561 (maybe_send_own_node_announce) 0x55aadd36119c
backtrace: gossipd/gossipd.c:697 (handle_channel_update_msg) 0x55aadd36159c
backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x55aadd363896
backtrace: common/daemon_conn.c:31 (handle_read) 0x55aadd369f75
backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x55aadd38e3c1
backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x55aadd38ef3e
backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x55aadd38ef7c
backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x55aadd390fd1
backtrace: gossipd/gossipd.c:3063 (main) 0x55aadd366e9f
backtrace: (null):0 ((null)) 0x7fd023297ce2
backtrace: (null):0 ((null)) 0x55aadd352e3d
backtrace: (null):0 ((null)) 0xffffffffffffffff
FATAL SIGNAL (version v0.7.1rc2-6-g5e3690b)
backtrace: common/daemon.c:45 (send_backtrace) 0x55aadd369ac1
backtrace: common/status.c:192 (status_failed) 0x55aadd36f277
backtrace: common/subdaemon.c:25 (status_backtrace_exit) 0x55aadd36f4bf
backtrace: common/daemon.c:56 (crashdump) 0x55aadd369b1a
backtrace: (null):0 ((null)) 0x7fd0232ab8af
backtrace: (null):0 ((null)) 0x7fd0232ab82f
backtrace: (null):0 ((null)) 0x7fd023296671
backtrace: (null):0 ((null)) 0x7fd023296547
backtrace: (null):0 ((null)) 0x7fd0232a3db5
backtrace: gossipd/gossipd.c:489 (get_node_announcement) 0x55aadd360e80
backtrace: gossipd/gossipd.c:509 (get_node_announcement_by_id) 0x55aadd360f46
backtrace: gossipd/gossipd.c:522 (node_announcement_redundant) 0x55aadd360f96
backtrace: gossipd/gossipd.c:561 (maybe_send_own_node_announce) 0x55aadd36119c
backtrace: gossipd/gossipd.c:697 (handle_channel_update_msg) 0x55aadd36159c
backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x55aadd363896
backtrace: common/daemon_conn.c:31 (handle_read) 0x55aadd369f75
backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x55aadd38e3c1
backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x55aadd38ef3e
backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x55aadd38ef7c
backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x55aadd390fd1
backtrace: gossipd/gossipd.c:3063 (main) 0x55aadd366e9f
m-schmoock commented 5 years ago

Hey @rustyrussell I also got the following during runtime after a couple of hours. If required I can attach the gossip store for this after the crash happened. Sorry, but reporting on these issues is currently all I can do :/

2019-06-16T11:33:45.200Z lightning_gossipd(4386): gossip_store: Entry at 24928329->20291383 not updated? (version v0.7.1rc2-6-g5e3690b)
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: common/daemon.c:45 (send_backtrace) 0x563d56824ac1
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: common/status.c:192 (status_failed) 0x563d5682a277
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: gossipd/gossip_store.c:310 (gossip_store_compact) 0x563d56812bc8
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: gossipd/gossip_store.c:364 (gossip_store_maybe_compact) 0x563d56812dca
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: gossipd/gossip_store.c:463 (gossip_store_delete) 0x563d568131af
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: gossipd/routing.c:1886 (routing_add_channel_update) 0x563d56818abc
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: gossipd/routing.c:2077 (handle_channel_update) 0x563d56819190
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: gossipd/gossipd.c:684 (handle_channel_update_msg) 0x563d5681c557
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x563d5681e896
2019-06-16T11:33:45.200Z lightning_gossipd(4386): backtrace: common/daemon_conn.c:31 (handle_read) 0x563d56824f75
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x563d568493c1
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x563d56849f3e
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x563d56849f7c
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x563d5684bfd1
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: gossipd/gossipd.c:3063 (main) 0x563d56821e9f
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: (null):0 ((null)) 0x7f28bfa5cce2
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: (null):0 ((null)) 0x563d5680de3d
2019-06-16T11:33:45.201Z lightning_gossipd(4386): backtrace: (null):0 ((null)) 0xffffffffffffffff
2019-06-16T11:33:45.201Z lightning_gossipd(4386): STATUS_FAIL_INTERNAL_ERROR: gossip_store: Entry at 24928329->20291383 not updated?
2019-06-16T11:33:45.201Z lightning_gossipd(4386): Status closed, but waitpid 4386 says No child processes
m-schmoock commented 5 years ago

And this one directly after trying to restart:

2019-06-16T12:13:58.264Z lightningd(23775): --------------------------------------------------
2019-06-16T12:13:58.264Z lightningd(23775): Server started with public key 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41, alias schmoock.net (color #00ddff) and lightningd v0.7.1rc2-6-g5e3690b
2019-06-16T12:13:58.266Z plugin-rebalance.py Plugin rebalance.py initialized
2019-06-16T12:13:58.266Z plugin-summary.py Plugin summary.py initialized
2019-06-16T12:13:58.266Z plugin-sendinvoiceless.py Plugin sendinvoiceless.py initialized
2019-06-16T12:14:04.909Z lightning_gossipd(23791): gossip_store: bad checksum offset 26157601: 01c9c3800000008a07fd3f7e5d05d0bf01020a24b07536684e62c0c51b1a35f91df762e5c7a014c077b408de4867a3e6b91a7badc68e875b7e8f4ace65e7e4c94d4d72e529efdf0804e24da0e25e6c04ffb46fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008ab100006e500005d05d0bf0101001e00000000000003e800000000000007080000000005e69ec00000008a38d8afcd5d05d0b8010258f225e19ecf236a593669b55c68daee80a0ed828b684135c65abe2dd255f2d67297920b613864571e16c1f87a3bf78d4bf6a597f3ea1cc7eab094c02f5810d26fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008953800093700005d05d0b80103009000000000000003e8000003e8000000010000000005d21cc00000008a4023760a5d05d0b80102cd64e8989f1652178a9b55b09d859cda90513e7835812c47185171fb5e12e2fb42df979f1b90ed376e33acc8bd30ee7a13999654ce12bf7f6cd7b6d2820b7b366fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000896cc0004d200005d05d0b8010000900000000000000001000003e800000001000000009fc5cf480000008ab0f39f0b5d05d0b80102cb170b2e016df02cc0a226504befea25ddc63f89680682d298be90ea78bd5f252b277b02d684849311d34bffd74b1a618b386689af87e751aa625022c75022516fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000089a770003fc00015d05d0b80100009000000000000003e8000003e80000000100000003d77a05000000008a711910925d05d0be01027c783d6038d84fcf49869730d264f5471f12f4b595c469a11e58d088a6f34fe87b3620b901d09e43cd826b88af48f087618b233754e85d4c5a496229dcf110186fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008a7b900052a00005d05d0be0101000400000000000003e8000000000000000100000003ddfffcb80000008a55b317c15d05d0ba0102903d1f3a3039aa84e1d183417d6e0a9b7cf2fc72ef97961173d88b0412e9a8ca556e1454d7b80467c2a510eea4e56939fb3f44ae383e00448dfd64cc276b6d2c6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008bf4d00012800015d05d0ba0101009000000000000003e8000003e80000000a000000001d8119c00000008a87507dec5d05d0ba01020e6e6c4e76a8bbff0fd71f843151e12daf9be15faf6a209cf111d4d5df4929805dd072ada8e66264a58b0f920f14b87929389be581b18d635de397a97d44dfac6fe28c0ab6f1b372c1a6a246ae63f74f931e (version v0.7.1rc2-6-g5e3690b)
2019-06-16T12:14:04.909Z lightning_gossipd(23791): backtrace: common/daemon.c:45 (send_backtrace) 0x5582fddcdac1
2019-06-16T12:14:04.909Z lightning_gossipd(23791): backtrace: common/status.c:192 (status_failed) 0x5582fddd3277
2019-06-16T12:14:04.909Z lightning_gossipd(23791): backtrace: gossipd/gossip_store.c:495 (gossip_store_get) 0x5582fddbc346
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: gossipd/gossip_store.c:411 (delete_by_index) 0x5582fddbbfa5
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: gossipd/gossip_store.c:453 (gossip_store_delete) 0x5582fddbc177
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: gossipd/routing.c:1968 (remove_channel_from_store) 0x5582fddc1d4a
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: gossipd/gossipd.c:2892 (handle_outpoint_spent) 0x5582fddca9ea
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: gossipd/gossipd.c:2957 (recv_req) 0x5582fddcabde
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: common/daemon_conn.c:31 (handle_read) 0x5582fddcdf75
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5582fddf23c1
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x5582fddf2f3e
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x5582fddf2f7c
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x5582fddf4fd1
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: gossipd/gossipd.c:3063 (main) 0x5582fddcae9f
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: (null):0 ((null)) 0x7f7d14aeace2
2019-06-16T12:14:04.910Z lightning_gossipd(23791): backtrace: (null):0 ((null)) 0x5582fddb6e3d
2019-06-16T12:14:04.911Z lightning_gossipd(23791): backtrace: (null):0 ((null)) 0xffffffffffffffff
2019-06-16T12:14:04.911Z lightning_gossipd(23791): STATUS_FAIL_INTERNAL_ERROR: gossip_store: bad checksum offset 26157601: 01c9c3800000008a07fd3f7e5d05d0bf01020a24b07536684e62c0c51b1a35f91df762e5c7a014c077b408de4867a3e6b91a7badc68e875b7e8f4ace65e7e4c94d4d72e529efdf0804e24da0e25e6c04ffb46fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008ab100006e500005d05d0bf0101001e00000000000003e800000000000007080000000005e69ec00000008a38d8afcd5d05d0b8010258f225e19ecf236a593669b55c68daee80a0ed828b684135c65abe2dd255f2d67297920b613864571e16c1f87a3bf78d4bf6a597f3ea1cc7eab094c02f5810d26fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008953800093700005d05d0b80103009000000000000003e8000003e8000000010000000005d21cc00000008a4023760a5d05d0b80102cd64e8989f1652178a9b55b09d859cda90513e7835812c47185171fb5e12e2fb42df979f1b90ed376e33acc8bd30ee7a13999654ce12bf7f6cd7b6d2820b7b366fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d61900000000000896cc0004d200005d05d0b8010000900000000000000001000003e800000001000000009fc5cf480000008ab0f39f0b5d05d0b80102cb170b2e016df02cc0a226504befea25ddc63f89680682d298be90ea78bd5f252b277b02d684849311d34bffd74b1a618b386689af87e751aa625022c75022516fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000089a770003fc00015d05d0b80100009000000000000003e8000003e80000000100000003d77a05000000008a711910925d05d0be01027c783d6038d84fcf49869730d264f5471f12f4b595c469a11e58d088a6f34fe87b3620b901d09e43cd826b88af48f087618b233754e85d4c5a496229dcf110186fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008a7b900052a00005d05d0be0101000400000000000003e8000000000000000100000003ddfffcb80000008a55b317c15d05d0ba0102903d1f3a3039aa84e1d183417d6e0a9b7cf2fc72ef97961173d88b0412e9a8ca556e1454d7b80467c2a510eea4e56939fb3f44ae383e00448dfd64cc276b6d2c6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d619000000000008bf4d00012800015d05d0ba0101009000000000000003e8000003e80000000a000000001d8119c00000008a87507dec5d05d0ba01020e6e6c4e76a8bbff0fd71f843151e12daf9be15faf6a209cf111d4d5df4929805dd072ada8e66264a58b0f920f14b87929389be581b18d635de397a97d44dfac6fe28c0ab6f1b372c1a6a246ae63f74f931e
2019-06-16T12:14:04.911Z lightning_gossipd(23791): Status closed, but waitpid 23791 says No child processes
rustyrussell commented 5 years ago

Yes, store would be v. useful, please!

m-schmoock commented 5 years ago

@rustyrussell I attached a whole bunch.

2019-06-17T07:28:49.712Z lightningd(25525): --------------------------------------------------
2019-06-17T07:28:49.723Z lightningd(25525): Server started with public key 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41, alias schmoock.net (color #00ddff) and lightningd v0.7.1rc2-6-g5e3690b
2019-06-17T07:28:49.725Z plugin-rebalance.py Plugin rebalance.py initialized
2019-06-17T07:28:49.725Z plugin-summary.py Plugin summary.py initialized
2019-06-17T07:28:49.726Z plugin-sendinvoiceless.py Plugin sendinvoiceless.py initialized
2019-06-17T07:28:50.324Z lightning_gossipd(25544): gossip_store channel_announce 33891148 replaces 11756152!
2019-06-17T07:28:50.327Z lightning_gossipd(25544): gossip_store: Bad channel_announcement (10050000000000fd4bc0) truncating to 33891592
2019-06-17T07:28:50.327Z lightning_gossipd(25544): Deleting un-amounted channel_announcement @33891148
2019-06-17T07:28:50.327Z lightning_gossipd(25544): Deleted 18 unfinalized node_announcements
2019-06-17T07:28:50.328Z lightning_gossipd(25544): Deleted 1266 unupdated channel_announcements
2019-06-17T07:28:50.331Z lightning_gossipd(25544): We seem to be missing gossip messages
2019-06-17T07:28:53.190Z lightningd(25525): lightning_openingd-03ec201d396cc68d34360a13eee019189860006db990bfe85791f42fa9c00e8df1 chan #5498: STATUS_FAIL_GOSSIP_IO: Reading gossip: Success
2019-06-17T07:28:53.190Z lightningd(25525): 03ec201d396cc68d34360a13eee019189860006db990bfe85791f42fa9c00e8df1 chan #5498: Owning subdaemon lightning_openingd died (61952)
2019-06-17T07:28:53.191Z lightningd(25525): lightning_channeld-03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer connection lost
2019-06-17T07:28:53.191Z lightningd(25525): 03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T07:28:53.191Z lightningd(25525): lightning_channeld-02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer connection lost
2019-06-17T07:28:53.191Z lightningd(25525): 02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T07:28:53.191Z lightningd(25525): lightning_channeld-030995c0c0217d763c2274aa6ed69a0bb85fa2f7d118f93631550f3b6219a577f5 chan #1276: Peer connection lost
2019-06-17T07:28:53.191Z lightningd(25525): 030995c0c0217d763c2274aa6ed69a0bb85fa2f7d118f93631550f3b6219a577f5 chan #1276: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T07:28:53.292Z lightningd(25525): lightning_channeld-03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer connection lost
2019-06-17T07:28:53.292Z lightningd(25525): 03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T07:28:53.292Z lightning_connectd(25543): Failed parsing msg gossipctl:  (version v0.7.1rc2-6-g5e3690b)
2019-06-17T07:28:53.293Z lightning_connectd(25543): backtrace: common/daemon.c:45 (send_backtrace) 0x55902c514586
2019-06-17T07:28:53.293Z lightning_connectd(25543): backtrace: common/status.c:192 (status_failed) 0x55902c519938
2019-06-17T07:28:53.293Z lightningd(25525): lightning_channeld-0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer connection lost
2019-06-17T07:28:53.294Z lightningd(25525): 0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T07:28:53.294Z lightning_connectd(25543): backtrace: connectd/connectd.c:317 (get_gossipfds) 0x55902c50a703
2019-06-17T07:28:53.485Z lightning_connectd(25543): backtrace: connectd/connectd.c:448 (peer_connected) 0x55902c50ac5b
2019-06-17T07:28:53.486Z lightningd(25525): lightning_channeld-0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer connection lost
2019-06-17T07:28:53.487Z lightningd(25525): 0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T07:28:53.488Z lightning_connectd(25543): backtrace: connectd/peer_exchange_initmsg.c:85 (peer_init_received) 0x55902c50e0fd
2019-06-17T07:28:53.492Z lightningd(25525): lightning_channeld-038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer connection lost
2019-06-17T07:28:53.492Z lightningd(25525): 038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x55902c53b446
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x55902c53bfc3
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x55902c53c001
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x55902c53e056
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: connectd/connectd.c:1560 (main) 0x55902c50de4e
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: (null):0 ((null)) 0x7fad8ef96ce2
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: (null):0 ((null)) 0x55902c507e3d
2019-06-17T07:28:53.493Z lightning_connectd(25543): backtrace: (null):0 ((null)) 0xffffffffffffffff
2019-06-17T07:28:53.493Z lightning_connectd(25543): STATUS_FAIL_INTERNAL_ERROR: Failed parsing msg gossipctl: 
2019-06-17T07:28:53.493Z lightning_connectd(25543): Status closed, but waitpid 25543 says No child processes

I now deleted the store and started a fresh... Also, I deleted the store initially before the first crash, so it was not produced by older bugs.

gossip_store.1.zip gossip_store.3.zip

Edit: store 2 and 4 are too big. Split does not work as github doesnt support the "z01" ending, I renamed their ending to .zip, you need to undo this before unpacking.

gossip_store.2.z01.zip gossip_store.2.zip gossip_store.4.z01.zip gossip_store.4.zip

m-schmoock commented 5 years ago

Crashed again (still 5e3690b ) now after ~12 hours, started from with a fresh (deleted) store. Gossip store as left after the crash attached: gossip_store.zip

0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
gossip_store: Entry at 10818289->8848222 not updated? (version v0.7.1rc2-6-g5e3690b)
backtrace: common/daemon.c:45 (send_backtrace) 0x5594be4c4ac1
backtrace: common/status.c:192 (status_failed) 0x5594be4ca277
backtrace: gossipd/gossip_store.c:310 (gossip_store_compact) 0x5594be4b2bc8
backtrace: gossipd/gossip_store.c:364 (gossip_store_maybe_compact) 0x5594be4b2dca
backtrace: gossipd/gossip_store.c:463 (gossip_store_delete) 0x5594be4b31af
backtrace: gossipd/routing.c:1886 (routing_add_channel_update) 0x5594be4b8abc
backtrace: gossipd/routing.c:2077 (handle_channel_update) 0x5594be4b9190
backtrace: gossipd/gossipd.c:684 (handle_channel_update_msg) 0x5594be4bc557
backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x5594be4be896
backtrace: common/daemon_conn.c:31 (handle_read) 0x5594be4c4f75
backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5594be4e93c1
backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x5594be4e9f3e
backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x5594be4e9f7c
backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x5594be4ebfd1
backtrace: gossipd/gossipd.c:3063 (main) 0x5594be4c1e9f
backtrace: (null):0 ((null)) 0x7ff7e8e2ace2
backtrace: (null):0 ((null)) 0x5594be4ade3d
backtrace: (null):0 ((null)) 0xffffffffffffffff
STATUS_FAIL_INTERNAL_ERROR: gossip_store: Entry at 10818289->8848222 not updated?
Status closed, but waitpid 25774 says No child processes
m-schmoock commented 5 years ago

And crashed immediately after restart. I keept store from last crash). Store after crash: gossip_store.z01.zip (remove .zip ending here) gossip_store.zip

2019-06-17T21:44:44.974Z lightningd(552): --------------------------------------------------
2019-06-17T21:44:44.974Z lightningd(552): Server started with public key 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41, alias schmoock.net (color #00ddff) and lightningd v0.7.1rc2-6-g5e3690b
2019-06-17T21:44:44.976Z plugin-rebalance.py Plugin rebalance.py initialized
2019-06-17T21:44:44.976Z plugin-summary.py Plugin summary.py initialized
2019-06-17T21:44:44.976Z plugin-sendinvoiceless.py Plugin sendinvoiceless.py initialized
2019-06-17T21:44:45.513Z lightning_gossipd(653): gossip_store channel_announce 34123783 replaces 10818289!
2019-06-17T21:44:45.513Z lightning_gossipd(653): gossip_store: Bad channel_announcement (10050000000000fd4bc0) truncating to 34124227
2019-06-17T21:44:45.513Z lightning_gossipd(653): Deleting un-amounted channel_announcement @34123783
2019-06-17T21:44:45.513Z lightning_gossipd(653): Deleted 6 unfinalized node_announcements
2019-06-17T21:44:45.514Z lightning_gossipd(653): Deleted 665 unupdated channel_announcements
2019-06-17T21:44:45.514Z lightning_gossipd(653): We seem to be missing gossip messages
2019-06-17T21:44:46.927Z lightningd(552): lightning_channeld-03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer connection lost
2019-06-17T21:44:46.927Z lightningd(552): 03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.332Z lightningd(552): lightning_channeld-0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer connection lost
2019-06-17T21:44:47.332Z lightningd(552): 0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.500Z lightningd(552): lightning_channeld-03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer connection lost
2019-06-17T21:44:47.500Z lightningd(552): 03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.502Z lightningd(552): lightning_channeld-0294e4f3b85296724fdcfa32bfdf6be4215da1ac044f5ada182e5d299a4e8d67a3 chan #5497: Peer connection lost
2019-06-17T21:44:47.502Z lightningd(552): 0294e4f3b85296724fdcfa32bfdf6be4215da1ac044f5ada182e5d299a4e8d67a3 chan #5497: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.502Z lightningd(552): lightning_channeld-02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer connection lost
2019-06-17T21:44:47.502Z lightningd(552): 02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.502Z lightningd(552): lightning_channeld-0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer connection lost
2019-06-17T21:44:47.502Z lightningd(552): 0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.883Z lightning_connectd(652): Failed writing to gossipctl: Broken pipe (version v0.7.1rc2-6-g5e3690b)
2019-06-17T21:44:47.883Z lightningd(552): lightning_channeld-0214ec84c84827dd4911de56d2ecb77d367c6f24c658b8acfe4826b01968e45594 chan #5181: Peer connection lost
2019-06-17T21:44:47.883Z lightningd(552): 0214ec84c84827dd4911de56d2ecb77d367c6f24c658b8acfe4826b01968e45594 chan #5181: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.887Z lightning_connectd(652): backtrace: common/daemon.c:45 (send_backtrace) 0x5563c451f586
2019-06-17T21:44:47.890Z lightning_connectd(652): backtrace: common/status.c:192 (status_failed) 0x5563c4524938
2019-06-17T21:44:47.893Z lightning_connectd(652): backtrace: connectd/connectd.c:1296 (add_gossip_addrs) 0x5563c45181ad
2019-06-17T21:44:47.896Z lightning_connectd(652): backtrace: connectd/connectd.c:1345 (try_connect_peer) 0x5563c4518554
2019-06-17T21:44:47.897Z lightning_connectd(652): backtrace: connectd/connectd.c:1405 (connect_to_peer) 0x5563c45188ef
2019-06-17T21:44:47.897Z lightningd(552): lightning_channeld-038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer connection lost
2019-06-17T21:44:47.898Z lightningd(552): 038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-17T21:44:47.986Z lightning_connectd(652): backtrace: connectd/connectd.c:1475 (recv_req) 0x5563c4518bfd
2019-06-17T21:44:47.992Z lightning_connectd(652): backtrace: common/daemon_conn.c:31 (handle_read) 0x5563c451fa3a
2019-06-17T21:44:47.994Z lightning_connectd(652): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5563c4546446
2019-06-17T21:44:47.994Z lightning_connectd(652): backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x5563c4546fc3
2019-06-17T21:44:47.995Z lightning_connectd(652): backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x5563c4547001
2019-06-17T21:44:47.995Z lightning_connectd(652): backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x5563c4549056
2019-06-17T21:44:47.998Z lightning_connectd(652): backtrace: connectd/connectd.c:1560 (main) 0x5563c4518e4e
2019-06-17T21:44:48.001Z lightning_connectd(652): backtrace: (null):0 ((null)) 0x7f65416b1ce2
2019-06-17T21:44:48.007Z lightning_connectd(652): backtrace: (null):0 ((null)) 0x5563c4512e3d
2019-06-17T21:44:48.008Z lightning_connectd(652): backtrace: (null):0 ((null)) 0xffffffffffffffff
2019-06-17T21:44:48.008Z lightning_connectd(652): STATUS_FAIL_INTERNAL_ERROR: Failed writing to gossipctl: Broken pipe
2019-06-17T21:44:48.008Z lightning_connectd(652): Status closed, but waitpid 652 says No child processes

After restarting again (same store again) the daemon comes up with this:

2019-06-17T21:48:55.127Z lightningd(1697): --------------------------------------------------
2019-06-17T21:48:55.127Z lightningd(1697): Server started with public key 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41, alias schmoock.net (color #00ddff) and lightningd v0.7.1rc2-6-g5e3690b
2019-06-17T21:48:55.129Z plugin-rebalance.py Plugin rebalance.py initialized
2019-06-17T21:48:55.130Z plugin-sendinvoiceless.py Plugin sendinvoiceless.py initialized
2019-06-17T21:48:55.130Z plugin-summary.py Plugin summary.py initialized
2019-06-17T21:48:55.450Z lightning_gossipd(1713): gossip_store channel_announce 27096589 replaces 8848222!
2019-06-17T21:48:55.450Z lightning_gossipd(1713): gossip_store: Bad channel_announcement (10050000000000fd4bc0) truncating to 27097033
2019-06-17T21:48:55.450Z lightning_gossipd(1713): Deleting un-amounted channel_announcement @27096589
2019-06-17T21:48:55.450Z lightning_gossipd(1713): Deleted 6 unfinalized node_announcements
2019-06-17T21:48:55.450Z lightning_gossipd(1713): Deleted 665 unupdated channel_announcements
2019-06-17T21:48:55.451Z lightning_gossipd(1713): We seem to be missing gossip messages
m-schmoock commented 5 years ago

This time I restarted with valgrind after tomorrow mornings crash I will add its output here...

rustyrussell commented 5 years ago

On those crashes where it dies immediately after starting, it's extremely weird. It doesn't even print a backtrace, and doesn't print "Setting peer XXX to gossip level YYY" either, which should happen.

Of course, it doesn't crash here, whatever I do, even using your gossip_stores.

Are you configured with --enable-developer or not? It's vaguely possible that makes a difference.

Another wild thought: have you tried make clean && ./configure (--enable-developer?) && make just in case there's something weird with your build?

m-schmoock commented 5 years ago

Can do, but will wait for the next crash in valgrind first...

m-schmoock commented 5 years ago

Are you configured with --enable-developer or not? It's vaguely possible that makes a difference.

Yes, I always enable developer also on my mainnet node.

m-schmoock commented 5 years ago

@rustyrussell On valgrind, I had a stacktrace print of a subdaemon (lightning_channeld) without a full crash (yet):

lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: gossip_store: can't read len 138 ~offset 29700096 (version v0.7.1rc2-6-g5e3690b)
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: common/daemon.c:45 (send_backtrace) 0x5590a7540d16
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: common/status.c:192 (status_failed) 0x5590a754c33b
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: common/gossip_store.c:90 (gossip_store_next) 0x5590a7543b6d
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: channeld/channeld.c:3022 (try_read_gossip_store) 0x5590a75340eb
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: channeld/channeld.c:3140 (main) 0x5590a7534969
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: (null):0 ((null)) 0x7fa85482cce2
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: (null):0 ((null)) 0x5590a752bddd
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: backtrace: (null):0 ((null)) 0xffffffffffffffff
lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: STATUS_FAIL_INTERNAL_ERROR: gossip_store: can't read len 138 ~offset 29700096
m-schmoock commented 5 years ago

Note that 03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e is currently conncting and disconnecting in the logs like every 10 minutes..

rustyrussell commented 5 years ago

I wonder if this is a valgrind artifact; I've seen this same error under Travis (#2738), but don't recall if it was on a valgrind run.

m-schmoock commented 5 years ago

Nope, I sometimes see peers that cant keep a stable connection, but probably because of client inconsistencies.

SimonVrouwe commented 5 years ago

Just a wild hunch, @m-schmoock your gossip_store files are about twice as large as mine, so perhaps issue is related to your file system? Maybe try #2738 (rebased on master) and see if that logs any unusual.

m-schmoock commented 5 years ago

@SimonVrouwe On testnet they are smaller as well. My filesystem is (unsuprisingly) ext4. I have a clean software raid1 (mirrored).

m-schmoock commented 5 years ago

@rustyrussell it finally crashed within valgrind:

backtrace: common/daemon.c:45 (send_backtrace) 0x56229ba96ac1
backtrace: common/status.c:192 (status_failed) 0x56229ba9c277
backtrace: gossipd/gossip_store.c:310 (gossip_store_compact) 0x56229ba84bc8
backtrace: gossipd/gossip_store.c:364 (gossip_store_maybe_compact) 0x56229ba84dca
backtrace: gossipd/gossip_store.c:463 (gossip_store_delete) 0x56229ba851af
backtrace: gossipd/routing.c:1886 (routing_add_channel_update) 0x56229ba8aabc
backtrace: gossipd/routing.c:2077 (handle_channel_update) 0x56229ba8b190
backtrace: gossipd/gossipd.c:684 (handle_channel_update_msg) 0x56229ba8e557
backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x56229ba90896
backtrace: common/daemon_conn.c:31 (handle_read) 0x56229ba96f75
backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x56229babb3c1
backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x56229babbf3e
backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x56229babbf7c
backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x56229babdfd1
backtrace: gossipd/gossipd.c:3063 (main) 0x56229ba93e9f
backtrace: (null):0 ((null)) 0x7f6b080dcce2
backtrace: (null):0 ((null)) 0x56229ba7fe3d
backtrace: (null):0 ((null)) 0xffffffffffffffff
STATUS_FAIL_INTERNAL_ERROR: gossip_store: Entry at 12933536->11154494 not updated?

Im a bit confused that valgrind only logged:

==5741== Memcheck, a memory error detector
==5741== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5741== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==5741== Command: ./lightningd/lightningd
==5741== Parent PID: 9092
==5741== 
==5741== 
==5741== HEAP SUMMARY:
==5741==     in use at exit: 73,901,452 bytes in 1,173,708 blocks
==5741==   total heap usage: 100,756,177 allocs, 99,582,469 frees, 21,006,875,362 bytes allocated
==5741== 
==5741== LEAK SUMMARY:
==5741==    definitely lost: 0 bytes in 0 blocks
==5741==    indirectly lost: 0 bytes in 0 blocks
==5741==      possibly lost: 2,018,416 bytes in 463 blocks
==5741==    still reachable: 71,883,036 bytes in 1,173,245 blocks
==5741==         suppressed: 0 bytes in 0 blocks
==5741== Rerun with --leak-check=full to see details of leaked memory
==5741== 
==5741== For counts of detected and suppressed errors, rerun with: -v
==5741== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Same crash as before after immideate restart (same store). Im now building a fresh clean build and try with clean store on valgrind...

rustyrussell commented 5 years ago

OK, good to know that valgrind isn't detecting any funny business, that eliminates one possible cause. I think in this case running under valgrind is overkill. However, running with a clean build may help.

m-schmoock commented 5 years ago

@rustyrussell It crashed again with a clean build and a fresh store under valgrind (no error there). The bactrace gets mixed with several messages like this within the same few milliseconds. This also happened in prior crashes, I usually filter backtraces from other artifacts, but in this case it may be a hint that there is other stuff going on exactly when the main daemon crashes, but it may also just be the logs that the main daemon kills its subprocesses before terminating.

2019-06-19T04:16:19.842Z lightningd(12347): 038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.843Z lightningd(12347): lightning_channeld-03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer connection lost

Extracted backtrace:

gossip_store: Entry at 24329819->19724296 not updated? (version v0.7.1rc2-6-g5e3690b)
backtrace: common/daemon.c:45 (send_backtrace) 0x55cc5e234ac1
backtrace: common/status.c:192 (status_failed) 0x55cc5e23a277
backtrace: gossipd/gossip_store.c:310 (gossip_store_compact) 0x55cc5e222bc8
backtrace: gossipd/gossip_store.c:364 (gossip_store_maybe_compact) 0x55cc5e222dca
backtrace: gossipd/gossip_store.c:463 (gossip_store_delete) 0x55cc5e2231af
backtrace: gossipd/routing.c:1886 (routing_add_channel_update) 0x55cc5e228abc
backtrace: gossipd/routing.c:2077 (handle_channel_update) 0x55cc5e229190
backtrace: gossipd/gossipd.c:684 (handle_channel_update_msg) 0x55cc5e22c557
backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x55cc5e22e896
backtrace: common/daemon_conn.c:31 (handle_read) 0x55cc5e234f75
backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x55cc5e2593c1
backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x55cc5e259f3e
backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x55cc5e259f7c
backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x55cc5e25bfd1
backtrace: gossipd/gossipd.c:3063 (main) 0x55cc5e231e9f
backtrace: (null):0 ((null)) 0x7fbd5f5e6ce2
backtrace: (null):0 ((null)) 0x55cc5e21de3d
backtrace: (null):0 ((null)) 0xffffffffffffffff
STATUS_FAIL_INTERNAL_ERROR: gossip_store: Entry at 24329819->19724296 not updated?

Original data (look at the timestamps!):

2019-06-19T04:16:19.825Z lightning_gossipd(12386): gossip_store: Entry at 24329819->19724296 not updated? (version v0.7.1rc2-6-g5e3690b)
2019-06-19T04:16:19.827Z lightning_gossipd(12386): backtrace: common/daemon.c:45 (send_backtrace) 0x55cc5e234ac1
2019-06-19T04:16:19.828Z lightningd(12347): lightning_channeld-0294e4f3b85296724fdcfa32bfdf6be4215da1ac044f5ada182e5d299a4e8d67a3 chan #5497: Peer connection lost
2019-06-19T04:16:19.828Z lightningd(12347): 0294e4f3b85296724fdcfa32bfdf6be4215da1ac044f5ada182e5d299a4e8d67a3 chan #5497: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.829Z lightningd(12347): lightning_channeld-02d776ab8db0f62572e0435a70a5f3f3b2ee803357cbef9f915720a9d377a71c6b chan #2283: Peer connection lost
2019-06-19T04:16:19.829Z lightningd(12347): 02d776ab8db0f62572e0435a70a5f3f3b2ee803357cbef9f915720a9d377a71c6b chan #2283: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.830Z lightningd(12347): lightning_channeld-0290fb6ecd5865f7a973e1d48ffbc2612f80f6a2defbd6a188928bd0492bc2847e chan #2083: Peer connection lost
2019-06-19T04:16:19.830Z lightningd(12347): 0290fb6ecd5865f7a973e1d48ffbc2612f80f6a2defbd6a188928bd0492bc2847e chan #2083: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.831Z lightningd(12347): lightning_channeld-021607cfce19a4c5e7e6e738663dfafbbbac262e4ff76c2c9b30dbeefc35c00643 chan #2501: Peer connection lost
2019-06-19T04:16:19.832Z lightningd(12347): 021607cfce19a4c5e7e6e738663dfafbbbac262e4ff76c2c9b30dbeefc35c00643 chan #2501: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.832Z lightningd(12347): lightning_channeld-0214ec84c84827dd4911de56d2ecb77d367c6f24c658b8acfe4826b01968e45594 chan #5181: Peer connection lost
2019-06-19T04:16:19.833Z lightningd(12347): 0214ec84c84827dd4911de56d2ecb77d367c6f24c658b8acfe4826b01968e45594 chan #5181: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.833Z lightningd(12347): lightning_channeld-030995c0c0217d763c2274aa6ed69a0bb85fa2f7d118f93631550f3b6219a577f5 chan #1276: Peer connection lost
2019-06-19T04:16:19.834Z lightningd(12347): 030995c0c0217d763c2274aa6ed69a0bb85fa2f7d118f93631550f3b6219a577f5 chan #1276: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.835Z lightningd(12347): lightning_channeld-03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer connection lost
2019-06-19T04:16:19.835Z lightningd(12347): 03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.836Z lightningd(12347): lightning_channeld-0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer connection lost
2019-06-19T04:16:19.836Z lightningd(12347): 0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.837Z lightningd(12347): lightning_channeld-03d67f36c4f81789e2fe425028bacc96b199813eae426c517f589a45f1136c1fe5 chan #3396: Peer connection lost
2019-06-19T04:16:19.838Z lightningd(12347): 03d67f36c4f81789e2fe425028bacc96b199813eae426c517f589a45f1136c1fe5 chan #3396: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.838Z lightningd(12347): lightning_channeld-0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer connection lost
2019-06-19T04:16:19.839Z lightningd(12347): 0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.840Z lightningd(12347): lightning_channeld-032cc4541b25e86e39a7d450a979c1a9adbe2878df3a93fcb59c96c700bfe26aa3 chan #5485: Peer connection lost
2019-06-19T04:16:19.840Z lightningd(12347): 032cc4541b25e86e39a7d450a979c1a9adbe2878df3a93fcb59c96c700bfe26aa3 chan #5485: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.841Z lightning_gossipd(12386): backtrace: common/status.c:192 (status_failed) 0x55cc5e23a277
2019-06-19T04:16:19.842Z lightningd(12347): lightning_channeld-038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer connection lost
2019-06-19T04:16:19.842Z lightningd(12347): 038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.843Z lightningd(12347): lightning_channeld-03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer connection lost
2019-06-19T04:16:19.843Z lightningd(12347): 03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.844Z lightningd(12347): lightning_channeld-02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer connection lost
2019-06-19T04:16:19.834Z lightningd(12347): 030995c0c0217d763c2274aa6ed69a0bb85fa2f7d118f93631550f3b6219a577f5 chan #1276: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.835Z lightningd(12347): lightning_channeld-03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer connection lost
2019-06-19T04:16:19.835Z lightningd(12347): 03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.836Z lightningd(12347): lightning_channeld-0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer connection lost
2019-06-19T04:16:19.836Z lightningd(12347): 0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.836Z lightningd(12347): 0279c22ed7a068d10dc1a38ae66d2d6461e269226c60258c021b1ddcdfe4b00bc4 chan #58: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.837Z lightningd(12347): lightning_channeld-03d67f36c4f81789e2fe425028bacc96b199813eae426c517f589a45f1136c1fe5 chan #3396: Peer connection lost
2019-06-19T04:16:19.838Z lightningd(12347): 03d67f36c4f81789e2fe425028bacc96b199813eae426c517f589a45f1136c1fe5 chan #3396: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.838Z lightningd(12347): lightning_channeld-0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer connection lost
2019-06-19T04:16:19.839Z lightningd(12347): 0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.840Z lightningd(12347): lightning_channeld-032cc4541b25e86e39a7d450a979c1a9adbe2878df3a93fcb59c96c700bfe26aa3 chan #5485: Peer connection lost
2019-06-19T04:16:19.840Z lightningd(12347): 032cc4541b25e86e39a7d450a979c1a9adbe2878df3a93fcb59c96c700bfe26aa3 chan #5485: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.841Z lightning_gossipd(12386): backtrace: common/status.c:192 (status_failed) 0x55cc5e23a277
2019-06-19T04:16:19.842Z lightningd(12347): lightning_channeld-038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer connection lost
2019-06-19T04:16:19.842Z lightningd(12347): 038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.843Z lightningd(12347): lightning_channeld-03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer connection lost
2019-06-19T04:16:19.843Z lightningd(12347): 03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.844Z lightningd(12347): lightning_channeld-02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer connection lost
2019-06-19T04:16:19.845Z lightningd(12347): 02004c625d622245606a1ea2c1c69cfb4516b703b47945a3647713c05fe4aaeb1c chan #213: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.845Z lightning_gossipd(12386): backtrace: gossipd/gossip_store.c:310 (gossip_store_compact) 0x55cc5e222bc8
2019-06-19T04:16:19.846Z lightningd(12347): lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: Peer connection lost
2019-06-19T04:16:19.846Z lightningd(12347): 03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T04:16:19.848Z lightning_gossipd(12386): backtrace: gossipd/gossip_store.c:364 (gossip_store_maybe_compact) 0x55cc5e222dca
2019-06-19T04:16:19.849Z lightning_gossipd(12386): backtrace: gossipd/gossip_store.c:463 (gossip_store_delete) 0x55cc5e2231af
2019-06-19T04:16:19.850Z lightning_gossipd(12386): backtrace: gossipd/routing.c:1886 (routing_add_channel_update) 0x55cc5e228abc
2019-06-19T04:16:19.850Z lightning_gossipd(12386): backtrace: gossipd/routing.c:2077 (handle_channel_update) 0x55cc5e229190
2019-06-19T04:16:19.851Z lightning_gossipd(12386): backtrace: gossipd/gossipd.c:684 (handle_channel_update_msg) 0x55cc5e22c557
2019-06-19T04:16:19.852Z lightning_gossipd(12386): backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x55cc5e22e896
2019-06-19T04:16:19.853Z lightning_gossipd(12386): backtrace: common/daemon_conn.c:31 (handle_read) 0x55cc5e234f75
2019-06-19T04:16:19.854Z lightning_gossipd(12386): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x55cc5e2593c1
2019-06-19T04:16:19.855Z lightning_gossipd(12386): backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x55cc5e259f3e
2019-06-19T04:16:19.856Z lightning_gossipd(12386): backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x55cc5e259f7c
2019-06-19T04:16:19.857Z lightning_gossipd(12386): backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x55cc5e25bfd1
2019-06-19T04:16:19.858Z lightning_gossipd(12386): backtrace: gossipd/gossipd.c:3063 (main) 0x55cc5e231e9f
2019-06-19T04:16:19.859Z lightning_gossipd(12386): backtrace: (null):0 ((null)) 0x7fbd5f5e6ce2
2019-06-19T04:16:19.860Z lightning_gossipd(12386): backtrace: (null):0 ((null)) 0x55cc5e21de3d
2019-06-19T04:16:19.860Z lightning_gossipd(12386): backtrace: (null):0 ((null)) 0xffffffffffffffff
2019-06-19T04:16:19.862Z lightning_gossipd(12386): STATUS_FAIL_INTERNAL_ERROR: gossip_store: Entry at 24329819->19724296 not updated?
m-schmoock commented 5 years ago

After restart I got this immediately:

2019-06-19T08:06:13.827Z lightningd(28391): --------------------------------------------------
2019-06-19T08:06:13.827Z lightningd(28391): Server started with public key 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41, alias schmoock.net (color #00ddff) and lightningd v0.7.1rc2-6-g5e3690b
2019-06-19T08:06:13.828Z plugin-rebalance.py Plugin rebalance.py initialized
2019-06-19T08:06:13.829Z plugin-sendinvoiceless.py Plugin sendinvoiceless.py initialized
2019-06-19T08:06:13.829Z plugin-summary.py Plugin summary.py initialized
2019-06-19T08:06:14.441Z lightning_gossipd(28408): gossip_store channel_announce 32185870 replaces 11153878!
2019-06-19T08:06:14.441Z lightning_gossipd(28408): gossip_store: Bad channel_announcement (10050000000000fd4bc0) truncating to 32186314
2019-06-19T08:06:14.441Z lightning_gossipd(28408): Deleting un-amounted channel_announcement @32185870
2019-06-19T08:06:14.441Z lightning_gossipd(28408): Deleted 34 unfinalized node_announcements
2019-06-19T08:06:14.441Z lightning_gossipd(28408): Deleted 2906 unupdated channel_announcements
2019-06-19T08:06:14.444Z lightning_gossipd(28408): We seem to be missing gossip messages
2019-06-19T08:06:17.457Z lightningd(28391): lightning_channeld-03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer connection lost
2019-06-19T08:06:17.459Z lightningd(28391): 03bc9337c7a28bb784d67742ebedd30a93bacdf7e4ca16436ef3798000242b2251 chan #19: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T08:06:17.459Z lightningd(28391): lightning_channeld-0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer connection lost
2019-06-19T08:06:17.459Z lightningd(28391): 0218078e511de456bba70dda5e76279a949065391f41060deccb1085ac03ee4c9b chan #67: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T08:06:17.459Z lightningd(28391): lightning_channeld-038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer connection lost
2019-06-19T08:06:17.460Z lightningd(28391): 038f41f4a910c82a960ac9c8e5cc9daf729550bfb0b6a13d86edc2f79bff2e89ef chan #4457: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T08:06:17.460Z lightningd(28391): lightning_channeld-03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: Peer connection lost
2019-06-19T08:06:17.460Z lightningd(28391): 03d0fa0845bc81cbf10db1204ed20979701a2f54d939e432d8d961833c2e8f1d1e chan #3134: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T08:06:17.460Z lightningd(28391): lightning_channeld-03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer connection lost
2019-06-19T08:06:17.460Z lightningd(28391): 03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601 chan #126: Peer transient failure in CHANNELD_NORMAL: lightning_channeld: Owning subdaemon lightning_channeld died (62208)
2019-06-19T08:06:17.464Z lightning_gossipd(28408): gossip_store: can't read len 217938570 offset 18845013/24347970 (version v0.7.1rc2-6-g5e3690b)
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: common/daemon.c:45 (send_backtrace) 0x557508047ac1
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: common/status.c:192 (status_failed) 0x55750804d277
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/gossip_store.c:490 (gossip_store_get) 0x5575080362e8
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/gossip_store.c:411 (delete_by_index) 0x557508035fa5
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/gossip_store.c:453 (gossip_store_delete) 0x557508036177
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/routing.c:1886 (routing_add_channel_update) 0x55750803babc
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/routing.c:2077 (handle_channel_update) 0x55750803c190
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/gossipd.c:684 (handle_channel_update_msg) 0x55750803f557
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x557508041896
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: common/daemon_conn.c:31 (handle_read) 0x557508047f75
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x55750806c3c1
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x55750806cf3e
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x55750806cf7c
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x55750806efd1
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: gossipd/gossipd.c:3063 (main) 0x557508044e9f
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: (null):0 ((null)) 0x7f9bd847ece2
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: (null):0 ((null)) 0x557508030e3d
2019-06-19T08:06:17.464Z lightning_gossipd(28408): backtrace: (null):0 ((null)) 0xffffffffffffffff
2019-06-19T08:06:17.464Z lightning_gossipd(28408): STATUS_FAIL_INTERNAL_ERROR: gossip_store: can't read len 217938570 offset 18845013/24347970

Another restart (same store) it comes up without crash with some notes about missing/truncating gossip:

2019-06-19T08:08:37.813Z lightningd(29059): --------------------------------------------------
2019-06-19T08:08:37.814Z lightningd(29059): Server started with public key 02a2c53bc475cb92e4ab2f38a5bca56df695034ce90ad78c2f47c05911e3f79e41, alias schmoock.net (color #00ddff) and lightningd v0.7.1rc2-6-g5e3690b
2019-06-19T08:08:37.815Z plugin-rebalance.py Plugin rebalance.py initialized
2019-06-19T08:08:37.816Z plugin-sendinvoiceless.py Plugin sendinvoiceless.py initialized
2019-06-19T08:08:37.816Z plugin-summary.py Plugin summary.py initialized
2019-06-19T08:08:39.170Z lightning_gossipd(29075): gossip_store channel_announce 25707430 replaces 9962844!
2019-06-19T08:08:39.170Z lightning_gossipd(29075): gossip_store: Bad channel_announcement (10050000000000fd4bc0) truncating to 25707874
2019-06-19T08:08:39.170Z lightning_gossipd(29075): Deleting un-amounted channel_announcement @25707430
2019-06-19T08:08:39.170Z lightning_gossipd(29075): Deleted 34 unfinalized node_announcements
2019-06-19T08:08:39.171Z lightning_gossipd(29075): Deleted 2906 unupdated channel_announcements
2019-06-19T08:08:39.175Z lightning_gossipd(29075): We seem to be missing gossip messages
SimonVrouwe commented 5 years ago

@m-schmoock I wrote a little gossip_store parser read_gs.zip in python to analyze your gossip_store files.

Results so far: gossip_store.1 and gossip_store.3 are ok.

gossip_store.2 Can be parsed up to offset=27933649, then the 12-byte header (hex) looks like 07000001b01714a82f5d034e. That 07 looks like a version byte and parsing again from offset=27933650 (to skip the 07) successfully parses the rest of the file. Not sure what this means, but it looks like a new new gossip_store was appended?

Same with gossip_store.4, but at offset=27970513.

your gossip_store files are about twice as large as mine

Explained by above, nr 2 and 4 are twice as big (~53M vs ~25M) because it appears to have a complete new file appended to it. Nr 1 and 3 are ~35M, could be explained by compacting disabled (after failure) or difference in on-disk size (file system block size stuff etc.)

m-schmoock commented 5 years ago

I think you have a good point here, as I also noticed that sometimes my stores are twice as big...

rustyrussell commented 5 years ago

@SimonVrouwe Doh! Thanks: https://github.com/ElementsProject/lightning/pull/2758

m-schmoock commented 5 years ago

Thanks, will now try 47b5f2e and tell if latest changes help :D

m-schmoock commented 5 years ago

@rustyrussell Sorry, but a clean 47b5f2e crashed again with gossip (after crash: gossip_store.zip ) after 12hours. There was no other suspicious activity in the log at this moment. Do you need any other info maybe, like connected peers etc... ? My system is otherwise totally stable and up to date.

lightning_gossipd(31034): gossip_store: Entry at 16962683->13621905 not updated? (version v0.7.1rc2-11-g47b5f2e)
lightning_gossipd(31034): backtrace: common/daemon.c:45 (send_backtrace) 0x5636abb90a60
lightning_gossipd(31034): backtrace: common/status.c:192 (status_failed) 0x5636abb961e8
lightning_gossipd(31034): backtrace: gossipd/gossip_store.c:310 (gossip_store_compact) 0x5636abb7eb96
lightning_gossipd(31034): backtrace: gossipd/gossip_store.c:364 (gossip_store_maybe_compact) 0x5636abb7ed98
lightning_gossipd(31034): backtrace: gossipd/gossip_store.c:463 (gossip_store_delete) 0x5636abb7f17b
lightning_gossipd(31034): backtrace: gossipd/routing.c:1886 (routing_add_channel_update) 0x5636abb84adf
lightning_gossipd(31034): backtrace: gossipd/routing.c:2077 (handle_channel_update) 0x5636abb851b3
lightning_gossipd(31034): backtrace: gossipd/gossipd.c:684 (handle_channel_update_msg) 0x5636abb88586
lightning_gossipd(31034): backtrace: gossipd/gossipd.c:1627 (peer_msg_in) 0x5636abb8a876
lightning_gossipd(31034): backtrace: common/daemon_conn.c:31 (handle_read) 0x5636abb90f1b
lightning_gossipd(31034): backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x5636abbb50a0
lightning_gossipd(31034): backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x5636abbb5c1d
lightning_gossipd(31034): backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x5636abbb5c5b
lightning_gossipd(31034): backtrace: ccan/ccan/io/poll.c:445 (io_loop) 0x5636abbb7cb0
lightning_gossipd(31034): backtrace: gossipd/gossipd.c:3063 (main) 0x5636abb8de57
lightning_gossipd(31034): backtrace: (null):0 ((null)) 0x7f7279126ee2
lightning_gossipd(31034): backtrace: (null):0 ((null)) 0x5636abb79e9d
lightning_gossipd(31034): backtrace: (null):0 ((null)) 0xffffffffffffffff
lightning_gossipd(31034): STATUS_FAIL_INTERNAL_ERROR: gossip_store: Entry at 16962683->13621905 not updated?
lightning_gossipd(31034): Status closed, but waitpid 31034 says No child processes

After restart (without deleting gossip) it came up like this, so no second crash and a big store :D

2019-06-20T21:38:43.414Z lightning_gossipd(17711): gossip_store channel_announce 30895778 replaces 11420062!
2019-06-20T21:38:43.414Z lightning_gossipd(17711): gossip_store: Bad channel_announcement (10050000000000989680) truncating to 30896222
2019-06-20T21:38:43.414Z lightning_gossipd(17711): Deleting un-amounted channel_announcement @30895778
2019-06-20T21:38:43.414Z lightning_gossipd(17711): Deleted 63 unfinalized node_announcements
2019-06-20T21:38:43.414Z lightning_gossipd(17711): Deleted 4401 unupdated channel_announcements
2019-06-20T21:38:43.414Z lightning_gossipd(17711): We seem to be missing gossip messages
m-schmoock commented 5 years ago

With recent workarounds, this seems fine now, no more crashes for two days.