ChainSafe / forest

🌲 Rust Filecoin Node Implementation
https://forest.chainsafe.io
Apache License 2.0

Forest stuck on downloading headers #4441

Closed: LesnyRumcajs closed this issue 3 months ago

LesnyRumcajs commented 3 months ago

Describe the bug

Forest sometimes gets stuck while downloading block headers: the header count stops increasing and never resumes.

To reproduce

It happens intermittently, possibly due to network conditions or peer selection, though neither cause is confirmed. A restart usually recovers it. There is no reliable way to reproduce it without further investigation.
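If peer selection turns out to be the culprit, one way to recover without a restart would be to stop waiting on a peer that never answers and move on to another one. Below is a minimal, synchronous sketch of that rotation. `fetch_with_rotation` and `FetchError` are hypothetical names, not Forest's actual API; the network call is abstracted as a closure, which is assumed to enforce its own per-request timeout and report it as `FetchError::Timeout`.

```rust
/// Why a single header request to one peer failed (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
enum FetchError {
    Timeout,
    BadResponse,
}

/// Try each peer in order until one returns headers.
/// On success, returns the peer that answered together with its headers.
/// If every peer fails, returns the list of (peer, error) pairs so the
/// caller can decide whether to retry later or surface an error.
fn fetch_with_rotation<P, T, F>(
    peers: &[P],
    mut fetch: F,
) -> Result<(P, T), Vec<(P, FetchError)>>
where
    P: Clone,
    F: FnMut(&P) -> Result<T, FetchError>,
{
    let mut failures = Vec::new();
    for peer in peers {
        match fetch(peer) {
            // First successful peer wins; stop trying the rest.
            Ok(headers) => return Ok((peer.clone(), headers)),
            // Record the failure and fall through to the next peer.
            Err(e) => failures.push((peer.clone(), e)),
        }
    }
    Err(failures)
}
```

A real implementation would likely also score or shuffle peers rather than always trying them in a fixed order, so that a consistently unresponsive peer is deprioritised on later requests.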

Log output

2024-06-20T14:49:09.455644Z  INFO forest_filecoin::daemon: Starting Forest daemon, version 0.19.0+git.a4fe792
2024-06-20T14:49:09.455784Z  WARN forest_filecoin::daemon: Forest has encryption disabled
2024-06-20T14:49:09.455877Z  INFO forest_filecoin::daemon: Admin token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJBbGxvdyI6WyJyZWFkIiwid3JpdGUiLCJzaWduIiwiYWRtaW4iXSwiZXhwIjoxNzI0MDc4OTQ5fQ.2mMWKCjauicZBmxYUExdRkzBjm43IQQlHI_L_aFQhcA
2024-06-20T14:49:09.456279Z  INFO forest_filecoin::db::migration::db_migration: No database migration required
2024-06-20T14:49:09.622853Z  INFO forest_filecoin::daemon::bundle: Loading actor bundle from /var/tmp/forest_actor_bundle.car.zst set by FOREST_ACTOR_BUNDLE_PATH environment variable
2024-06-20T14:49:10.118055Z  INFO forest_filecoin::genesis: Initialized genesis: bafy2bzacecyaggy24wol5ruvs6qm73gjibs2l2iyhcqmvi7r7a4ph7zx3yqd4
2024-06-20T14:49:10.118127Z  INFO forest_filecoin::daemon: Prometheus server started at 0.0.0.0:6116
2024-06-20T14:49:10.118648Z  INFO forest_filecoin::daemon: Using network :: calibnet
2024-06-20T14:49:10.118987Z  INFO forest_filecoin::libp2p::behaviour: libp2p Forest version: 0.19.0+git.a4fe792
2024-06-20T14:49:10.119651Z  INFO libp2p_swarm: local_peer_id=12D3KooWAAc8fL1KRjwTEdWV2KJ9KarV51f4f3u7nQ1smJvy8XDT
2024-06-20T14:49:10.119691Z  INFO forest_filecoin::libp2p::service: p2p network peer id: 12D3KooWAAc8fL1KRjwTEdWV2KJ9KarV51f4f3u7nQ1smJvy8XDT
2024-06-20T14:49:10.120405Z  INFO forest_filecoin::libp2p::service: p2p peer is now listening on: /ip4/127.0.0.1/tcp/42433
2024-06-20T14:49:10.120495Z  INFO forest_filecoin::daemon: JSON-RPC endpoint will listen at 127.0.0.1:2345
2024-06-20T14:49:10.122254Z  INFO forest_filecoin::chain_sync::chain_muxer: Evaluating network head...
2024-06-20T14:49:10.122315Z  INFO forest_filecoin::chain_sync::chain_muxer: local head is behind the network, local_epoch: 1718845, now_epoch: 1718952
2024-06-20T14:49:10.132399Z  INFO forest_filecoin::rpc: Ready for RPC connections
2024-06-20T14:49:10.494453Z  INFO forest_filecoin::libp2p::service: Running libp2p service
2024-06-20T14:49:10.540783Z  INFO forest_filecoin::chain_sync::chain_muxer: Local node is behind the network, starting BOOTSTRAP from LOCAL_HEAD = 1718845 -> NETWORK_HEAD = 1718952
2024-06-20T14:49:15.712508Z  INFO forest::progress: Downloading headers 105, 20 items/s, elapsed time: 5s
2024-06-20T14:49:20.121647Z  INFO forest_filecoin::libp2p::discovery: UPnP GatewayNotFound
2024-06-20T14:49:20.803622Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 10s
2024-06-20T14:49:26.029280Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 15s
2024-06-20T14:49:31.501326Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 20s
2024-06-20T14:49:36.746777Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 26s
2024-06-20T14:49:41.757951Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 31s
2024-06-20T14:49:47.089244Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 36s
2024-06-20T14:49:52.288682Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 41s
...
...
...
2024-06-20T16:50:19.531622Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 2h 1m 8s
2024-06-20T16:50:25.063933Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 2h 1m 14s
2024-06-20T16:50:30.213260Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 2h 1m 19s
2024-06-20T16:50:35.462762Z  INFO forest::progress: Downloading headers 105, 0 items/s, elapsed time: 2h 1m 24s

Expected behaviour

Forest should not get stuck. It should either fail or, preferably, recover on its own without manual intervention or external scripting.
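A minimal sketch of how a stall could be detected in-process, assuming the header download exposes a running item count like the one in the progress log (stuck at 105 above). `StallWatchdog` and `Progress` are hypothetical and not part of Forest; the timeout is a caller-chosen value, and timestamps are passed in explicitly so the logic is easy to test.

```rust
use std::time::{Duration, Instant};

/// Result of checking a download for forward progress (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Progress {
    /// The item count grew since the last check.
    Advancing,
    /// No new items, but still within the allowed window.
    Waiting,
    /// No new items for at least `stall_timeout`.
    Stalled,
}

/// Hypothetical watchdog that flags a download whose count stops growing.
struct StallWatchdog {
    stall_timeout: Duration,
    last_count: u64,
    last_progress_at: Instant,
}

impl StallWatchdog {
    fn new(stall_timeout: Duration, now: Instant) -> Self {
        Self { stall_timeout, last_count: 0, last_progress_at: now }
    }

    /// Report the current item count observed at time `now`.
    fn check(&mut self, count: u64, now: Instant) -> Progress {
        if count > self.last_count {
            // Progress was made: reset the stall timer.
            self.last_count = count;
            self.last_progress_at = now;
            Progress::Advancing
        } else if now.duration_since(self.last_progress_at) >= self.stall_timeout {
            Progress::Stalled
        } else {
            Progress::Waiting
        }
    }
}
```

On `Stalled`, the caller could either return an error (the "fail" option) or restart the header request against different peers (the "recover" option), instead of printing `0 items/s` indefinitely.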

Other information and links

It was not happening ~1 month ago (or at least I didn't notice it), so this may be a regression. Alternatively, the network may have new, exciting conditions that Forest needs to adapt to.