Bodies stage can get stuck repeatedly

wtdcode commented 1 month ago

Describe the bug

The Bodies stage gets stuck like this:

reth-1  | 2024-07-15T06:07:05.066510Z  INFO Preparing stage pipeline_stages=2/12 stage=Bodies checkpoint=24497000 target=40449104
reth-1  | 2024-07-15T06:07:06.847326Z  INFO Executing stage pipeline_stages=2/12 stage=Bodies checkpoint=24497000 target=40449104
reth-1  | 2024-07-15T06:07:07.140215Z  INFO Committed stage progress pipeline_stages=2/12 stage=Bodies checkpoint=24498000 target=40449104 stage_progress=60.56%
reth-1  | 2024-07-15T06:07:07.201340Z  INFO Preparing stage pipeline_stages=2/12 stage=Bodies checkpoint=24498000 target=40449104
reth-1  | 2024-07-15T06:07:12.097781Z  INFO Status connected_peers=5 freelist=11 stage=Bodies checkpoint=24498000 target=40449104 stage_progress=60.56%
reth-1  | 2024-07-15T06:07:37.097583Z  INFO Status connected_peers=5 freelist=11 stage=Bodies checkpoint=24498000 target=40449104 stage_progress=60.56%
reth-1  | 2024-07-15T06:08:02.097760Z  INFO Status connected_peers=5 freelist=11 stage=Bodies checkpoint=24498000 target=40449104 stage_progress=60.56%
reth-1  | 2024-07-15T06:08:27.097639Z  INFO Status connected_peers=5 freelist=11 stage=Bodies checkpoint=24498000 target=40449104 stage_progress=60.56%
reth-1  | 2024-07-15T06:08:52.097039Z  INFO Status connected_peers=5 freelist=11 stage=Bodies checkpoint=24498000 target=40449104 stage_progress=60.56%

It doesn't move, even after the node eventually connects to more than 90 peers. Restarting helps, but it gets stuck again after receiving a few blocks.

Steps to reproduce

Start syncing from scratch.

Node logs

No response

Platform(s)

Linux (x86)

What version/commit are you on?

develop branch

What database version are you on?

-

Which chain / network are you on?

-

What type of node are you running?

Archive (default)

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

[X] I agree to follow the Code of Conduct

zhk101 commented 1 month ago

Happens to me as well. I have to restart the node a couple of times during the Bodies stage to get it going. Full Bodies stage run here, with 4 restarts.

wtdcode commented 1 month ago

From what I can tell, it may be caused by the BodiesDownloader getting stuck, but I can no longer reproduce it now that the node has passed the Bodies stage.

wtdcode commented 1 month ago

The Execution stage also gets stuck for me. How about yours, @zhk101?

zhk101 commented 1 month ago

Nah, it never got stuck for me. As I understand it, Execution is essentially offline: it works on the data downloaded during the Headers and Bodies stages. At least in my case it never got stuck, but I also have a cron job that runs once a day to stop the node, back up the datadir, and restart it, so maybe it never stayed up long enough for the problem to appear.

That being said, the Execution stage always failed for me at around block 30M with a bad block error. I think this may have been fixed in the latest commit. My node is almost at the 30M mark now, so I'll see whether it gets past it this time.

wtdcode commented 1 month ago

My Execution stage is stuck at checkpoint=0 =(

What commit hash of BSC reth are you on?

zhk101 commented 1 month ago

I keep updating reth as new commits land, especially the ones around sidecars, since that's where my node failed in the Execution stage last time. I'm restarting the node now with the latest 'main' @ 4d9ea56a233aea6d3da2f0791af7ab591963bb55.

zhk101 commented 1 month ago

Keep in mind that the Execution stage is very long: it takes around 4-6 days on my setup, so yours may not be stuck, just very slow.

wtdcode commented 1 month ago

But from my experience running reth on mainnet, the checkpoint should still advance. Mine always looks like this:

reth-1  | 2024-07-18T09:27:14.445961Z  INFO Status connected_peers=0 freelist=7 stage=Execution checkpoint=0 target=40449104
reth-1  | 2024-07-18T09:27:39.445216Z  INFO Status connected_peers=1 freelist=7 stage=Execution checkpoint=0 target=40449104
reth-1  | 2024-07-18T09:27:58.022755Z  INFO Received forkchoice updated message when syncing head_block_hash=0xfe7f292e66cd20a7b7e520c5595771b47aca38b874e53c219270d24c5d8e8249 safe_block_hash=0x0000000000000000000000000000000000000000000000000000000000000000 finalized_block_hash=0x0000000000000000000000000000000000000000000000000000000000000000
reth-1  | 2024-07-18T09:28:04.445560Z  INFO Status connected_peers=2 freelist=7 stage=Execution checkpoint=0 target=40449104
reth-1  | 2024-07-18T09:28:29.445027Z  INFO Status connected_peers=2 freelist=7 stage=Execution checkpoint=0 target=40449104
reth-1  | 2024-07-18T09:28:54.445558Z  INFO Status connected_peers=5 freelist=7 stage=Execution checkpoint=0 target=40449104

zhk101 commented 1 month ago

That's odd. Here's my log: it did go from checkpoint 0 to the next checkpoint fairly quickly (in under two minutes). I'm running my node in full mode, btw.

2024-07-13T02:32:09.691068Z  INFO reth_node_events::node: Preparing stage pipeline_stages=4/12 stage=Execution checkpoint=0 target=40407494
2024-07-13T02:32:09.691992Z  INFO reth_node_events::node: Executing stage pipeline_stages=4/12 stage=Execution checkpoint=0 target=40407494
2024-07-13T02:32:26.531973Z  INFO reth::cli: Status connected_peers=199 freelist=8 stage=Execution checkpoint=0 target=40407494
2024-07-13T02:32:51.531467Z  INFO reth::cli: Status connected_peers=199 freelist=8 stage=Execution checkpoint=0 target=40407494
2024-07-13T02:33:16.531415Z  INFO reth::cli: Status connected_peers=199 freelist=8 stage=Execution checkpoint=0 target=40407494
2024-07-13T02:33:41.531929Z  INFO reth::cli: Status connected_peers=199 freelist=8 stage=Execution checkpoint=0 target=40407494
2024-07-13T02:33:59.242236Z  INFO reth_node_events::node: Committed stage progress pipeline_stages=4/12 stage=Execution checkpoint=500001 target=40407494 stage_progress=0.01%
2024-07-13T02:33:59.257124Z  INFO reth_node_events::node: Preparing stage pipeline_stages=4/12 stage=Execution checkpoint=500001 target=40407494
2024-07-13T02:33:59.257130Z  INFO reth_node_events::node: Executing stage pipeline_stages=4/12 stage=Execution checkpoint=500001 target=40407494
2024-07-13T02:34:06.531404Z  INFO reth::cli: Status connected_peers=199 freelist=5 stage=Execution checkpoint=500001 target=40407494 stage_progress=0.01%
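Whether a stage is stuck or just slow can be checked mechanically: the periodic Status lines repeat the current stage and checkpoint, so a checkpoint that never changes across many of them is exactly the symptom described in this thread. Below is a minimal Python sketch that is not part of reth; the helper names parse_status and stalled_stage are hypothetical, and it assumes the log format shown in the excerpts above.

```python
import re

# Matches the "stage=<name> checkpoint=<n>" fields of the periodic "Status"
# lines, as they appear in the logs in this thread. The log format is an
# assumption here, not a stable interface.
STATUS_RE = re.compile(r"\bStatus .*?stage=(\w+) checkpoint=(\d+)")


def parse_status(lines):
    """Yield (stage, checkpoint) for every Status line; other lines are skipped."""
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            yield match.group(1), int(match.group(2))


def stalled_stage(lines, min_samples=4):
    """Return (stage, checkpoint) if the last `min_samples` Status lines all
    report the same stage and checkpoint, otherwise None.

    Status lines arrive roughly every 25 s in the logs above, so the window
    spans about (min_samples - 1) * 25 s. Stages such as Execution commit
    progress in large batches, so the window must be longer than one batch
    for a "stalled" result to mean anything.
    """
    samples = list(parse_status(lines))
    if len(samples) < min_samples:
        return None
    recent = samples[-min_samples:]
    if all(sample == recent[0] for sample in recent):
        return recent[0]
    return None


if __name__ == "__main__":
    # Abbreviated lines from the Bodies log at the top of this issue.
    bodies_log = [
        "INFO Preparing stage pipeline_stages=2/12 stage=Bodies checkpoint=24498000 target=40449104",
    ] + [
        "INFO Status connected_peers=5 freelist=11 stage=Bodies checkpoint=24498000 target=40449104",
    ] * 5
    # Abbreviated lines from the full-node Execution log: checkpoint 0, then 500001.
    execution_log = [
        "INFO reth::cli: Status connected_peers=199 freelist=8 stage=Execution checkpoint=0 target=40407494",
    ] * 4 + [
        "INFO reth_node_events::node: Committed stage progress pipeline_stages=4/12 stage=Execution checkpoint=500001 target=40407494",
        "INFO reth::cli: Status connected_peers=199 freelist=5 stage=Execution checkpoint=500001 target=40407494",
    ]
    print(stalled_stage(bodies_log))     # ('Bodies', 24498000)
    print(stalled_stage(execution_log))  # None
```

Note that with the default window of 4 samples (about 75 s), the full-node Execution log above would also have been flagged just before its first commit at checkpoint=500001, so for the Execution stage a much larger min_samples is the safer choice.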

wtdcode commented 1 month ago

Hmm, I'm running in archive mode. Could that be the root cause?

github-actions[bot] commented 4 weeks ago

This issue is stale because it has been open for 21 days with no activity.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

bnb-chain / reth