ZcashFoundation / zebra

Zcash - Financial Privacy in Rust 🦓
https://zfnd.org/zebra/
Apache License 2.0
413 stars 106 forks source link

Fix repeated block timeouts during initial sync #5709

Open teor2345 opened 1 year ago

teor2345 commented 1 year ago

Motivation

During the initial sync, Zebra's block validation repeatedly times out, and the syncer resets:

2022-11-23T09:22:15.197732Z WARN {net="Main"}:sync:try_to_sync: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1856228), hash: block::Hash("0000000000d433e30155445ddc72854eb7395827c1c241b249a255585ea2716a") }

https://github.com/ZcashFoundation/zebra/actions/runs/3519147385/jobs/5923785413#step:8:12498

This only happens in some parts of the chain, and it seems to happen at slightly different block heights each time. When it does happen, it can happen repeatedly every 10-15 minutes: https://github.com/ZcashFoundation/zebra/actions/runs/3519147385/jobs/5923785413#step:8:12465

This is not a priority because Zebra is still much faster than zcashd.

Designs

This could be happening because:

Related Work

If we speed up block commits, it might solve this issue:

teor2345 commented 1 year ago

Due to this block verification timeout bug, full syncs are currently taking 40-48 hours. That's still acceptable performance, I think?

mpguerra commented 1 year ago

Is this still happening? if so, when was the last time we saw it?

teor2345 commented 1 year ago

Is this still happening? if so, when was the last time we saw it?

Yes, it happens multiple times in every full sync, this sync is running right now: https://github.com/ZcashFoundation/zebra/actions/runs/4931890966/jobs/8824296617#step:8:8745

In the last completed full sync, it happened at least 200 times, which causes a delay of 3-5 hours in 50 hour sync: https://github.com/ZcashFoundation/zebra/actions/runs/4888643906/jobs/8769200765#step:8:17512

mpguerra commented 8 months ago

@ZcashFoundation/zebra-team has anyone noticed this happening recently? I'm tempted to close

upbqdn commented 8 months ago

Yes, we still have this bug. I got a few of the errors while running an unsynced instance locally:

2024-03-13T13:23:23.783618Z  WARN {zebrad="58bfe97" net="Main"}:sync:try_to_sync:try_to_sync_once{extra_hashes={}}: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1941142), hash: block::Hash("0000000000aca7473124dd16884dfe35ee602ba7f9dda66bc2633563c5db3a4e") }
2024-03-13T13:34:23.516921Z  WARN {zebrad="58bfe97" net="Main"}:sync:try_to_sync:try_to_sync_once{extra_hashes={}}: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1942686), hash: block::Hash("0000000000ffb91722514ad09b493355add5b8f5559600b7dd7c7b715cdc80a2") }
2024-03-13T13:44:11.547760Z  WARN {zebrad="58bfe97" net="Main"}:sync:try_to_sync:try_to_sync_once{extra_hashes={}}: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1944827), hash: block::Hash("0000000001b3c62ba33f778464b308a0435f25a2afa995d1bbbfa553e29c9878") }
2024-03-13T15:31:57.838278Z  WARN {zebrad="58bfe97" net="Main"}:sync:try_to_sync:try_to_sync_once{extra_hashes={}}: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1977421), hash: block::Hash("00000000016daccf7867a7b7ad930e118d5f5cd2b13612eea2df038c7aa9747b") }
2024-03-13T15:41:43.896349Z  WARN {zebrad="58bfe97" net="Main"}:sync:try_to_sync:try_to_sync_once{extra_hashes={}}: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1978958), hash: block::Hash("00000000006886ebe06217ded4b4718af82549afab6a1a66354c34cec93882be") }
2024-03-13T15:51:26.100828Z  WARN {zebrad="58bfe97" net="Main"}:sync:try_to_sync:try_to_sync_once{extra_hashes={}}: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1980798), hash: block::Hash("00000000016422ba324158bd0ed9597dbcdec3a35ee94f08968a92b2440e1b9a") }
2024-03-13T16:01:18.321517Z  WARN {zebrad="58bfe97" net="Main"}:sync:try_to_sync:try_to_sync_once{extra_hashes={}}: zebrad::components::sync: error downloading and verifying block e=ValidationRequestError { error: Elapsed(()), height: Height(1982826), hash: block::Hash("0000000000db2bf60283410f90846223e981d3a4c0d5d9f656804be36067d163") }

The timing is interesting and doesn't seem random. It looks like every few hours, we get a series of error logs that are ten minutes apart.