erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0
3.09k stars 1.08k forks source link

DumpBlocks: DumpBodies: header missed in db: block_num #9479

Open tmeinlschmidt opened 6 months ago

tmeinlschmidt commented 6 months ago

System information

Erigon version: erigon version 2.58.0-9731f716

OS & Version: Linux

Commit hash: 9731f716

Erigon Command (with flags/config):

      --datadir /data/erigon 
      --authrpc.jwtsecret /data/prysm/jwt_secret 
      --private.api.addr 127.0.0.1:<grpcport> 
      --port <wireport> 
      --authrpc.port=<authrpcport> 
      --torrent.port <torrentport> 
      --prune= 
      --chain  mainnet 
      --torrent.download.rate 200mb 
      --bodies.cache 4GB 
      --maxpeers 300 
      --db.size.limit=7T

Consensus Layer: prysm

Consensus Layer Command (with flags/config):

      --datadir=/data/prysm
      --mainnet
      --rpc-host=0.0.0.0
      --p2p-udp-port 30200
      --p2p-tcp-port 30200
      --grpc-gateway-port 7028
      --execution-endpoint=http://localhost:7004
      --jwt-secret=/data/prysm/jwt_secret
      --accept-terms-of-use
      --genesis-state=/data/prysm/genesis/genesis.ssz
      --terminal-total-difficulty-override 58750000000000000000000"

Chain/Network: mainnet

Expected behaviour

no errors in log

Actual behaviour

DumpBlocks: DumpBodies: header missed in db: block_num=19000000

[INFO] [02-21|14:00:12.912] [NewPayload] Handling new payload        height=19276488 hash=0x6bbbf5fb639db651234beaa2bad8336e619d01590a31149bd63b7520a9521f58
[INFO] [02-21|14:00:13.731] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
[INFO] [02-21|14:00:14.042] RPC Daemon notified of new headers       from=19276487 to=19276488 hash=0x6bbbf5fb639db651234beaa2bad8336e619d01590a31149bd63b7520a9521f58 header sending=22.054µs log sending=337ns
[INFO] [02-21|14:00:14.042] head updated                             hash=0x6bbbf5fb639db651234beaa2bad8336e619d01590a31149bd63b7520a9521f58 number=19276488
[INFO] [02-21|14:00:14.043] [snapshots] Prune Blocks                 to=19000000 limit=100
[WARN] [02-21|14:00:14.047] [snapshots] retire blocks                err="DumpBlocks: DumpBodies: header missed in db: block_num=19000000,  hash=cf384012b91b081230cdf17a3f7dd370d8e67056058af6b272b3d54aa2714fac"
JamesCropcho commented 6 months ago

Identical problem on mine.

Logs sample

Feb 21 19:03:26 erigon[16423]: [INFO] [02-21|19:03:26.435] RPC Daemon notified of new headers       from=19279473 to=19279474 hash=0xb5aa8822e5f62e54e2d9b110fe360e358711ead6346b05504c08d521ce4a19b9 header sending=15.913µs log sending=257ns
Feb 21 19:03:26 erigon[16423]: [INFO] [02-21|19:03:26.436] head updated                             hash=0xb5aa8822e5f62e54e2d9b110fe360e358711ead6346b05504c08d521ce4a19b9 number=19279474
Feb 21 19:03:26 erigon[16423]: [INFO] [02-21|19:03:26.485] [snapshots] Prune Blocks                 to=19150000 limit=100
Feb 21 19:03:26 erigon[16423]: [WARN] [02-21|19:03:26.486] [snapshots] retire blocks                err="DumpBlocks: DumpBodies: header missed in db: block_num=19150000,  hash=9c87e585613f7ea1ebd89ec92a7bdc7e2ceaa5f8843c4237aaa8b44030532f19"
Feb 21 19:03:37 erigon[16423]: [INFO] [02-21|19:03:37.591] [NewPayload] Handling new payload        height=19279475 hash=0xf9df32dda0808cdbd94652bd036887463d3d1d6c2b41540a92b01d9b1694c94b
Feb 21 19:03:41 erigon[16423]: [INFO] [02-21|19:03:41.866] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
Feb 21 19:03:42 erigon[16423]: [INFO] [02-21|19:03:42.201] RPC Daemon notified of new headers       from=19279474 to=19279475 hash=0xf9df32dda0808cdbd94652bd036887463d3d1d6c2b41540a92b01d9b1694c94b header sending=16.059µs log sending=493ns
Feb 21 19:03:42 erigon[16423]: [INFO] [02-21|19:03:42.202] head updated                             hash=0xf9df32dda0808cdbd94652bd036887463d3d1d6c2b41540a92b01d9b1694c94b number=19279475
Feb 21 19:03:42 erigon[16423]: [INFO] [02-21|19:03:42.332] [snapshots] Prune Blocks                 to=19150000 limit=100
Feb 21 19:03:42 erigon[16423]: [WARN] [02-21|19:03:42.333] [snapshots] retire blocks                err="DumpBlocks: DumpBodies: header missed in db: block_num=19150000,  hash=9c87e585613f7ea1ebd89ec92a7bdc7e2ceaa5f8843c4237aaa8b44030532f19"
Feb 21 19:03:48 erigon[16423]: [INFO] [02-21|19:03:48.996] [NewPayload] Handling new payload        height=19279476 hash=0xe88d3373a340cbb6c951f1f9d75441fc136aa7d61429854d69871607146e4f98
Feb 21 19:03:50 erigon[16423]: [INFO] [02-21|19:03:50.574] [updateForkchoice] Fork choice update: flushing in-memory state (built by previous newPayload)
Feb 21 19:03:50 erigon[16423]: [INFO] [02-21|19:03:50.806] RPC Daemon notified of new headers       from=19279475 to=19279476 hash=0xe88d3373a340cbb6c951f1f9d75441fc136aa7d61429854d69871607146e4f98 header sending=16.048µs log sending=211ns
Feb 21 19:03:50 erigon[16423]: [INFO] [02-21|19:03:50.806] head updated                             hash=0xe88d3373a340cbb6c951f1f9d75441fc136aa7d61429854d69871607146e4f98 number=19279476
Feb 21 19:03:50 erigon[16423]: [INFO] [02-21|19:03:50.884] [snapshots] Prune Blocks                 to=19150000 limit=100
Feb 21 19:03:50 erigon[16423]: [WARN] [02-21|19:03:50.885] [snapshots] retire blocks                err="DumpBlocks: DumpBodies: header missed in db: block_num=19150000,  hash=9c87e585613f7ea1ebd89ec92a7bdc7e2ceaa5f8843c4237aaa8b44030532f19"
Feb 21 19:04:01 erigon[16423]: [INFO] [02-21|19:04:01.428] [NewPayload] Handling new payload        height=19279477 hash=0x8469b4deb0bf718ceae415fcc9557ab14cf744ef98c68d383721adda7cd02593

System information

Erigon version: erigon version 2.58.0-9731f716

Seemingly related

https://github.com/ledgerwatch/erigon/issues/6636

AskAlexSharov commented 6 months ago

temporary fix: --snap.stop permanent fix: integration stage_header --reset

tmeinlschmidt commented 6 months ago

interesting is, we have two nodes and this affects only one one of them. Both the same data, the same version. Weird.

ibrahimtan93 commented 6 months ago

same problem in my node also i have "can't build missed indices" error.

integration stage_header --reset doesn't fix problem.

Erigon Version: 2.58.1

Logs:

INFO[02-23|00:04:40.528] [1/12 Snapshots] Requesting downloads
INFO[02-23|00:04:42.225] [snapshots:download] Blocks Stat         blocks=19150k indices=19150k alloc=4.7GB sys=5.6GB
INFO[02-23|00:04:42.225] [snapshots:missed-idx] Blocks Stat       blocks=19150k indices=19150k alloc=4.7GB sys=5.6GB
EROR[02-23|00:04:42.782] Could not start execution service        err="[1/12 Snapshots] can't build missed indices: TransactionsIdx: at=19150000-19160000, pre index building, expect: 1494586, got 7297745"
INFO[02-23|00:05:03.608] [Sentinel] Sentinel started              app=caplin enr=enr:-Ly4QEczJkMibOYHW1rG2JP6WMCQT2l4kUoogbB55cIOseQSB3kgQ6ZSQpaMO_j8-yAR5qgAjaaUmQGt4HT0rcA6JIMBh2F0dG5ldHOIAAAAAAAAAACEZXRoMpC7pNqWBAAAAAAAAAAABB0AgmlkgnY0gmlwhH8AAAGJc2VjcDI1NmsxoQNaDSrBk_V6BSTZn7J1_ZV8-pzmeV-33HgUthW0umdJtohzeW5jbmV0cwCDdGNwgg-hg3VkcIIPoA
INFO[02-23|00:05:03.608] Static peers                             len=0
INFO[02-23|00:05:03.608] Started Ethereum 2.0 Gossip Service      app=caplin
INFO[02-23|00:05:28.806] [Caplin] starting clstages loop          app=caplin
INFO[02-23|00:05:28.806] Starting downloading History             app=caplin stage=DownloadHistoricalBlocks from=8485120
INFO[02-23|00:05:30.033] Backfilling finished
INFO[02-23|00:05:33.811] Ready to insert history, waiting for sync cycle to finish app=caplin stage=DownloadHistoricalBlocks
INFO[02-23|00:05:33.815] [Caplin] Forward Sync                    app=caplin stage=ForwardSync from=8485120 to=8485225
INFO[02-23|00:05:48.839] Adding light client update               period=1035
INFO[02-23|00:06:03.608] P2P                                      app=caplin peers=30
INFO[02-23|00:06:06.055] [4/12 Execution] Completed on            block=19286611
INFO[02-23|00:06:18.251] RPC Daemon notified of new headers       from=19286596 to=19286611 hash=0xccda8961b36855dc8354cbdfa796bde43a9664a3f46f71facac2a6a51a6a019e header sending=32.13µs log sending=220ns
INFO[02-23|00:06:18.251] head updated                             hash=0xccda8961b36855dc8354cbdfa796bde43a9664a3f46f71facac2a6a51a6a019e number=19286611
INFO[02-23|00:06:18.251] [snapshots] Prune Blocks                 to=19150000 limit=100
WARN[02-23|00:06:18.252] [snapshots] retire blocks                err="DumpBlocks: DumpBodies: header missed in db: block_num=19150000,  hash=9c87e585613f7ea1ebd89ec92a7bdc7e2ceaa5f8843c4237aaa8b44030532f19"
INFO[02-23|00:06:43.830] [4/12 Execution] Completed on            block=19286614
INFO[02-23|00:06:46.528] RPC Daemon notified of new headers       from=19286611 to=19286614 hash=0xcd98e735fce6d47e68c96986c67e01968fefac9815d3b129c04fd19a14d4f7ae header sending=38.012µs log sending=221ns
INFO[02-23|00:06:46.528] head updated                             hash=0xcd98e735fce6d47e68c96986c67e01968fefac9815d3b129c04fd19a14d4f7ae number=19286614
INFO[02-23|00:06:46.529] [snapshots] Prune Blocks                 to=19150000 limit=100
WARN[02-23|00:06:46.530] [snapshots] retire blocks                err="DumpBlocks: DumpBodies: header missed in db: block_num=19150000,  hash=9c87e585613f7ea1ebd89ec92a7bdc7e2ceaa5f8843c4237aaa8b44030532f19"
INFO[02-23|00:07:03.609] P2P                                      app=caplin peers=14
INFO[02-23|00:07:25.947] [p2p] GoodPeers                          eth68=3 eth66=2 eth67=6
INFO[02-23|00:07:36.540] [txpool] stat                            pending=729 baseFee=506 queued=2631 alloc=10.1GB sys=12.3GB
INFO[02-23|00:08:03.614] P2P                                      app=caplin peers=17
INFO[02-23|00:09:03.608] P2P                                      app=caplin peers=22
INFO[02-23|00:09:10.881] [4/12 Execution] Completed on            block=19286626
ryanschneider commented 6 months ago

I've noticed we get this issue pretty consistently on Goerli when stopping, starting a node. For example on stopping we see this WARN:

Feb 24 05:19:28 ip-10-0-65-9 erigon[9174]: [WARN] [02-24|05:19:28.340] [snapshots] retire blocks                err="DumpBlocks: compress: context canceled"

If we see that warning, then the next startup we always see:

Feb 24 05:38:21 ip-10-0-65-9 erigon[57376]: [WARN] [02-24|05:38:21.672] [1/12 Snapshots] Some blocks are not in snapshots and not in db max_in_snapshots=8499999 min_in_db=10446000
Feb 24 05:38:21 ip-10-0-65-9 erigon[57376]: [INFO] [02-24|05:38:21.672] [snapshots:download] Blocks Stat         blocks=8500k indices=8500k alloc=1.1GB sys=1.7GB
Feb 24 05:38:21 ip-10-0-65-9 erigon[57376]: [INFO] [02-24|05:38:21.678] [snapshots] Prune Blocks                 to=8500000 limit=100
Feb 24 05:38:21 ip-10-0-65-9 erigon[57376]: [WARN] [02-24|05:38:21.679] [snapshots] retire blocks                err="DumpBlocks: DumpBodies: header missed in db: block_num=8500000>
JamesCropcho commented 6 months ago

Permanent fix: integration stage_header --reset

From https://github.com/ledgerwatch/erigon/issues/9479#issuecomment-1958530855

This indeed worked for me. It took way longer than I expected—about 5 days, on a non-networked (local) NVMe SSD—but it does appear to have succeeded.

AskAlexSharov commented 6 months ago

found 1 reason of it https://github.com/ledgerwatch/erigon/pull/9555 If there is a gap in snapshots (for example if file manually deleted): erigon will ignore all snapshots after gap and will try to create snapshots starting from block "before gap" and of-course doesn't have such blocks (erigon already created older snapshots in the past and removed such block). So, there is no bug in prune/retirement code of blocks, but just gap in snap files. I will print it to debug logs now and maybe we will do something else about it in future.

mariokami commented 6 months ago

I had the same issue.

Using the -snap.stop temporary fix seemed to work.

Thank you @AskAlexSharov for your suggestions and also a great project 🙌