Nashatyrev / teku

Java Implementation of the Ethereum 2.0 Beacon Chain
Apache License 2.0
2 stars 1 forks source link

Sampling issue #135

Open Nashatyrev opened 5 days ago

Nashatyrev commented 5 days ago

Issues

1 issue: The slot couldn't be sampled for some reason 2 issue: The remote peer was banned because sent the block which we consider unavailable

Log

At some we couldn't sample block #1059. We were retrying later but never succeed

[cl-1-teku-geth] 2024-09-12 12:41:15.819 INFO  - Availability check slot 1059 result: DataAndValidationResult{validationResult=NOT_AVAILABLE, dataSize=0, cause=Optional[java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present]}
[cl-1-teku-geth] 2024-09-12 12:41:25.082 INFO  - Availability check slot 1059 result: DataAndValidationResult{validationResult=NOT_AVAILABLE, dataSize=0, cause=Optional[java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present]}
[cl-1-teku-geth] 2024-09-12 12:45:12.531 INFO  - Availability check slot 1059 result: DataAndValidationResult{validationResult=NOT_AVAILABLE, dataSize=0, cause=Optional[java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present]}
[cl-1-teku-geth] 2024-09-12 12:45:17.633 INFO  - Availability check slot 1059 result: DataAndValidationResult{validationResult=NOT_AVAILABLE, dataSize=0, cause=Optional[java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present]}

When importing blocks we decided that the batch is invalid cause it contains NOT_AVAILABLE block which is considered invalid and the peer was significantly downscored

[cl-1-teku-geth] 2024-09-12 12:45:12.532 DEBUG - Failed to import batch EventThreadOnlyBatch{delegate=SyncSourceBatch{firstSlot=1043, lastSlot=1077, count=35, complete=true, requiredParent=9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa, firstBlock=f84355460fdca170c61b2b1252f4a836f7bee466b3f9c349f3009c1be206a24d (1043) (parent: 9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa), lastBlock=a9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d (1077)}}: FAILED_DATA_AVAILABILITY_CHECK_NOT_AVAILABLE
[cl-1-teku-geth] java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present
[skipped]
[cl-1-teku-geth] at tech.pegasys.teku.statetransition.datacolumns.DasSamplerBasic.lambda$checkDataAvailability$3(DasSamplerBasic.java:94) ~[teku-ethereum-statetransition-develop.jar:24.3.1+181-gc5065b695f]
[cl-1-teku-geth] ... 28 more
[cl-1-teku-geth] 2024-09-12 12:45:12.611 DEBUG - Marking batch EventThreadOnlyBatch{delegate=SyncSourceBatch{firstSlot=1043, lastSlot=1077, count=35, complete=true, requiredParent=9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa, firstBlock=f84355460fdca170c61b2b1252f4a836f7bee466b3f9c349f3009c1be206a24d (1043) (parent: 9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa), lastBlock=a9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d (1077)}
    } as invalid because it extends from an invalid block
[cl-1-teku-geth] 2024-09-12 12:45:12.612 DEBUG - Penalising peer DefaultEth2Peer{id=16Uiu2HAmEuEiFQKhoZs2SrTrsqaKTyVkD6Hcuj9DgemWn5nHi9bn, remoteStatus=Optional[PeerStatus{forkDigest=0xd84e2ee0, finalizedRoot=0x52a4cbd4ff41c70b95b63e959e7677a92f26b486579cd9bd229e9abb3fa8df7f, finalizedEpoch=31, headRoot=0xa9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d, headSlot=1077}]
    } for providing invalid batch data SyncSourceBatch{firstSlot=1043, lastSlot=1077, count=35, complete=true, requiredParent=9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa, firstBlock=f84355460fdca170c61b2b1252f4a836f7bee466b3f9c349f3009c1be206a24d (1043) (parent: 9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa), lastBlock=a9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d (1077)}
[cl-1-teku-geth] 2024-09-12 12:45:12.612 DEBUG - Requesting blocks for 35 slots starting at 1043 from peer DefaultEth2Peer{id=16Uiu2HAmEuEiFQKhoZs2SrTrsqaKTyVkD6Hcuj9DgemWn5nHi9bn, remoteStatus=Optional[PeerStatus{forkDigest=0xd84e2ee0, finalizedRoot=0x52a4cbd4ff41c70b95b63e959e7677a92f26b486579cd9bd229e9abb3fa8df7f, finalizedEpoch=31, headRoot=0xa9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d, headSlot=1077}]}
[cl-1-teku-geth] 2024-09-12 12:45:12.612 DEBUG - Sending request for 35 blocks
[cl-1-teku-geth] 2024-09-12 12:45:12.629 DEBUG - Marking 0 batches complete because block at slot 1043 is child of starting point
[cl-1-teku-geth] 2024-09-12 12:45:12.629 INFO  - Created availabilityChecker for slot 1059
[cl-1-teku-geth] 2024-09-12 12:45:12.629 INFO  - Starting data availability check for slot 1059
[cl-1-teku-geth] 2024-09-12 12:45:12.629 INFO  - Requested checkDataAvailability for 0x0b3a6ca1b6b3d348ea3460875ff15da2844e8ed2e0b5e7c32fca64816b1957ef(1059)
[cl-1-teku-geth] 2024-09-12 12:45:17.633 INFO  - Collected [empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty],[empty] DataColumnSidecars in checkDataAvailability for 0x0b3a6ca1b6b3d348ea3460875ff15da2844e8ed2e0b5e7c32fca64816b1957ef(1059)
[cl-1-teku-geth] 2024-09-12 12:45:17.633 INFO  - Availability check slot 1059 result: DataAndValidationResult{validationResult=NOT_AVAILABLE, dataSize=0, cause=Optional[java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present]}
[cl-1-teku-geth] 2024-09-12 12:45:17.633 DEBUG - sidecars validation result: DataAndValidationResult{validationResult=NOT_AVAILABLE, dataSize=0, cause=Optional[java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present]}

... and again

[cl-1-teku-geth] 2024-09-12 12:45:17.633 DEBUG - Failed to import batch EventThreadOnlyBatch{delegate=SyncSourceBatch{firstSlot=1043, lastSlot=1077, count=35, complete=true, requiredParent=9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa, firstBlock=f84355460fdca170c61b2b1252f4a836f7bee466b3f9c349f3009c1be206a24d (1043) (parent: 9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa), lastBlock=a9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d (1077)}}: FAILED_DATA_AVAILABILITY_CHECK_NOT_AVAILABLE
[cl-1-teku-geth] java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present
[skipped]
[cl-1-teku-geth] Caused by: java.util.NoSuchElementException: No value present
[cl-1-teku-geth] at java.base/java.util.Optional.orElseThrow(Unknown Source) ~[?:?]
[cl-1-teku-geth] at tech.pegasys.teku.statetransition.datacolumns.DasSamplerBasic.lambda$checkDataAvailability$3(DasSamplerBasic.java:94) ~[teku-ethereum-statetransition-develop.jar:24.3.1+181-gc5065b695f]
[cl-1-teku-geth] ... 28 more

[cl-1-teku-geth] 2024-09-12 12:45:17.635 DEBUG - Marking batch EventThreadOnlyBatch
  {delegate=SyncSourceBatch{firstSlot=1043, lastSlot=1077, count=35, complete=true, requiredParent=9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa,  firstBlock=f84355460fdca170c61b2b1252f4a836f7bee466b3f9c349f3009c1be206a24d (1043) (parent: 9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa), lastBlock=a9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d (1077)}
    } as invalid because it extends from an invalid block
[cl-1-teku-geth] 2024-09-12 12:45:17.635 DEBUG - Penalising peer 
   DefaultEth2Peer{id=16Uiu2HAmEuEiFQKhoZs2SrTrsqaKTyVkD6Hcuj9DgemWn5nHi9bn, remoteStatus=Optional[PeerStatus{forkDigest=0xd84e2ee0, finalizedRoot=0x52a4cbd4ff41c70b95b63e959e7677a92f26b486579cd9bd229e9abb3fa8df7f, finalizedEpoch=31, headRoot=0xa9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d, headSlot=1077}]
   } for providing invalid batch data SyncSourceBatch{firstSlot=1043, lastSlot=1077, count=35, complete=true, requiredParent=9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa, firstBlock=f84355460fdca170c61b2b1252f4a836f7bee466b3f9c349f3009c1be206a24d (1043) (parent: 9c4f9e98dc2b9b2af84fb1f43607a9f04c17533e7dedf638d35b5b47d039ebfa), lastBlock=a9a71628c3697312d4279df77aa21d363b43999df64f8b0f4680e289332b8a2d (1077)}
[cl-1-teku-geth] 2024-09-12 12:45:17.635 INFO  - Disconnecting cleanly because REMOTE_FAULT from /ip4/172.16.0.13/tcp/9000

now the peer is probably banned

Nashatyrev commented 5 days ago

Sync is not capturing the latest slots because for some reasons firstIncompleteCustodySlot is not moving and is still 0 So basically if we are missing columns by gossip we never retrieve them later

[cl-1-teku-geth] 2024-09-12 12:40:47.423 INFO  - [nyota] DataColumnSidecarDB.addSidecar: new slot: 1057, prevSlot count: 128, total added: 134403, finalizedSlot: 0
[cl-1-teku-geth] 2024-09-12 12:40:56.004 INFO  - [nyota] DataCustodySync.fillUp: synced=0 pending=0, missingColumns=0([])
[cl-1-teku-geth] 2024-09-12 12:40:59.667 INFO  - [nyota] DataColumnSidecarDB.addSidecar: new slot: 1058, prevSlot count: 128, total added: 134530, finalizedSlot: 0
[cl-1-teku-geth] 2024-09-12 12:41:08.004 INFO  - [nyota] DataCustodySync.fillUp: synced=0 pending=0, missingColumns=0([])
[cl-1-teku-geth] 2024-09-12 12:41:20.004 INFO  - [nyota] DataCustodySync.fillUp: synced=0 pending=0, missingColumns=0([])
[cl-1-teku-geth] 2024-09-12 12:41:32.004 INFO  - [nyota] DataCustodySync.fillUp: synced=0 pending=0, missingColumns=0([])
[cl-1-teku-geth] 2024-09-12 12:41:38.614 INFO  - [nyota] DataColumnSidecarDB.addSidecar: new slot: 1061, prevSlot count: 128, total added: 134684, finalizedSlot: 0
[cl-1-teku-geth] 2024-09-12 12:41:44.004 INFO  - [nyota] DataCustodySync.fillUp: synced=0 pending=0, missingColumns=0([])
[cl-1-teku-geth] 2024-09-12 12:41:50.415 INFO  - [nyota] DataColumnSidecarDB.addSidecar: new slot: 1062, prevSlot count: 128, total added: 134796, finalizedSlot: 0
Nashatyrev commented 4 days ago

The peer can't sync from the very beginning due to failed sampling:

2024-09-13 17:06:45.395+04:00 | forkchoice-async-0 | DEBUG | BatchImporter | Failed to import batch EventThreadOnlyBatch{delegate=
    SyncSourceBatch{firstSlot=1, lastSlot=32, count=32, complete=true, requiredParent=c88f63b75a8780fe0f1ef720150256a5d1226ab3cf376a4586f67b449752a7b3, firstBlock=1b39ac777f6bb62dc3b200adec33b7aeb70d2435e885bf57f8dad5ab2a8b2cc2 (2) (parent: c88f63b75a8780fe0f1ef720150256a5d1226ab3cf376a4586f67b449752a7b3), lastBlock=77bf2eea9555b78a34d39b0b08e95c18c12e7d1a3665636581dea28c464a3663 (30)}}: 
    FAILED_DATA_AVAILABILITY_CHECK_NOT_AVAILABLE
    java.util.concurrent.CompletionException: java.util.NoSuchElementException: No value present

Also it assumes remote peer sending invalid data:

2024-09-13 17:06:45.396+04:00 | sync-async-0 | DEBUG | PeerScoringConflictResolutionStrategy | Penalising peer DefaultEth2Peer{id=16Uiu2HAkv2yYzHSyaWM8yNhQKtxdP18v6LQNRThRCgqiipo2bvFe, remoteStatus=Optional[PeerStatus{forkDigest=0x300c9ac3, finalizedRoot=0x77bf2eea9555b78a34d39b0b08e95c18c12e7d1a3665636581dea28c464a3663, finalizedEpoch=4, headRoot=0x883ffcf5b2981e58b0d3e8e9893f3be4dba898f813fd5e961c4b49a186e51747, headSlot=47}]
    } for providing invalid batch data SyncSourceBatch{firstSlot=1, lastSlot=32, count=32, complete=true, requiredParent=c88f63b75a8780fe0f1ef720150256a5d1226ab3cf376a4586f67b449752a7b3, firstBlock=1b39ac777f6bb62dc3b200adec33b7aeb70d2435e885bf57f8dad5ab2a8b2cc2 (2) (parent: c88f63b75a8780fe0f1ef720150256a5d1226ab3cf376a4586f67b449752a7b3), lastBlock=77bf2eea9555b78a34d39b0b08e95c18c12e7d1a3665636581dea28c464a3663 (30)}