algorand / conduit

Algorand's data pipeline framework.
MIT License

algod importer: Update sync on WaitForBlock error. #122

Closed winder closed 1 year ago

winder commented 1 year ago

Summary

If algod is restarted after it receives a sync round update but before it fetches the new round(s), the algod follower and Conduit both stall: Conduit keeps waiting for algod to reach the new sync round, which never happens.

This change adds some extra logic to the WaitForBlock call. If there is a timeout or a bad response, a new attempt to set the sync round is made.

This PR also removes the retry loop from the algod importer. Retry is now managed by the pipeline.

Test Plan

Update existing unit tests.

codecov[bot] commented 1 year ago

Codecov Report

Merging #122 (042ec0b) into master (442791a) will increase coverage by 2.71%. The diff coverage is 77.26%.

@@            Coverage Diff             @@
##           master     #122      +/-   ##
==========================================
+ Coverage   67.66%   70.37%   +2.71%     
==========================================
  Files          32       36       +4     
  Lines        1976     2535     +559     
==========================================
+ Hits         1337     1784     +447     
- Misses        570      654      +84     
- Partials       69       97      +28     
Impacted Files Coverage Δ
conduit/data/block_export_data.go 100.00% <ø> (+92.30%) :arrow_up:
conduit/metrics/metrics.go 100.00% <ø> (ø)
conduit/pipeline/metadata.go 69.11% <ø> (ø)
...duit/plugins/exporters/filewriter/file_exporter.go 81.63% <ø> (-1.06%) :arrow_down:
conduit/plugins/exporters/postgresql/util/prune.go 78.43% <ø> (ø)
conduit/plugins/importers/algod/metrics.go 100.00% <ø> (ø)
...ins/processors/filterprocessor/filter_processor.go 83.82% <ø> (+3.54%) :arrow_up:
...plugins/processors/filterprocessor/gen/generate.go 34.28% <ø> (ø)
conduit/plugins/processors/noop/noop_processor.go 64.70% <ø> (+6.81%) :arrow_up:
pkg/cli/internal/list/list.go 20.75% <ø> (ø)
... and 15 more

... and 1 file with indirect coverage changes


winder commented 1 year ago

Looks correct to me. I'm not sure I follow how this causes the pipeline to hang though.

Last I checked, if you stop/start the node, it will have the last MaxAcctLookback deltas in cache (and even more rounds available). It will also run ahead MaxAcctLookback-1 rounds.

So unless that number is 1, the node/pipeline should make progress despite the sync round being 1 round lower than what we expect. And the pipeline would correctly update the sync round once it processed another round.

I don't totally understand it either. I'm guessing there is some sort of cooldown / warmup time when rounds are being processed very quickly. For the file processor each round is being processed in the 50-200µs range.

I was able to confirm that the sync round does need to be set again (this is with MaxAcctLookback = 64):

cat metadata.json
{"genesis-hash":"mFgazF+2uRS1tMiL9dsj01hJGySEmPN28B/TjjvpVW0=","network":"betanet","next-round":609262}

curl -XGET "localhost:4190/v2/ledger/sync?pretty" -H "Authorization: Bearer aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
{
  "round": 609198
}
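
The gap between the two outputs above can be checked with a small sketch like the following. It only parses the two JSON payloads shown in the thread and subtracts the rounds; the `roundsBehind` helper and the struct names are hypothetical, and no request is made to algod.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadata mirrors only the field of metadata.json needed here.
type metadata struct {
	NextRound uint64 `json:"next-round"`
}

// syncResponse mirrors the body returned by GET /v2/ledger/sync.
type syncResponse struct {
	Round uint64 `json:"round"`
}

// roundsBehind reports how far the node's sync round lags behind the
// next round recorded in the metadata file.
func roundsBehind(metadataJSON, syncJSON []byte) (int64, error) {
	var md metadata
	if err := json.Unmarshal(metadataJSON, &md); err != nil {
		return 0, fmt.Errorf("parsing metadata: %w", err)
	}
	var sr syncResponse
	if err := json.Unmarshal(syncJSON, &sr); err != nil {
		return 0, fmt.Errorf("parsing sync response: %w", err)
	}
	return int64(md.NextRound) - int64(sr.Round), nil
}

func main() {
	md := []byte(`{"genesis-hash":"mFgazF+2uRS1tMiL9dsj01hJGySEmPN28B/TjjvpVW0=","network":"betanet","next-round":609262}`)
	sr := []byte(`{"round": 609198}`)

	gap, err := roundsBehind(md, sr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sync round is behind next-round by", gap, "rounds")
}
```

For the values above the difference is 609262 - 609198 = 64, which matches the MaxAcctLookback used in this test.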

tzaffi commented 1 year ago

> This PR also removes the retry loop from the algod importer. Retry is now managed by the pipeline.

👍