algorand / conduit

Algorand's data pipeline framework.
MIT License

Pipeline the Pipeline #128

Closed tzaffi closed 1 year ago

tzaffi commented 1 year ago

Description

This PR allows for moderate concurrency in the pipeline without sacrificing its sequential integrity.
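
A minimal sketch of the general idea (illustrative only, not the code added by this PR): each stage runs in its own goroutine and hands rounds to the next stage over a small FIFO channel, so the importer and processors can work a round or two ahead while the exporter still receives rounds strictly in order. The blockData type and channel depth below are assumptions for illustration.

```go
package main

import "fmt"

type blockData struct{ round uint64 }

// importer produces rounds in order and closes its output when done.
func importer(out chan<- blockData, rounds uint64) {
	defer close(out)
	for r := uint64(0); r < rounds; r++ {
		out <- blockData{round: r} // fetch block r (simulated)
	}
}

// processor transforms each round and forwards it, preserving order.
func processor(in <-chan blockData, out chan<- blockData) {
	defer close(out)
	for blk := range in {
		out <- blk // process block (simulated)
	}
}

// exporter consumes rounds; channel FIFO ordering keeps them sequential.
func exporter(in <-chan blockData) {
	for blk := range in {
		fmt.Println("exported round", blk.round)
	}
}

func main() {
	const depth = 1 // small buffer = moderate concurrency between stages
	c1 := make(chan blockData, depth)
	c2 := make(chan blockData, depth)
	go importer(c1, 5)
	go processor(c1, c2)
	exporter(c2)
}
```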

Summary of Changes

Issues

#118

TODO

Testing

E2E

pipeline_bench_test.go

Running a new benchmark test twice each on the original code and the new code, we have the following results. Note that the most pertinent result for the typical indexer DB population use case is exporter_10ms_while_others_1ms:

| Benchmark Name | Original rounds/sec | Pipelining rounds/sec | Pipelining vs. Original (%) |
| --- | --- | --- | --- |
| vanilla_2_procs_without_sleep-size-1-8 | 3077 | 3309.5 | +7% |
| uniform_sleep_of_10ms-size-1-8 | 22.32 | 79.815 | +250% |
| exporter_10ms_while_others_1ms-size-1-8 | 63.405 | 78.565 | +24% |
| importer_10ms_while_others_1ms-size-1-8 | 65.535 | 91.255 | +39% |
| first_processor_10ms_while_others_1ms-size-1-8 | 60.28 | 89.175 | +48% |
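
For orientation, a scenario like exporter_10ms_while_others_1ms can be modeled as a Go benchmark that sleeps a configurable amount per stage and reports rounds/sec. The sketch below only illustrates that measurement style under assumed stage durations; it is not the actual contents of pipeline_bench_test.go.

```go
package pipeline

import (
	"testing"
	"time"
)

// runRound simulates one round flowing through importer, processor, exporter.
func runRound(importerDelay, processorDelay, exporterDelay time.Duration) {
	time.Sleep(importerDelay)  // simulated importer work
	time.Sleep(processorDelay) // simulated processor work
	time.Sleep(exporterDelay)  // simulated exporter work
}

func BenchmarkExporter10msWhileOthers1ms(b *testing.B) {
	start := time.Now()
	for i := 0; i < b.N; i++ {
		runRound(1*time.Millisecond, 1*time.Millisecond, 10*time.Millisecond)
	}
	// Report throughput in rounds/sec, matching the tables above.
	b.ReportMetric(float64(b.N)/time.Since(start).Seconds(), "rounds/sec")
}
```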

Block Generator Results

Running the block generator test with SCENARIO = scenarios/config.allmixed.small.yml for 30s against both the original code and the new code, with 2 experiments per configuration, we have:

| Reset database? | Original rounds/30 sec | Pipelining rounds/30 sec | Pipelining vs. Original (%) |
| --- | --- | --- | --- |
| Reset | 301 | 400 | +33% |
| No Reset | 295 | 418 | +41% |

Local test network 5 minute sprint

I used the Justfile command

❯ just conduit-bootstrap-and-go 300

to bootstrap a local test network and run a PostgreSQL exporter against it for 300 seconds. I ran it several times against both the original pipeline and the new one. Here are the experimental results:

| Log Level | Reps | Original rounds/300 sec (logs/round) | Pipelining rounds/300 sec (logs/round) | Pipelining vs. Original (%) |
| --- | --- | --- | --- | --- |
| TRACE | 3 | 3718 (7.0) | 3509 (14.0) | -5.6% 😢 |
| INFO | 2 | 4578.5 (3.0) | 4423.5 (3.0) | -3.4% 😢 |
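
The only intended difference between the TRACE and INFO runs is the configured logrus level, which changes how many log lines are emitted per round. A minimal sketch of toggling that level is below; the logger wiring here is an assumption for illustration, not Conduit's actual setup.

```go
package main

import (
	log "github.com/sirupsen/logrus"
)

func main() {
	logger := log.New()
	logger.SetLevel(log.InfoLevel) // use log.TraceLevel for the TRACE runs

	logger.Trace("fetched block")  // emitted only when the level is TRACE
	logger.Info("round committed") // emitted at INFO and more verbose levels
}
```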

On EC2 - CLASSIC vs. PIPELINING vs. 30 Second Timeout vs. FINAL

I ran catchup tests for the 4 versions of conduit named in this section's title (CLASSIC, PIPELINING, 30 Second Timeout, and FINAL).

Much more detailed results are available in a Google Sheets document, but the summary is:

SUMMARY

(summary image)
codecov[bot] commented 1 year ago

Codecov Report

Merging #128 (9918073) into master (442791a) will increase coverage by 4.32%. Report is 52 commits behind head on master. The diff coverage is 81.89%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   67.66%   71.98%   +4.32%     
==========================================
  Files          32       36       +4     
  Lines        1976     2695     +719     
==========================================
+ Hits         1337     1940     +603     
- Misses        570      657      +87     
- Partials       69       98      +29     
| Files Changed | Coverage Δ |
| --- | --- |
| conduit/data/block_export_data.go | 100.00% <ø> (+92.30%) :arrow_up: |
| conduit/metrics/metrics.go | 100.00% <ø> (ø) |
| conduit/pipeline/metadata.go | 69.11% <ø> (ø) |
| conduit/plugins/config.go | 100.00% <ø> (ø) |
| ...duit/plugins/exporters/filewriter/file_exporter.go | 81.63% <ø> (-1.06%) :arrow_down: |
| conduit/plugins/importers/algod/metrics.go | 100.00% <ø> (ø) |
| ...gins/processors/filterprocessor/fields/searcher.go | 77.50% <ø> (ø) |
| ...ins/processors/filterprocessor/filter_processor.go | 83.82% <ø> (+3.54%) :arrow_up: |
| ...plugins/processors/filterprocessor/gen/generate.go | 34.28% <ø> (ø) |
| conduit/plugins/processors/noop/noop_processor.go | 64.70% <ø> (+6.81%) :arrow_up: |

... and 20 more


Eric-Warehime commented 1 year ago

If it's easy for you to run, it might be interesting to repeat the "Local test network 5 minute sprint" using different log levels, to show how logrus/logging in general is impacting processing times.

tzaffi commented 1 year ago

> If it's easy for you to run, it might be interesting to repeat the "Local test network 5 minute sprint" using different log levels, to show how logrus/logging in general is impacting processing times.

Can we add this as a task for #131?