Performance benchmark for BATCH

aaronsteers commented 2 years ago

It would be awesome to create a benchmark for the Snowflake connectors, with and without BATCH, and specifically on datasets that would most benefit from BATCH as a high-throughput optimization.

Per:

https://github.com/meltano/sdk/discussions/975#discussioncomment-3765329

This creates a really nice repeatable process for anyone in the community who wants to do their own benchmarks:

1. Download the datasets using the links in your readme.
2. Install the tap and configure it with the path to the downloaded files.
3. Run the tap to a target directly, cat the output to a local buffer file, or load it into a database that you want to test as a tap.
4. Test, observe, tweak, repeat! 🎉
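For the "observe" step, a local buffer file of captured tap output could be summarized by message type before it is piped into a target. This is a minimal sketch, assuming one Singer message per line with a top-level `type` field (`SCHEMA`, `RECORD`, `STATE`, and, when batching is enabled, `BATCH`), and assuming BATCH messages carry a `manifest` list of file URIs as in the Singer SDK; the helper name `summarize_singer_messages` is hypothetical.

```python
"""Summarize captured Singer output by message type (hypothetical helper)."""
import json
from collections import Counter
from typing import Iterable


def summarize_singer_messages(lines: Iterable[str]) -> dict:
    """Count messages per `type` and the total number of files listed in BATCH manifests."""
    counts = Counter()
    batch_files = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the buffer file
        message = json.loads(line)
        counts[message["type"]] += 1
        if message["type"] == "BATCH":
            # Assumption: BATCH messages list their data files under `manifest`.
            batch_files += len(message.get("manifest", []))
    return {"counts": dict(counts), "batch_files": batch_files}
```

Passing an open file works directly, e.g. `with open("runresults.singer.jsonl") as f: print(summarize_singer_messages(f))`. A run without batching should show mostly `RECORD` messages, while a batched run should show a handful of `BATCH` messages pointing at files instead.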

The datasets included are:

I think the specific data I'd love to see for this is...

Q: How quickly can we sync any one of the provided sample streams from tap-snowflake to target-snowflake:

With batch messaging disabled. E.g.: tap-snowflake --config=tap-config.json | target-snowflake --config=target-config.json
With batch messaging enabled. E.g.: tap-snowflake --config=tap-config.json --config=batch-config.json | target-snowflake --config=target-config.json
With batch messaging enabled, but tap and target run in isolation:
1. tap-snowflake --config=tap-config.json --config=batch-config.json > runresults.singer.jsonl
2. cat runresults.singer.jsonl | target-snowflake --config=target-config.json
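The three scenarios above could be timed with a small harness along these lines. It is a minimal sketch, assuming `tap-snowflake`, `target-snowflake`, and `bash` are on `PATH` and that `tap-config.json` and `target-config.json` already exist in the working directory. The `batch_config` shape (encoding `format`/`compression`, storage `root`/`prefix`) follows the Singer SDK's BATCH setting as I understand it; the storage root `file:///tmp/batch-bench`, the `batch-` prefix, the median-of-3 choice, and all function names are hypothetical.

```python
"""Hypothetical harness for timing the three BATCH benchmark scenarios."""
import json
import statistics
import subprocess
import time
from pathlib import Path

# Hypothetical BATCH settings: gzip-compressed JSONL files written to a local directory.
BATCH_CONFIG = {
    "batch_config": {
        "encoding": {"format": "jsonl", "compression": "gzip"},
        "storage": {"root": "file:///tmp/batch-bench", "prefix": "batch-"},
    }
}

# Each scenario is a list of shell stages that run one after another.
SCENARIOS = {
    "no-batch": [
        "tap-snowflake --config=tap-config.json"
        " | target-snowflake --config=target-config.json",
    ],
    "batch": [
        "tap-snowflake --config=tap-config.json --config=batch-config.json"
        " | target-snowflake --config=target-config.json",
    ],
    "batch-isolated": [
        "tap-snowflake --config=tap-config.json --config=batch-config.json"
        " > runresults.singer.jsonl",
        "cat runresults.singer.jsonl | target-snowflake --config=target-config.json",
    ],
}


def write_batch_config(path: Path) -> None:
    """Write the hypothetical BATCH settings to a JSON config file."""
    path.write_text(json.dumps(BATCH_CONFIG, indent=2))


def time_stages(stages: list[str], runs: int = 3) -> float:
    """Return the median wall-clock seconds to run all stages in order, over `runs` runs."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for stage in stages:
            # pipefail makes a failing tap fail the whole pipeline, not just the target.
            subprocess.run(["bash", "-o", "pipefail", "-c", stage], check=True)
        totals.append(time.perf_counter() - start)
    return statistics.median(totals)


def run_all(runs: int = 3) -> dict[str, float]:
    """Write batch-config.json, then time every scenario from the current directory."""
    write_batch_config(Path("batch-config.json"))
    return {name: time_stages(stages, runs) for name, stages in SCENARIOS.items()}
```

Calling `run_all()` from the directory holding the config files returns the median seconds per scenario. Timing the isolated scenario's two stages with separate `time_stages` calls would additionally show how the total splits between the tap and the target.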

aaronsteers commented 2 years ago

@kgpayne - If this turns out to be difficult, it's totally OK to postpone it to a future iteration. It'd be worth a moderate investment, but not worth delaying the batch PRs themselves, if that's helpful.

Presumably you'd seed the exercise by running something like meltano run tap-stackoverflow-sample target-snowflake, but if you run into any problems getting sample data loaded, that could be a potential blocker or slowdown here. (No doubt we'll work out any kinks over time.)

aaronsteers commented 2 years ago

Closing as resolved. At least for now, this should be sufficient: https://github.com/meltano/sdk/discussions/906#discussioncomment-3955177

MeltanoLabs / tap-snowflake

Performance benchmark for BATCH #3