brimdata / zync

Kafka connector to sync Zed lakes to and from Kafka topics
BSD 3-Clause "New" or "Revised" License

performance test rig for continuous sync #82

Open mccanne opened 2 years ago

mccanne commented 2 years ago

To test the service defined in brimdata/zync#83, this issue tracks creating a performance test rig that uses large amounts of synthetic data. We will create a large volume of data spread across thousands of topics and run these perf tests by hand. We can ingest the test data into Confluent with a script and leave it in the Kafka cloud service, available for testing whenever we need it.
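As a rough illustration of what "synthetic data across thousands of topics" could look like, here is a minimal sketch of a generator. The topic naming scheme (`perf-test-NNNNN`) and the record fields are hypothetical, and the sketch only produces `(topic, JSON value)` pairs; actually publishing them to Confluent would need a Kafka producer, which is not shown here.

```python
import json
import random


def synthetic_records(num_topics, records_per_topic, seed=0):
    """Yield (topic, json_value) pairs for a synthetic load test.

    Topic names and record fields are made up for illustration only.
    A fixed seed keeps the generated data reproducible between runs.
    """
    rng = random.Random(seed)
    for t in range(num_topics):
        topic = f"perf-test-{t:05d}"
        for offset in range(records_per_topic):
            value = {
                "id": offset,
                "topic": topic,
                "payload": rng.randint(0, 1_000_000),
            }
            yield topic, json.dumps(value)


# Small demo; a real run would use thousands of topics.
records = list(synthetic_records(num_topics=3, records_per_topic=2))
```

Because the generator is seeded, the same data set can be regenerated later if the copy left in the cloud service is ever lost.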

This will expose some issues with the built-in queries that zync runs to track and update progress between the raw and staging pools. We need to make sure the cost of these queries is O(work to do) and never O(all data in pool). This may require adding some optimizations to the Zed lake as we continue to improve the DAG planner and optimizer.
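To make the O(work to do) versus O(all data in pool) distinction concrete, here is a toy sketch. It is not how zync or the Zed lake tracks progress: a plain Python list stands in for the raw pool, and the `cursor` (a count of records already synced) is an assumed stand-in for whatever progress marker the real queries use. Both functions return the same unsynced records, but they touch very different amounts of data.

```python
def sync_full_scan(raw, cursor):
    """O(all data in pool): examine every record to find the unsynced ones."""
    scanned = 0
    new = []
    for seq, rec in enumerate(raw):
        scanned += 1
        if seq >= cursor:
            new.append(rec)
    return new, scanned


def sync_from_cursor(raw, cursor):
    """O(work to do): jump straight to the cursor and read only new records."""
    new = raw[cursor:]
    return new, len(new)


# Hypothetical pool of 10,000 records, of which 9,990 are already synced.
raw_pool = list(range(10_000))
cursor = 9_990

full_new, full_scanned = sync_full_scan(raw_pool, cursor)
incr_new, incr_scanned = sync_from_cursor(raw_pool, cursor)
```

In this demo the full scan touches 10,000 records to find 10 new ones, while the cursor-based version touches only those 10. In a real lake the equivalent of the slice would have to come from the query planner pruning data it knows is already synced.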

philrz commented 2 years ago

@nwt has shared with me a set of scripts/configs that create such a stress test using a JSON plugin to Kafka Connect. I've gotten it to work successfully on my Mac laptop and have some follow-on ideas for what I'd like to do next with it. So far my ideas are: