cloudquery / plugin-sdk

CloudQuery Go SDK for source and destination plugins
Mozilla Public License 2.0

feat: Add basic testing large syncs support #1862

Closed erezrokah closed 3 months ago

erezrokah commented 3 months ago

Summary

A very crude and simple way to do https://github.com/cloudquery/cloudquery-issues/issues/1846. Adds a new hidden fuzz test scheduler that only multiplies the clients (at the moment). The code is based on the shuffle scheduler, and then duplicates each client according to the multiplier.
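As a rough illustration of the idea (not the code in this PR), duplicating clients by a multiplier could look like the sketch below. The `Client` interface, `simpleClient`, `multiplierFromEnv`, and `multiplyClients` are hypothetical names made up for this example; only the `CQ_DEBUG_SYNC_MULTIPLIER` environment variable comes from the PR, and the fallback to 1 for missing or invalid values is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Client is a hypothetical stand-in for a plugin client; only ID() matters here.
type Client interface {
	ID() string
}

// simpleClient is a hypothetical client implementation used for the demo.
type simpleClient struct{ id string }

func (c simpleClient) ID() string { return c.id }

// multiplierFromEnv reads CQ_DEBUG_SYNC_MULTIPLIER.
// Assumption: unset, non-numeric, or values below 1 mean "no multiplication".
func multiplierFromEnv() int {
	n, err := strconv.Atoi(os.Getenv("CQ_DEBUG_SYNC_MULTIPLIER"))
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// multiplyClients returns a new slice containing every client `multiplier` times.
// The copies are the same values, so their IDs are duplicated as well.
func multiplyClients(clients []Client, multiplier int) []Client {
	if multiplier < 1 {
		multiplier = 1
	}
	out := make([]Client, 0, len(clients)*multiplier)
	for i := 0; i < multiplier; i++ {
		out = append(out, clients...)
	}
	return out
}

func main() {
	clients := []Client{simpleClient{"account-a"}, simpleClient{"account-b"}}
	multiplied := multiplyClients(clients, multiplierFromEnv())
	fmt.Printf("%d clients become %d\n", len(clients), len(multiplied))
}
```

Because the copies share the original client values, a scheduler fed this slice resolves each table once per copy, which is what inflates the sync.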

See example:

cloudquery sync examples/pagerduty-postgres.yml

Loading spec(s) from examples/pagerduty-postgres.yml
Starting sync for: pagerduty (local@/Users/erezrokah/code/github/cloudquery/cloudquery-private/plugins/source/pagerduty/pagerduty) -> [postgresql (cloudquery/postgresql@v8.2.7)]
Sync completed successfully. Resources: 525, Errors: 0, Warnings: 0, Time: 7s
CQ_DEBUG_SYNC_MULTIPLIER=50 cloudquery sync examples/pagerduty-postgres.yml

Loading spec(s) from examples/pagerduty-postgres.yml
Starting sync for: pagerduty (local@/Users/erezrokah/code/github/cloudquery/cloudquery-private/plugins/source/pagerduty/pagerduty) -> [postgresql (cloudquery/postgresql@v8.2.7)]
Sync completed successfully. Resources: 26385, Errors: 0, Warnings: 0, Time: 2m12s

This has a few downsides/tradeoffs:

  1. There will be clients with duplicate IDs, which breaks the metrics counts: https://github.com/cloudquery/plugin-sdk/blob/25ed3d25a529a22f351ab92e22fb03a19c9557d4/scheduler/metrics.go#L144
  2. If a plugin uses the client ID to ensure uniqueness for state client keys, that logic will break too
  3. If a table doesn't have any resources, the impact of the multiplier will be lower
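To make the first two points concrete, here is a small sketch of how duplicate client IDs can collapse anything keyed by client ID. It does not reproduce the SDK's metrics code or any plugin's state handling; `recordResources` and `stateKey` are hypothetical helpers, and the map-based bookkeeping is an assumption chosen only to show the collision.

```go
package main

import "fmt"

// recordResources is a hypothetical per-client counter keyed by client ID.
func recordResources(perClient map[string]uint64, clientID string, n uint64) {
	perClient[clientID] += n
}

// stateKey is a hypothetical state key that relies on the client ID for uniqueness.
func stateKey(table, clientID string) string {
	return table + "/" + clientID
}

func main() {
	// Three copies of the same client, as produced by a multiplier of 3.
	ids := []string{"account-a", "account-a", "account-a"}

	perClient := map[string]uint64{}
	state := map[string]string{}
	for i, id := range ids {
		recordResources(perClient, id, 10)
		// Every copy writes to the same key, so earlier cursors are overwritten.
		state[stateKey("incidents", id)] = fmt.Sprintf("cursor-%d", i)
	}

	fmt.Println(len(perClient), perClient["account-a"])   // 1 30
	fmt.Println(len(state), state["incidents/account-a"]) // 1 cursor-2
}
```

Three clients end up as a single metrics entry and a single state entry, so per-client figures and cursors can no longer be told apart.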

However, I think this is still useful if we want to artificially make a sync large (e.g. simulate a sync on many AWS accounts).



erezrokah commented 3 months ago

I don't think scheduler_fuzz is a good name for this. I did the exercise of looking at the code without looking at the issue first, and it made no sense to me what the point of this scheduler was 🤔

It's more of a load test scheduler 🤔 It doesn't do fuzzing.

💯 Happy to rename it. I don't like the fuzz name as well (though it might do some fuzzing in the future). I'll rename

erezrokah commented 3 months ago

💯 Happy to rename it. I don't like the fuzz name as well (though it might do some fuzzing in the future). I'll rename

Done in https://github.com/cloudquery/plugin-sdk/pull/1862/commits/08bb4a04ba738cd8c866b2fa59d60be9b53056e7