cloudquery / plugin-sdk

CloudQuery Go SDK for source and destination plugins
Mozilla Public License 2.0

feat: Add queue based scheduler #1869

Closed erezrokah closed 1 month ago

erezrokah commented 2 months ago

Summary

Mostly an experiment to deal with https://github.com/cloudquery/cloudquery-issues/issues/2227 as I couldn't think of a nice way to make singleNestedTableMaxConcurrency dynamic without making the code super complex.

This PR adds a scheduler that uses a worker pool pattern on top of a priority queue. This should ensure that as long as there's work to be done, all goroutines are occupied. The concurrency setting also no longer applies only to top-level tables: it is the number of workers, so it's a fixed limit and simpler to reason about. The more table-client pairs a table has in the queue, the lower their priority, which should prevent a specific table from occupying all the workers.

Opening as draft since:

  1. I'm still testing this to see the impact
  2. There's still a lot of code duplicated from the current scheduler, and refactoring is needed to remove it


erezrokah commented 1 month ago

Closing in favor of https://github.com/cloudquery/plugin-sdk/pull/1914