GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0

Additional option to generate-table-partitions to write multiple partitions to each yaml file #1163

Open sundar-mudupalli-work opened 1 month ago

sundar-mudupalli-work commented 1 month ago

Hi,

Currently, generate-table-partitions writes one validation (partition) to each yaml file. When the yaml files are run in Cloud Run containers, each container takes 20+ seconds to start, so running many single-partition yaml files concurrently wastes a lot of time on container startup and shutdown. For users who want to run thousands of partitions, this is suboptimal. Today they have to manually create new yaml files in which each file validates multiple partitions sequentially, so that the cost of container startup is spread across those validations. A recent scale validation ran over 100 partitions in each yaml file.

If multiple tables are being validated using the same primary keys, each table's validation would appear as a separate validation in the same yaml file.

The feature request is to add an option, --partitions-per-yaml (default = 1), which specifies the number of partitions validated in one yaml file. With this feature, the number of partitions can be increased fairly arbitrarily, as long as users stay within the limit of 10,000 yaml files.
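
To make the request concrete, here is a minimal Python sketch of the grouping. It is not the tool's implementation: `group_partitions`, the `partitions_per_yaml` parameter name, and the simplified config shape (a `validations` list with one `filter` per partition) are assumptions for illustration only.

```python
from typing import Dict, List


def group_partitions(
    partition_filters: List[str], partitions_per_yaml: int = 1
) -> List[Dict[str, list]]:
    """Group partition filters into configs holding up to N validations each.

    Hypothetical helper: each returned dict stands in for one yaml file.
    """
    if partitions_per_yaml < 1:
        raise ValueError("partitions_per_yaml must be >= 1")
    configs = []
    for start in range(0, len(partition_filters), partitions_per_yaml):
        chunk = partition_filters[start : start + partitions_per_yaml]
        # One validation entry per partition; they would run sequentially.
        configs.append({"validations": [{"filter": f} for f in chunk]})
    return configs


# Example: 1000 partitions on a made-up integer key, 100 partitions per file.
filters = [f"id >= {i * 100} AND id < {(i + 1) * 100}" for i in range(1000)]
configs = group_partitions(filters, partitions_per_yaml=100)
print(len(configs))  # 10
```

In this example, 1000 partitions land in 10 files, so the container startup cost is paid 10 times instead of 1000. The default of 1 would reproduce the current one-partition-per-file behavior.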

Sundar Mudupalli