GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0

Dry Run option for executing DVT as a cloud run job #1237

Open manojredward opened 3 weeks ago

manojredward commented 3 weeks ago

Hi Team,

We are running DVT as a Cloud Run job; the connections and input YAML files are stored in a GCS bucket, and we reference the connection home via PSO_DV_CONN_HOME. We found there is a --dry-run option on the command line whose output shows the underlying executed query. However, when we try to use the dry-run flag in our input YAML file, we do not get any results.

Please suggest how to pass the --dry-run flag in YAML format and execute it as a Cloud Run job.

manojredward commented 3 weeks ago

Also, please guide us on how to persist the generated output in JSON format.

nehanene15 commented 3 weeks ago

When you generate the YAML config, you can add the JSON format like so: `data-validation validate column ... --format json -c config.yaml`. Now when you run the YAML, the output will be in JSON format.

For a dry run, you can do the following: `data-validation configs run --dry-run -c config.yaml`.

kudaravalligopi commented 2 weeks ago

Is there a way to persist the SQL query JSON output from a dry run to a file (e.g. a GCS file path), particularly when we run this as a Cloud Run job?

helensilva14 commented 2 weeks ago

> Is there a way to persist the SQL query JSON output from a dry run to a file (e.g. a GCS file path), particularly when we run this as a Cloud Run job?

Hi @kudaravalligopi! This is more a matter of Cloud Run/Python implementation code than of DVT itself. I found this post that might be helpful, please take a look: https://stackoverflow.com/questions/59799941/writing-a-new-file-to-a-google-cloud-storage-bucket-from-a-google-cloud-function
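To make that concrete, a minimal sketch of the approach from the linked post: run the DVT CLI with `--dry-run` from the Cloud Run job's container, capture its stdout, and upload it with the `google-cloud-storage` client. The bucket name, blob prefix, and config path below are placeholders, and `build_blob_name` is a hypothetical helper, not part of DVT:

```python
# Hypothetical sketch (not part of DVT): capture DVT --dry-run output inside a
# Cloud Run job and persist it to a GCS bucket.
import subprocess


def run_dvt_dry_run(config_path: str) -> str:
    """Run `data-validation configs run --dry-run -c <config>` and capture stdout."""
    result = subprocess.run(
        ["data-validation", "configs", "run", "--dry-run", "-c", config_path],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits non-zero
    )
    return result.stdout


def build_blob_name(prefix: str, config_path: str) -> str:
    """Derive a .json object name from the config file name,
    e.g. ('dry_run', 'configs/my_config.yaml') -> 'dry_run/my_config.json'."""
    stem = config_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return f"{prefix}/{stem}.json"


def upload_to_gcs(bucket_name: str, blob_name: str, contents: str) -> str:
    """Write a string to gs://<bucket_name>/<blob_name> and return the URI."""
    # Requires: pip install google-cloud-storage
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(contents, content_type="application/json")
    return f"gs://{bucket_name}/{blob_name}"


if __name__ == "__main__":
    output = run_dvt_dry_run("config.yaml")
    uri = upload_to_gcs(
        "my-dvt-bucket", build_blob_name("dry_run", "config.yaml"), output
    )
    print(f"Dry-run SQL persisted to {uri}")
```

The Cloud Run job's service account needs `storage.objects.create` on the bucket; `storage.Client()` picks up those credentials automatically inside Cloud Run.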