GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0

Request to add an option to store validation results in a file rather than on stdout #1357

Open nj1973 opened 2 days ago

nj1973 commented 2 days ago

Some customers do not have access to BigQuery or choose not to use the service. When running DVT via a service like Cloud Run, they have no easy way to capture the validation results.

We could add a file-based alternative to --bq-result-handler, perhaps --file-result-handler or --text-result-handler, that sends the output that would normally go to stdout to a file path instead. The file path should support Cloud Storage URIs.
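A minimal sketch of how a result path that may be either a local file or a gs:// URI could be written to, assuming the google-cloud-storage client library is installed when a gs:// path is used. The helper name write_results is hypothetical and not part of DVT.

```python
from urllib.parse import urlparse


def write_results(path: str, text: str) -> None:
    """Write validation result text to a local file or a gs:// URI.

    Hypothetical helper; both branches overwrite any existing file or object.
    """
    parsed = urlparse(path)
    if parsed.scheme == "gs":
        # Imported lazily so local-only use does not need the GCS client library.
        from google.cloud import storage

        bucket_name = parsed.netloc
        blob_name = parsed.path.lstrip("/")
        storage.Client().bucket(bucket_name).blob(blob_name).upload_from_string(text)
    else:
        # Anything without a gs:// scheme is treated as a local filesystem path.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
```

Dispatching on the URI scheme keeps the CLI option a single string while leaving room for other schemes later.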

We should give some thought to the format of the option value.

Is it as simple as a path string, or should we also revamp how the output format is supplied by accepting a JSON value? For example:

--file-result-handler='{"path": "gs://some-uri", "format": "csv", "mode": "overwrite"}'
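A rough sketch of how such a value might be parsed so that both answers to the question above work: a bare path string falls back to defaults, while a value starting with "{" is read as JSON. The names FileResultHandlerConfig and parse_file_result_handler, the defaults, and the allowed format and mode sets are assumptions for illustration; only "csv" and "overwrite" appear in the proposal.

```python
import json
from dataclasses import dataclass

# Assumed value sets: the proposal only shows "csv" and "overwrite".
ALLOWED_FORMATS = {"csv", "json", "text"}
ALLOWED_MODES = {"overwrite", "append"}
ALLOWED_KEYS = {"path", "format", "mode"}


@dataclass
class FileResultHandlerConfig:
    """Hypothetical settings for a --file-result-handler option."""

    path: str
    format: str = "csv"
    mode: str = "overwrite"


def parse_file_result_handler(value: str) -> FileResultHandlerConfig:
    """Accept either a bare path string or a JSON object with path/format/mode."""
    text = value.strip()
    if not text.startswith("{"):
        # Plain path string: use the default format and mode.
        return FileResultHandlerConfig(path=text)

    raw = json.loads(text)
    unknown = set(raw) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown keys in file result handler: {sorted(unknown)}")
    if "path" not in raw:
        raise ValueError("File result handler requires a 'path' key")

    config = FileResultHandlerConfig(**raw)
    if config.format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {config.format}")
    if config.mode not in ALLOWED_MODES:
        raise ValueError(f"Unsupported mode: {config.mode}")
    return config
```

One design point to weigh: an "append" mode is straightforward for local files but harder for gs:// paths, because Cloud Storage objects are immutable and appending means rewriting or composing objects.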

And revamp --bq-result-handler in a similar way (perhaps deprecating --service-account while we are at it):

--bq-result-handler='{"project": "my-project", "table": "my_dataset.results_table", "service-account": null}'

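A sketch of how --bq-result-handler could accept the JSON form while still supporting a plain table reference, with --service-account kept only as a deprecated fallback. It assumes the existing value looks like project.dataset.table; BqResultHandlerConfig and parse_bq_result_handler are hypothetical names, not DVT's API.

```python
import json
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class BqResultHandlerConfig:
    """Hypothetical settings for a JSON-based --bq-result-handler."""

    project: str
    table: str  # "dataset.table"
    service_account: Optional[str] = None


def parse_bq_result_handler(
    value: str, legacy_service_account: Optional[str] = None
) -> BqResultHandlerConfig:
    text = value.strip()
    if text.startswith("{"):
        raw = json.loads(text)
        # A JSON null for "service-account" arrives here as Python None.
        config = BqResultHandlerConfig(
            project=raw["project"],
            table=raw["table"],
            service_account=raw.get("service-account"),
        )
    else:
        # Assumed existing form "project.dataset.table". rsplit keeps any dot
        # inside the project part (e.g. domain-scoped project IDs) intact.
        project, dataset, table = text.rsplit(".", 2)
        config = BqResultHandlerConfig(project=project, table=f"{dataset}.{table}")

    if legacy_service_account is not None:
        warnings.warn(
            "--service-account is deprecated; use the 'service-account' key instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # The JSON key wins if both are supplied.
        if config.service_account is None:
            config.service_account = legacy_service_account
    return config
```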
nj1973 commented 1 day ago

Note that we also have issue https://github.com/GoogleCloudPlatform/professional-services-data-validator/issues/1275 requesting the ability to control which fields are output to CSV. If we do go with a JSON-based file or text result handler option, then issue 1275 could build on top of it by adding a columns attribute.
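If the JSON option gained a columns attribute (for example a hypothetical "columns": ["validation_name", "validation_status"] key), CSV output might be restricted along these lines. This uses Python's standard csv.DictWriter; results_to_csv is a hypothetical helper, and the column names above are illustrative only.

```python
import csv
import io
from typing import Mapping, Optional, Sequence


def results_to_csv(
    rows: Sequence[Mapping[str, object]], columns: Optional[Sequence[str]] = None
) -> str:
    """Render result rows as CSV, keeping only `columns` (all columns if None)."""
    if not rows:
        return ""
    fieldnames = list(columns) if columns else list(rows[0].keys())
    missing = [name for name in fieldnames if name not in rows[0]]
    if missing:
        raise ValueError(f"Unknown result columns: {missing}")
    buf = io.StringIO()
    # extrasaction="ignore" silently drops keys that are not in fieldnames.
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Rejecting unknown column names up front gives users a clear error instead of an empty CSV column.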

sundar-mudupalli-work commented 1 day ago

Hi,

When jobs are run with Cloud Run and BigQuery is not used to capture output, the console output can be viewed using the gcloud CLI as follows:

gcloud beta run jobs logs read <job-name>
gcloud beta run jobs executions logs read <execution-id>

Not only do you get each line of output on the screen, you also get its timestamp, so you may have to remove the timestamps to get clean output.
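A small sketch for stripping that timestamp prefix from captured log lines. The timestamp format is an assumption (an ISO-8601 style date and time at the start of each line); check real gcloud output and adjust TIMESTAMP_PREFIX if it differs.

```python
import re
from typing import Iterable, Iterator

# Assumption: a line may start with a timestamp such as "2024-11-05 14:03:22"
# or "2024-11-05T14:03:22.123456Z", followed by whitespace.
TIMESTAMP_PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?\s+"
)


def strip_timestamps(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with a leading timestamp removed, if one is present."""
    for line in lines:
        yield TIMESTAMP_PREFIX.sub("", line, count=1)
```

Saved in a script that ends with `sys.stdout.writelines(strip_timestamps(sys.stdin))`, it could be used as a filter by piping the gcloud logs output into it.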

If we have a specific customer with this need who cannot use the gcloud command, or for whom it does not work, let us create an approach that works for them.

Sundar Mudupalli