feat: user defined input `run_id` via config

GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match

Apache License 2.0

407 stars 119 forks source link

feat: user defined input `run_id` via config #1310

Closed nick-redfearn closed 2 weeks ago

nick-redfearn commented 2 weeks ago

I would like to be able to define run_ids before I start individual row validations, so that I can better keep track of individual runs.

Before this change = The run_id is generated as part of RunMetadata object using str(uuid.uuid4())

After this change = The run_id can be input via config, in which case it is assigned as part of data_validation.py, otherwise if the input config value is None then data_validation.py will continue to use str(uuid.uuid4())

To reviewers... Please note that the majority of my changes are to test_data_validation.py where Ive added extra assertions and cleaned up things up by using references to consts values rather than strings.

nj1973 commented 2 weeks ago

/gcbrun

nick-redfearn commented 2 weeks ago

@nj1973 - Would you mind explaining your 2 commits for my understanding please?

nj1973 commented 2 weeks ago

/gcbrun

nick-redfearn commented 2 weeks ago

@nj1973 Im considering adding this to cli_tools.py, under _add_common_arguments

nj1973 commented 2 weeks ago

@nj1973 - Would you mind explaining your 2 commits for my understanding please?

55f5e64

6857a27

Hi @nick-redfearn, happy to. In the BigQuery integration tests there are some tests which store validations in a YAML config file. They include an assertion that the number of lines in the YAML file match a constant. With the addition of the new run_id attribute the constants needed increasing by 1.

I'm not convinced that checking the number of lines in the config file is the best way for these tests to identify success but I don't want to dig into that right now.

You are unlikely to be able to run integration tests due to needing many different database types therefore you won't see knock on effects like this. I figured the simplest thing to do was to adjust the constants and update you once the tests were passing.

I'll review the PR properly later today.

nick-redfearn commented 2 weeks ago

@nj1973 Thanks very much 🙇

nj1973 commented 2 weeks ago

@nj1973 Im considering adding this to cli_tools.py, under _add_common_arguments

Yep, I think that's a good idea.

nick-redfearn commented 2 weeks ago

Yep, I think that's a good idea.

@nj1973 Added in this commit

nj1973 commented 2 weeks ago

Yep, I think that's a good idea.

@nj1973 Added in this commit

Hi, the value from the command line option is not being passed through. You need to add some code to get_pre_build_configs() to pass the argument value through to the config manager. And perhaps also add a test to test_config_manager.py (similar to test_get_threshold_property).

nick-redfearn commented 2 weeks ago

@nj1973 Ive added these commits...

Cleanup in cli_tools.py to use consts values -> https://github.com/GoogleCloudPlatform/professional-services-data-validator/pull/1310/commits/6908c7ee70d16eba956126a45611dc1772a5b694
Adding to get_pre_build_configs -> https://github.com/GoogleCloudPlatform/professional-services-data-validator/pull/1310/commits/6908c7ee70d16eba956126a45611dc1772a5b694
Adding to test_config_manager.py, similar to test_get_threshold_property + formatting in get_pre_build_configs -> https://github.com/GoogleCloudPlatform/professional-services-data-validator/pull/1310/commits/b6b08d4f940d809bd8963b8cd4d4dc32021f7050

nj1973 commented 2 weeks ago

/gcbrun

nj1973 commented 2 weeks ago

Thanks @nick-redfearn for your continued contributions, it's very much appreciated.

nick-redfearn commented 2 weeks ago

@nj1973 Thanks a lot for your help, much appreciated 🙇