GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match

Question about generate-table-partitions for Oracle to PG validation #1341

Open · helensilva14 opened this issue 3 days ago

helensilva14 commented 3 days ago

Currently, I'm validating data between Oracle and PostgreSQL, and I have a very large table of roughly 500M rows.

How can I generate a YAML file to validate only the 10k newest rows? I checked the documentation but didn't find a solution for my case.

Please advise. Thank you so much.

Originally posted by @TienNguyenVanDev in https://github.com/GoogleCloudPlatform/professional-services-data-validator/discussions/1336

helensilva14 commented 3 days ago

Hi @TienNguyenVanDev! Here's the corresponding issue for your question.

I think @nj1973 might be able to help you with this.

nj1973 commented 3 days ago

Hi @TienNguyenVanDev, if you have a column that identifies recent data, then you can use it in a filter. For example:

```shell
data-validation validate row -sc=ora -tc=pg \
  -tbls=acme.big_table --primary-keys=id \
  --filters="create_date > DATE'2024-11-20'" \
  --hash="*" \
  -c /tmp/big_table_recent.yaml
```

But that would rely on you knowing how many records are newer than the date value in the --filters option.
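One way to check that up front is to run the same filter as a plain count query against the Oracle source before validating. A minimal sketch, assuming sqlplus access and using placeholder connection details:

```shell
# Sanity check: how many rows does the date filter actually capture?
# app_user/app_password@ORCLPDB is a placeholder connection string.
sqlplus -s app_user/app_password@ORCLPDB <<'SQL'
SELECT COUNT(*)
FROM   acme.big_table
WHERE  create_date > DATE '2024-11-20';
SQL
```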

We don't have specific functionality to pick exactly 10k records. Perhaps if your table has a sequential numeric primary key you could take the max, subtract 10,000, and use that in a filter, for example:

```shell
--filters="id > 987654321" \
```
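Roughly, putting that together (the sqlplus connection details below are placeholders, and the max-minus-10,000 trick only approximates the newest 10k rows when id values are assigned sequentially, e.g. from a sequence):

```shell
# 1. Work out the id threshold on the Oracle source.
sqlplus -s app_user/app_password@ORCLPDB <<'SQL'
SELECT MAX(id) - 10000 AS id_threshold
FROM   acme.big_table;
SQL

# 2. Plug the returned value into the same validate row command as above
#    (987654321 here is just the example number).
data-validation validate row -sc=ora -tc=pg \
  -tbls=acme.big_table --primary-keys=id \
  --filters="id > 987654321" \
  --hash="*" \
  -c /tmp/big_table_recent.yaml
```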