GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0

Row validations frequently fail with either "source/target_column_name" as "null" #1260

Open satva opened 1 month ago

satva commented 1 month ago

The tool was running fine with no issues, but in the output I see "target_column_name": null and "target_agg_value": null. I was trying to compare rows for the given primary keys using the syntax below.

data-validation -v -ll DEBUG validate row -sc BQ_CONN -tc BQ_CONN -tbls pa-dev.bgp_status=pa-dev.bgp_status_v2 --filters 'event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE)' --format json --primary-keys tenant_id,cloud_instance_id --comparison-fields cdl_region,instance_name,node_type -rr
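One plausible cause of the null columns (an assumption, not confirmed by DVT's maintainers in this thread): row validation matches source and target rows on the primary keys, and a key present on only one side leaves the other side's column and value as null, which is reported as a failure. A minimal Python sketch of that matching logic, with made-up keys and values:

```python
# Minimal sketch (NOT DVT's actual implementation): an outer join on
# primary keys produces null source/target fields when a key exists
# on only one side. Keys and values below are illustrative.
source = {("t1", "i1"): "Established", ("t1", "i2"): "Idle"}
target = {("t1", "i1"): "Established", ("t1", "i3"): "Established"}

results = []
for key in sorted(set(source) | set(target)):
    s, t = source.get(key), target.get(key)
    results.append({
        "primary_key": key,
        "source_agg_value": s,   # None when the key is missing in source
        "target_agg_value": t,   # None when the key is missing in target
        "validation_status": "success" if s == t else "fail",
    })
```

Under this reading, a null on one side does not necessarily mean bad data; it can simply mean the row was absent (or filtered out) on that side.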

Please help me understand what I am missing here.

satva commented 1 month ago

Hi, requesting help to unblock me on this.

satva commented 1 month ago

Hi,

This is how I am triggering row validation between two tables. It frequently fails with either the source or the target column name as null. Because one of them is null, the result is validation_status "fail". If agg_value were null, that might be a BQ issue, but in this case I am not sure why "source_column_name" is null.

data-validation -v -ll DEBUG validate row -sc BQ_CONN -tc BQ_TCONN --tables-list insight_dataset.ibgp=insight_dataset.ibgp_v2 --filters="(event_time) >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE)" -pk sub_tenant_id,instance_id,bgp_peer_name -comp-fields peer_status --format json --filter-status fail

Please let me know what the possible issue and corresponding solution could be.

Sample output shown below.

{
  "361186": {
    "validation_name": "peer_status",
    "validation_type": "Row",
    "aggregation_type": null,
    "source_table_name": "insight_dataset.ibgp",
    "source_column_name": null,
    "source_agg_value": null,
    "target_table_name": "insight_dataset.ibgp_v2",
    "target_column_name": "peer_status",
    "target_agg_value": "Established",
    "group_by_columns": "{\"sub_tenant_id\": \"17563\", \"tenant_id\": \"303363\", \"bgp_peer_name\": \"IBGP-28754\", \"cloud_instance_id\": \"05e132625b5\"}",
    "primary_keys": null,
    "num_random_rows": null,
    "difference": null,
    "pct_difference": null,
    "pct_threshold": null,
    "validation_status": "fail",
    "run_id": "2843dcf9-9e97-417f-a256-1799e1d1f5ed",
    "labels": [],
    "start_time": 1726775369286,
    "end_time": 1726775373373
  }
}
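When triaging records like this one, it can help to separate "row missing on one side" failures from genuine value mismatches. A hedged Python sketch (the `classify` helper is hypothetical, not part of DVT) using a trimmed copy of the fields from the sample record:

```python
import json

# Trimmed copy of the fields from the sample result above.
record = json.loads('''{"validation_name": "peer_status",
 "source_column_name": null, "source_agg_value": null,
 "target_column_name": "peer_status", "target_agg_value": "Established",
 "validation_status": "fail"}''')

def classify(r):
    """Hypothetical triage helper: bucket a DVT row-validation result."""
    if r["source_agg_value"] is None and r["target_agg_value"] is not None:
        return "row missing in source (or filtered out)"
    if r["target_agg_value"] is None and r["source_agg_value"] is not None:
        return "row missing in target (or filtered out)"
    return "value mismatch" if r["validation_status"] == "fail" else "match"

print(classify(record))  # row missing in source (or filtered out)
```

For the sample record, the null source side with a populated target side suggests the primary key was only found in the target table under the given filter.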

helensilva14 commented 1 month ago

Hi @satva, first, could you please share the DDL (CREATE TABLE statement) for one of your BQ tables so we can try to replicate your scenario? Thanks!

Other observations: can you verify whether there really are equivalent rows between the source and target tables, considering the filter you're using? Beyond that, I don't currently see anything wrong with the DVT command itself.
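One way to act on the suggestion above is to compare the primary-key sets each table yields under the same time filter before running DVT. A minimal Python sketch; the keys here are illustrative placeholders, not values from the actual tables:

```python
# Sketch of a pre-check: pull the primary-key tuples from source and
# target under the SAME filter window, then diff the two sets.
# These sets stand in for query results; the keys are made up.
source_keys = {("17563", "05e132625b5"), ("17563", "05e132625b6")}
target_keys = {("17563", "05e132625b5"), ("17563", "05e132625b7")}

only_in_source = source_keys - target_keys  # would fail with null target side
only_in_target = target_keys - source_keys  # would fail with null source side
common = source_keys & target_keys          # only these can be value-compared
```

If either difference set is non-empty, the null-column failures are expected for those keys; a moving time window like `TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE)` can also produce such gaps when data lands in the two tables at different times.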