GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0

Row validations frequently fail with either "source/target_column_name" as "null" #1260

Open satva opened 1 month ago

satva commented 1 month ago

The tool was running fine with no issues, but in the output I see "target_column_name": null and "target_agg_value": null. I was trying to compare rows for the given primary keys using the syntax below.

data-validation -v -ll DEBUG validate row -sc BQ_CONN -tc BQ_CONN -tbls pa-dev.bgp_status=pa-dev.bgp_status_v2 --filters 'event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE)' --format json --primary-keys tenant_id,cloud_instance_id --comparison-fields cdl_region,instance_name,node_type -rr
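One plausible cause of the null columns (an assumption, not confirmed by DVT's maintainers in this thread): row validation matches source and target rows on the primary keys, and a key present on only one side leaves the other side's column and value as null, which is reported as a failure. A minimal Python sketch of that matching logic, with made-up keys and values:

```python
# Minimal sketch (NOT DVT's actual implementation): an outer join on
# primary keys produces null source/target fields when a key exists
# on only one side. Keys and values below are illustrative.
source = {("t1", "i1"): "Established", ("t1", "i2"): "Idle"}
target = {("t1", "i1"): "Established", ("t1", "i3"): "Established"}

results = []
for key in sorted(set(source) | set(target)):
    s, t = source.get(key), target.get(key)
    results.append({
        "primary_key": key,
        "source_agg_value": s,   # None when the key is missing in source
        "target_agg_value": t,   # None when the key is missing in target
        "validation_status": "success" if s == t else "fail",
    })
```

Under this reading, a null on one side does not necessarily mean bad data; it can simply mean the row was absent (or filtered out) on that side.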

Please help me understand what I am missing here.

satva commented 1 month ago

Hi, requesting help to unblock me on this.

satva commented 1 month ago

Hi,

This is how I am triggering row validation between two tables. It frequently fails with either the source or the target column name as null. Because one of them is null, the result is validation_status "fail". If agg_value were null, that might be a BQ issue, but in this case I am not sure why "source_column_name" is null.

data-validation -v -ll DEBUG validate row -sc BQ_CONN -tc BQ_TCONN --tables-list insight_dataset.ibgp=insight_dataset.ibgp_v2 --filters="(event_time) >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE)" -pk sub_tenant_id,instance_id,bgp_peer_name -comp-fields peer_status --format json --filter-status fail

Please let me know what the possible issue and corresponding solution could be.

Sample output shown below.

{
  "361186": {
    "validation_name": "peer_status",
    "validation_type": "Row",
    "aggregation_type": null,
    "source_table_name": "insight_dataset.ibgp",
    "source_column_name": null,
    "source_agg_value": null,
    "target_table_name": "insight_dataset.ibgp_v2",
    "target_column_name": "peer_status",
    "target_agg_value": "Established",
    "group_by_columns": "{\"sub_tenant_id\": \"17563\", \"tenant_id\": \"303363\", \"bgp_peer_name\": \"IBGP-28754\", \"cloud_instance_id\": \"05e132625b5\"}",
    "primary_keys": null,
    "num_random_rows": null,
    "difference": null,
    "pct_difference": null,
    "pct_threshold": null,
    "validation_status": "fail",
    "run_id": "2843dcf9-9e97-417f-a256-1799e1d1f5ed",
    "labels": [],
    "start_time": 1726775369286,
    "end_time": 1726775373373
  }
}
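When triaging records like this one, it can help to separate "row missing on one side" failures from genuine value mismatches. A hedged Python sketch (the `classify` helper is hypothetical, not part of DVT) using a trimmed copy of the fields from the sample record:

```python
import json

# Trimmed copy of the fields from the sample result above.
record = json.loads('''{"validation_name": "peer_status",
 "source_column_name": null, "source_agg_value": null,
 "target_column_name": "peer_status", "target_agg_value": "Established",
 "validation_status": "fail"}''')

def classify(r):
    """Hypothetical triage helper: bucket a DVT row-validation result."""
    if r["source_agg_value"] is None and r["target_agg_value"] is not None:
        return "row missing in source (or filtered out)"
    if r["target_agg_value"] is None and r["source_agg_value"] is not None:
        return "row missing in target (or filtered out)"
    return "value mismatch" if r["validation_status"] == "fail" else "match"

print(classify(record))  # row missing in source (or filtered out)
```

For the sample record, the null source side with a populated target side suggests the primary key was only found in the target table under the given filter.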

helensilva14 commented 1 month ago

Hi @satva, first, could you please share the DDL (CREATE TABLE statement) for one of your BQ tables so we can try to replicate your scenario? Thanks!

Other observations: can you verify whether there really are equivalent rows between the source and target tables, considering the filter you're using? Beyond that, I don't currently see anything wrong with the DVT command itself.
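One way to act on the suggestion above is to compare the primary-key sets each table yields under the same time filter before running DVT. A minimal Python sketch; the keys here are illustrative placeholders, not values from the actual tables:

```python
# Sketch of a pre-check: pull the primary-key tuples from source and
# target under the SAME filter window, then diff the two sets.
# These sets stand in for query results; the keys are made up.
source_keys = {("17563", "05e132625b5"), ("17563", "05e132625b6")}
target_keys = {("17563", "05e132625b5"), ("17563", "05e132625b7")}

only_in_source = source_keys - target_keys  # would fail with null target side
only_in_target = target_keys - source_keys  # would fail with null source side
common = source_keys & target_keys          # only these can be value-compared
```

If either difference set is non-empty, the null-column failures are expected for those keys; a moving time window like `TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE)` can also produce such gaps when data lands in the two tables at different times.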