GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
407 stars 119 forks source link

comp-fields exception when all columns in primary key #1309

Open nj1973 opened 2 weeks ago

nj1973 commented 2 weeks ago

Test case:

CREATE TABLE dvt_test.tab_all_in_pk
(
  id1 VARCHAR2(20) NOT NULL,
  id2 VARCHAR2(20) NOT NULL
);

ALTER TABLE dvt_test.tab_all_in_pk ADD CONSTRAINT tab_all_in_pk_pk PRIMARY KEY (id1,id2);

data-validation validate row --source-conn ora_local --target-conn ora_local \
--tables-list "dvt_test.tab_all_in_pk" --primary-keys id1,id2 --comparison-fields="*"
...
    differences_pivot = functools.reduce(
                        ^^^^^^^^^^^^^^^^^
TypeError: reduce() of empty iterable with no initial value
sundar-mudupalli-work commented 1 week ago

Hi,

I believe there is a user misunderstanding here - though DVT's error message could be better. Our documentation says --comparison-fields flag specifies the values (e.g. columns) whose raw values will be compared based on the primary key join. So if the user specifies no columns beyond the primary key columns, then the comparison is not possible - since there are no columns.

DVT can detect this condition and throw an error message and return a non zero exit status - rather than printing the confusing reduce() of empty iterable message.

Sundar Mudupalli

nj1973 commented 1 week ago

I still think we should be able to complete the validation, or are we just saying they need to use concat or hash instead?

I did have a look at this myself today and my suggestion of dealing with it in the combiner is really difficult. I haven't often looked at that code before and it's hard to understand.

However I revisited your original suggestion of adding a placeholder value to the validation. I couldn't see how to push a literal through but could see how to use one of the primary key columns to do the same job. I created a branch and pushed an exploratory commit for discussion purposes. I'm not wedded to this idea, just thought it was worth sharing/discussing.

sundar-mudupalli-work commented 5 days ago

I am not clear about the expected result. I used two simple tables

select * from mudupalli.dvt_test6;
 id | name | value1 | value2 
----+------+--------+--------
  1 | AA   |      1 |      2
  2 | AA   |    1.4 |    2.3
  3 | BB   |    1.1 |    2.1

select * from mudupalli.dvt_test7;
 id | name | value1 | value2 
----+------+--------+--------
  1 | AA   |      1 |      2
  3 | BB   |      1 |    2.1
  4 | BB   |    2.3 |      2

and then issued the DVT command data-validation validate row -sc my_postgres -tc my_postgres -tbls=mudupalli.dvt_test6=mudupalli.dvt_test7 -pk=id,name --comparison-fields=name,value1,value2 - dvt ignored the PK in the comparison fields and produced the following output. What is the desired output in this case?


╒═══════════════════╤═══════════════════╤═════════════════════╤══════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name   │ validation_type   │ source_table_name   │ source_column_name   │   source_agg_value │   target_agg_value │   pct_difference │ validation_status   │ run_id                               │
╞═══════════════════╪═══════════════════╪═════════════════════╪══════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ value2            │ Row               │ mudupalli.dvt_test6 │ value2               │                2   │                2   │                  │ success             │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ value2            │ Row               │ mudupalli.dvt_test6 │ value2               │                2.3 │              nan   │              nan │ fail                │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ value2            │ Row               │ mudupalli.dvt_test6 │ value2               │                2.1 │                2.1 │                  │ success             │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ value1            │ Row               │ mudupalli.dvt_test6 │ value1               │                1   │                1   │                  │ success             │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ value1            │ Row               │ mudupalli.dvt_test6 │ value1               │                1.4 │              nan   │              nan │ fail                │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ value1            │ Row               │ mudupalli.dvt_test6 │ value1               │                1.1 │                1   │                  │ fail                │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ value2            │ Row               │ mudupalli.dvt_test6 │ nan                  │              nan   │                2   │              nan │ fail                │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
├───────────────────┼───────────────────┼─────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ value1            │ Row               │ mudupalli.dvt_test6 │ nan                  │              nan   │                2.3 │              nan │ fail                │ 488131e2-3e84-4c66-ac81-d53a7169a1b2 │
╘═══════════════════╧═══════════════════╧═════════════════════╧══════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛
nj1973 commented 4 days ago

We can park this for now and I'll ask the customer what they would expect to see.

For completeness here is output for the experimental commit on the feature branch:

SQL> select * from dvt_test.tab_all_in_pk;

ID1                  ID2
-------------------- --------------------
A                    A
A                    B
A                    C

SQL> select * from dvt_test.tab_all_in_pk2;

ID1                  ID2
-------------------- --------------------
A                    A
A                    B
A                    D
A                    E

$ data-validation validate row -sc ora -tc ora --tables-list "dvt_test.tab_all_in_pk=dvt_test.tab_all_in_pk2" --primary-keys id1,id2 --comparison-fields="*"
╒══════════════════════════════════╤═══════════════════╤════════════════════════╤══════════════════════╤════════════════════╤════════════════════╤══════════════════╤═════════════════════╤══════════════════════════════════════╕
│ validation_name                  │ validation_type   │ source_table_name      │ source_column_name   │ source_agg_value   │ target_agg_value   │   pct_difference │ validation_status   │ run_id                               │
╞══════════════════════════════════╪═══════════════════╪════════════════════════╪══════════════════════╪════════════════════╪════════════════════╪══════════════════╪═════════════════════╪══════════════════════════════════════╡
│ dvt_comp_fields_first_key_column │ Row               │ dvt_test.tab_all_in_pk │ id1                  │ A                  │ A                  │                  │ success             │ bf3739df-c5f2-4e33-88b6-9fa8bb5dcf2a │
├──────────────────────────────────┼───────────────────┼────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ dvt_comp_fields_first_key_column │ Row               │ dvt_test.tab_all_in_pk │ id1                  │ A                  │ A                  │                  │ success             │ bf3739df-c5f2-4e33-88b6-9fa8bb5dcf2a │
├──────────────────────────────────┼───────────────────┼────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ dvt_comp_fields_first_key_column │ Row               │ dvt_test.tab_all_in_pk │ id1                  │ A                  │ nan                │              nan │ fail                │ bf3739df-c5f2-4e33-88b6-9fa8bb5dcf2a │
├──────────────────────────────────┼───────────────────┼────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ dvt_comp_fields_first_key_column │ Row               │ dvt_test.tab_all_in_pk │ nan                  │ nan                │ A                  │              nan │ fail                │ bf3739df-c5f2-4e33-88b6-9fa8bb5dcf2a │
├──────────────────────────────────┼───────────────────┼────────────────────────┼──────────────────────┼────────────────────┼────────────────────┼──────────────────┼─────────────────────┼──────────────────────────────────────┤
│ dvt_comp_fields_first_key_column │ Row               │ dvt_test.tab_all_in_pk │ nan                  │ nan                │ A                  │              nan │ fail                │ bf3739df-c5f2-4e33-88b6-9fa8bb5dcf2a │
╘══════════════════════════════════╧═══════════════════╧════════════════════════╧══════════════════════╧════════════════════╧════════════════════╧══════════════════╧═════════════════════╧══════════════════════════════════════╛

But it's just a bit of a hack. So better to pause than rush a hack though.