GoogleCloudPlatform / professional-services-data-validator

Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match
Apache License 2.0
399 stars 114 forks source link

Random row validations fail on Oracle when sampling 1001+ ids #1157

Closed nj1973 closed 4 months ago

nj1973 commented 4 months ago

When validating with --use-random-row --random-row-batch-size=1001

The validation fails with:

sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-01795: maximum number of expressions in a list is 1000

For this specific error we can build upon the recent enhancement from https://github.com/GoogleCloudPlatform/professional-services-data-validator/issues/1146 and add a 1000 IN list limit for Oracle connections.

But we should also think about why this option was added and what sensible limits we should build in.

nj1973 commented 4 months ago

For info I've tested up to --random-row-batch-size=50000 on Oracle, BigQuery and PostgreSQL and these all complete successfully for standard data types.

nj1973 commented 4 months ago

I found and fixed a bug in the previous get_max_in_list_size() code.