datafold / data-diff

Compare tables within or across databases
https://docs.datafold.com
MIT License

where clause behaving weirdly #196

Closed akulgoel96 closed 2 years ago

akulgoel96 commented 2 years ago

Describe the bug

I am setting up data-diff and getting strange results from the where parameter (-w). My data-diff command works fine without a where condition, but fails once I add one.

This command works fine:

data-diff trino://akul.goel@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://akul.goel@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 6 --bisection-threshold 100000000

But this one doesn't:

data-diff trino://akul.goel@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://akul.goel@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 6 --bisection-threshold 100000000 -w "created_date = '2022-08-02'"

Stack trace attached.

The command above fails with this error: ValueError: Error: min_key expected to be smaller than max_key!

Here's where it gets interesting, though: if I provide an earlier date in the where condition, it works perfectly fine:

data-diff trino://akul.goel@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://akul.goel@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 6 --bisection-threshold 100000000 -w "created_date = '2022-08-01'"

Describe the environment

Using the master version of airbyte.

erezsh commented 2 years ago

@akulgoel96 Thank you for reporting this. However, I cannot reproduce the error, and it's not clear why it happens. Can you please find out the values of min_key / max_key before the exception occurs? That might give us a clue.

Even better, in diff_tables.py, inside the diff_tables() function, could you print the values of key_ranges, and then min_key1 and max_key1?
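If patching data-diff is inconvenient, a rough way to look at the same values from outside is to run the min/max query on the key column yourself, with and without the filter. The sketch below is only an illustration of that check: it uses an in-memory sqlite3 table as a stand-in for the Trino tables, and the helper names key_range and classify_range are made up here, not part of data-diff. One thing it would help rule out (an assumption, not something confirmed in this thread) is the filter matching zero rows or a single row, in which case min and max are NULL or equal.

```python
import sqlite3


def key_range(conn, table, key, where=None):
    """Return (min_key, max_key, row_count) for rows matching `where`.

    Hypothetical helper for manual debugging; not data-diff's own code.
    Table and column names are interpolated directly, so use trusted input only.
    """
    sql = f"SELECT min({key}), max({key}), count(*) FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return conn.execute(sql).fetchone()


def classify_range(min_key, max_key, count):
    """Describe whether the key range is strictly increasing."""
    if count == 0:
        return "empty"        # filter matched nothing, min/max are NULL
    if min_key == max_key:
        return "single-key"   # min is not strictly smaller than max
    return "ok"


if __name__ == "__main__":
    # Stand-in data; the real check would run against both Trino tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE merchants (id TEXT, created_date TEXT)")
    conn.executemany(
        "INSERT INTO merchants VALUES (?, ?)",
        [("a1", "2022-08-01"), ("b2", "2022-08-01"), ("c3", "2022-08-02")],
    )
    for date in ("2022-08-01", "2022-08-02", "2022-08-03"):
        result = key_range(conn, "merchants", "id", f"created_date = '{date}'")
        print(date, result, classify_range(*result))
```

Running the equivalent query on both sides for created_date = '2022-08-02' and posting the results would show whether the two filtered tables even have a usable key range.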

Thanks!

erezsh commented 2 years ago

Closed due to inactivity.