Closed gaurav1308 closed 1 year ago
PS: We are using this branch because we have alphanumeric IDs:
pip install git+https://github.com/datafold/data-diff.git@alphanum_ids
https://github.com/datafold/data-diff/issues/59#issuecomment-1194403178
Thanks for reporting this. I can't reproduce it, so it would be helpful if you could let me know the values that are being used.
Before the line:
checkpoints = split_space(self.min_key.int, self.max_key.int, count)
could you add:
print("$$$$$", self.min_key, self.max_key, count)
and paste the results here?
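For context, here is a minimal sketch of what a helper like split_space might do, assuming it picks evenly spaced integer checkpoints between the minimum and maximum key. It is an illustration under that assumption, not necessarily data-diff's actual implementation. It shows why the concrete values of min_key, max_key and count matter: if the key range is smaller than the requested number of checkpoints, the split cannot be made.

```python
from typing import List


def split_space(start: int, end: int, count: int) -> List[int]:
    """Return up to `count` roughly evenly spaced checkpoints inside (start, end).

    Illustrative sketch only; the real helper may differ.
    """
    size = end - start
    if count > size:
        # Not enough distinct integers between start and end to place the checkpoints.
        raise ValueError(f"cannot place {count} checkpoints in a range of size {size}")
    step = (size + 1) // (count + 1)
    # Drop the first element (start itself) and keep at most `count` checkpoints.
    return list(range(start, end, step))[1 : count + 1]


if __name__ == "__main__":
    print(split_space(0, 100, 9))
```

Printing the three inputs right before the call, as requested above, is enough to tell whether the range or the count is the problem.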
These are the values
-k id -v --json --bisection-factor 10 --bisection-threshold 1000 --max-age=7d
It seems I don't have permission on GitHub to push the above change:
Permission to datafold/data-diff.git denied to gaurav1308.
These are the values
That's not what I asked..
Permission to datafold/data-diff.git denied
Yes, of course. Why would you have permissions to push to data-diff? Contributions have to come in the form of pull requests.
Params and inputs:
data-diff trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id --json --bisection-factor 10 --bisection-threshold 1000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model -v
Attaching log file error.txt
@erezsh Let me know if that helps
@gaurav1308 That's exactly what I need, thank you. Let me look into it and see if I can find the problem.
We have a new implementation for alphanumerics in master that I believe should fix this issue.
Sorry it took so long, but please try now and see if it helps.
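To illustrate why alphanumeric keys need special handling when a key range is split, here is a rough sketch that maps fixed-length alphanumeric IDs to integers and back, so that intermediate checkpoints can be computed arithmetically. The alphabet, the fixed-length assumption, and the names alnum_to_int and int_to_alnum are all hypothetical; this is not the implementation in master.

```python
import string

# Assumed alphabet, listed in ASCII order so integer order matches string order
# for IDs of equal length.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def alnum_to_int(key: str) -> int:
    """Interpret an alphanumeric ID as a base-62 number."""
    n = 0
    for ch in key:
        n = n * BASE + INDEX[ch]
    return n


def int_to_alnum(n: int, length: int) -> str:
    """Convert an integer back to an ID of the given fixed length."""
    chars = []
    for _ in range(length):
        n, r = divmod(n, BASE)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


if __name__ == "__main__":
    lo, hi = "AAAA", "zzzz"
    mid = int_to_alnum((alnum_to_int(lo) + alnum_to_int(hi)) // 2, len(lo))
    print(lo, mid, hi)
```

With a mapping like this, the minimum and maximum key can be converted to integers, split into segments, and converted back to strings for use in queries.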
Looks like this was fixed
We are evaluating data-diff for our use case. We are facing an issue when multi-step iteration is performed, i.e. when we reduce bisection-threshold. It works fine when bisection-threshold is high enough that everything is done in one iteration:
data-diff trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 100000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"
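To make the difference between the two runs concrete, here is a simplified sketch of checksum-based bisection over two in-memory tables. It assumes that --bisection-factor is the number of segments per iteration and that segments with fewer rows than --bisection-threshold are compared row by row. The dict-based tables and the helper names are hypothetical stand-ins for database queries, not data-diff's own code.

```python
import hashlib
from typing import Dict, Set, Tuple

Table = Dict[int, str]  # key -> row value, standing in for a database table


def rows(table: Table, lo: int, hi: int) -> Set[Tuple[int, str]]:
    """All (key, value) pairs with lo <= key < hi."""
    return {(k, v) for k, v in table.items() if lo <= k < hi}


def checksum(table: Table, lo: int, hi: int) -> str:
    """Hash of all rows in [lo, hi), in key order."""
    h = hashlib.md5()
    for k, v in sorted(rows(table, lo, hi)):
        h.update(f"{k}:{v};".encode())
    return h.hexdigest()


def diff_segment(a: Table, b: Table, lo: int, hi: int,
                 factor: int, threshold: int, depth: int = 1):
    """Return (differing rows, deepest iteration reached) for keys in [lo, hi)."""
    size = max(len(rows(a, lo, hi)), len(rows(b, lo, hi)))
    if size < threshold or hi - lo <= 1:
        # Small enough: compare the rows directly, no further splitting.
        return rows(a, lo, hi) ^ rows(b, lo, hi), depth
    step = max(1, (hi - lo) // factor)
    bounds = list(range(lo, hi, step)) + [hi]
    diff, deepest = set(), depth
    for s, e in zip(bounds, bounds[1:]):
        # Only recurse into segments whose checksums disagree.
        if checksum(a, s, e) != checksum(b, s, e):
            sub, reached = diff_segment(a, b, s, e, factor, threshold, depth + 1)
            diff |= sub
            deepest = max(deepest, reached)
    return diff, deepest


if __name__ == "__main__":
    a = {i: f"row{i}" for i in range(1000)}
    b = dict(a)
    b[123] = "changed"
    del b[777]
    print(diff_segment(a, b, 0, 1000, factor=9, threshold=100000))
    print(diff_segment(a, b, 0, 1000, factor=9, threshold=10))
```

In this sketch, a high threshold means the whole range is compared in a single step, while a low threshold forces the range to be split into segments. Under that assumption, only the second run reaches the splitting code path, which matches the behaviour reported here.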
In the second case, we reduced bisection-threshold enough that the diff can't be completed in one iteration:
data-diff trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/sqoop_api sqoop_api.merchants trino://gaurav.singh@razorpay.com@trino-dev-coordinator-service.trino-dev.svc.cluster.local:8080/hive/realtime_hudi_api realtime_hudi_api.merchants -k id -v --json --bisection-factor 9 --bisection-threshold 1000 --max-age=7d -t created_date -c name -c email -c second_factor_auth -c restricted -c parent_id -c fee_model --min-age=1d -s -w "updated_at<1659724200 and created_date<'2022-08-08'"
and we get the following error: