awslabs / python-deequ

Python API for Deequ
Apache License 2.0
676 stars, 131 forks

Support for Spark 3.1 #75

Closed ashwin153 closed 1 year ago

ashwin153 commented 2 years ago

Deequ supports Spark 3.1 as of 2.0.0-spark-3.1.
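
For readers who want to try that build, here is a minimal sketch of pointing a Spark session at the Deequ artifact through the standard `spark.jars.packages` setting. The helper names (`deequ_maven_coord`, `spark_conf_for_deequ`, `build_session`) and the application name are hypothetical and not part of pydeequ. The group and artifact IDs are taken from the Maven Repository page for this release. The sketch only covers how the JVM jar is resolved. It does not show that pydeequ itself runs cleanly against Spark 3.1.

```python
# Hypothetical helpers for illustration; none of these names come from pydeequ.
DEEQU_GROUP = "com.amazon.deequ"
DEEQU_ARTIFACT = "deequ"


def deequ_maven_coord(version: str = "2.0.0-spark-3.1") -> str:
    """Return the Maven coordinate (group:artifact:version) for a Deequ build."""
    return f"{DEEQU_GROUP}:{DEEQU_ARTIFACT}:{version}"


def spark_conf_for_deequ(version: str = "2.0.0-spark-3.1") -> dict:
    """Return the Spark setting that asks Spark to download the Deequ jar."""
    return {"spark.jars.packages": deequ_maven_coord(version)}


def build_session(app_name: str = "deequ-spark-3.1-check"):
    """Create a SparkSession with the Deequ jar on the classpath.

    pyspark is imported lazily so the helpers above can be used without it.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in spark_conf_for_deequ().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` requires a local pyspark 3.1 installation and network access to a Maven repository. The two pure helpers can be checked without either.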

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

epilif1017a commented 2 years ago

Hi pydeequ team,

Any estimate of when this PR will be approved?

Thanks a lot for driving pydeequ forward :)

ashwin153 commented 2 years ago

@gucciwang Can this be merged?

gucciwang commented 2 years ago

Please run the pytest suite locally, following the README dev guide, and address the failures. On a quick look I found 4 errors: 3 relate to the constraint suggestions and 1 to the pandas utils.

This thread from other users who have tried Spark 3.1 may be relevant: https://github.com/awslabs/python-deequ/issues/70

bradthurber commented 2 years ago

Is it fair to say this needs to be resolved for python-deequ to work with AWS Glue 3.0?

darkcofy commented 2 years ago

Same question as @bradthurber: is this merge needed to be able to use the https://mvnrepository.com/artifact/com.amazon.deequ/deequ/2.0.0-spark-3.1 jar file with Glue 3.0?

mycaule commented 2 years ago

Please approve this. AWS EMR 6 is running Spark 3.1 now!

Tom-Hudson commented 2 years ago

I would really like to get going with Spark 3.1. It is very frustrating that this has hung around for so long.

chenliu0831 commented 1 year ago

I will close this in favor of #100, which seems more complete. I would like a few testers on Glue, EMR, etc.