awslabs / python-deequ

Python API for Deequ
Apache License 2.0
669 stars 131 forks

PyDeequ to support Apache Spark version > 3.4.0 #151

Closed KamaelIce closed 3 months ago

KamaelIce commented 10 months ago

Is your feature request related to a problem? Please describe.
Our organization uses the PyDeequ library widely in its environment, together with Databricks running Apache Spark 3.3.0. Two vulnerabilities, CVE-2022-31777 (Nov 2022) and CVE-2023-22946 (Apr 2023), were discovered in Apache Spark versions < 3.4.0. However, although we need to upgrade Apache Spark to version 3.4.0, PyDeequ does not support this version at the moment.

Describe the solution you'd like
PyDeequ should support Apache Spark version 3.4.0 and above.

chenliu0831 commented 10 months ago

Once Deequ upgrades to Spark 3.4, we will follow up in PyDeequ shortly afterwards. The PR is available at https://github.com/awslabs/deequ/pull/505

fmathias commented 5 months ago

Hello, @chenliu0831! I noticed that the pull request awslabs/deequ#505 has been merged. Could you provide an estimate of when we can expect PyDeequ to support this updated version of Spark?

LucasSchelkes-BA commented 4 months ago

Same here, Spark >= 3.5 is needed for our use case. Deequ already supports Spark 3.5 (https://github.com/awslabs/deequ/pull/514).

chenliu0831 commented 3 months ago

Let's move the discussion to https://github.com/awslabs/python-deequ/issues/192. Closing.