awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Remove PySpark from install_requires dependency #24

Closed (closed by nfx 3 years ago)

nfx commented 3 years ago

This pull request fixes #13 by removing the install-time dependency on PySpark. If the PyDeequ package is installed on an Apache Spark-compatible cluster, the PySpark module is already on the Python path, so declaring it in install_requires is redundant. Only the Deequ JAR has to be installed separately, directly from Maven Central: https://search.maven.org/artifact/com.amazon.deequ/deequ
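With PySpark no longer pulled in at install time, a missing PySpark would only surface as a bare import failure at runtime. As a rough illustration of how a package could check for it explicitly, here is a minimal sketch. The helper names `module_available` and `require_pyspark` are hypothetical and are not part of PyDeequ; only the standard-library `importlib.util.find_spec` call is real.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def require_pyspark() -> None:
    """Hypothetical guard: raise a descriptive error when PySpark is absent."""
    if not module_available("pyspark"):
        raise ImportError(
            "PySpark was not found. Run on a Spark cluster, or install it "
            "separately, for example with `pip install pyspark`."
        )
```

Calling `require_pyspark()` before any Spark-dependent import would turn a missing module into a message that tells the user what to do, while leaving cluster-provided PySpark installations untouched.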

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

znanfelt commented 3 years ago

I believe this fixes #15 as well.

nfx commented 3 years ago

Yes, it should remove the root cause.