awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Hard dependency on the pyspark breaks Databricks runtimes #15

Closed (alexott closed this issue 3 years ago)

alexott commented 3 years ago

When pydeequ is installed on a Databricks runtime, it pulls in pyspark as a dependency, and this breaks the environment.

It would be useful to avoid a hard dependency on pyspark. Instead, the findspark package could be used to locate the installed Spark. You can look here for possible implementations.
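For illustration, here is a minimal sketch of what such a lookup might look like. It is not pydeequ's actual code: the helper name `ensure_pyspark` is hypothetical, and the only findspark call used is `findspark.init()`, which locates a Spark installation (for example via `SPARK_HOME`) and adds its Python bindings to `sys.path`.

```python
def ensure_pyspark():
    """Return the pyspark module without requiring it as an installed package.

    Hypothetical helper, shown only to illustrate the idea from this issue.
    """
    # 1. Prefer a pyspark that the runtime already provides
    #    (the case this issue is about).
    try:
        import pyspark
        return pyspark
    except ImportError:
        pass

    # 2. Otherwise fall back to findspark, if it is available.
    try:
        import findspark
    except ImportError as exc:
        raise ImportError(
            "pyspark is not importable and findspark is not installed; "
            "install one of them or make Spark available to Python"
        ) from exc

    # findspark.init() looks up the Spark installation and extends sys.path
    # so that the bundled pyspark can be imported. It raises an error if no
    # Spark installation can be found.
    findspark.init()

    import pyspark
    return pyspark
```

With something like this in place, pyspark could presumably be listed as an optional rather than a required dependency, so that installing the package on a runtime that already ships Spark would not replace the bundled version.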