dvgodoy / handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes
MIT License
188 stars 24 forks

remove pyspark dependency #21

Open machielg opened 5 years ago

machielg commented 5 years ago

I'm trying to use this package in a production environment where pyspark is already provided, because I submit my application with spark-submit. HandySpark, however, requires pyspark to be installed as a dependency, which results in a Spark version conflict: my cluster runs Spark 2.3, and HandySpark pulls in Spark 2.4 since that's the latest stable release.

I think this is a common scenario, and it would be better if HandySpark didn't depend on pyspark but simply assumed you'll use it in an environment where pyspark is available, either installed or added to the system path using findspark.
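
To illustrate the idea, here is a minimal sketch of a runtime check that could stand in for the hard install-time dependency. The function name `ensure_pyspark` and its `import_module` parameter are hypothetical and not part of HandySpark; the sketch only assumes that `findspark.init()` adds a local Spark installation's Python libraries to `sys.path`.

```python
import importlib


def ensure_pyspark(import_module=importlib.import_module):
    """Return the pyspark module from the current environment.

    Hypothetical helper, not HandySpark API: it uses whatever pyspark the
    environment already provides (for example the one shipped with the
    cluster when running under spark-submit) and falls back to findspark.
    """
    try:
        # Normal case: pyspark is already importable.
        return import_module("pyspark")
    except ImportError:
        pass

    try:
        findspark = import_module("findspark")
    except ImportError as exc:
        raise ImportError(
            "pyspark is not importable. Install a pyspark version matching "
            "your cluster, or install findspark and set SPARK_HOME."
        ) from exc

    # findspark locates a Spark installation (e.g. via SPARK_HOME) and
    # adds its Python libraries to sys.path.
    findspark.init()
    return import_module("pyspark")
```

With a check like this, the package would not need to list pyspark as a required dependency, so installing it would leave the cluster's Spark 2.3 untouched. The `import_module` parameter exists only so the lookup can be exercised without a real Spark installation.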