databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

Python dependencies are not downloaded along with the Spark package #218

Open skeller88 opened 4 years ago

skeller88 commented 4 years ago

When I run a Spark job with this library downloaded as a package, I get an error that TensorFlow is not found. I would expect that downloading this library as a package would also pull in its Python dependencies. If that's not the case, what's the recommended way to include them?

There is a lot of discussion about approaches to handling PySpark dependencies:

This question is a more general version of my other question about Dataproc.

Ben-Epstein commented 4 years ago

Can you post your stack trace? It's possible that the Spark executors, rather than the master, are missing the dependencies. Can you also post your environment setup?
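To narrow down whether the executors or the driver are missing the package, here is a minimal sketch. It assumes an existing `SparkContext` named `sc`, and the helper names `module_available` and `check_on_executors` are hypothetical, not part of sparkdl. It checks whether `tensorflow` is importable locally and inside the executors' Python workers:

```python
import importlib.util
import socket


def module_available(name):
    """Return True if top-level module `name` is importable in this process."""
    return importlib.util.find_spec(name) is not None


def check_on_executors(sc, name="tensorflow", num_tasks=4):
    """Return sorted, de-duplicated (hostname, importable) pairs.

    Runs `num_tasks` tiny tasks so that, on a multi-executor cluster, the
    check is likely (not guaranteed) to land on every executor.
    """
    def probe(_partition):
        # Executes in the executor's Python worker, not on the driver.
        yield (socket.gethostname(), importlib.util.find_spec(name) is not None)

    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return sorted(set(rdd.mapPartitions(probe).collect()))
```

Running `module_available("tensorflow")` on the driver answers the question for the driver, and `check_on_executors(sc)` answers it for the workers. If the driver reports `True` but an executor host reports `False`, TensorFlow only needs to be installed (or shipped) to the executors.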

spark-water commented 4 years ago

I understand your question is about dependencies in general. In this particular case, installing TensorFlow should make the error go away: sparkdl cannot find the TensorFlow backend, hence the error.
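Installing TensorFlow on every node works. Another option, assuming Spark 3.1 or later (which added the `spark.archives` setting), is to pack a virtualenv or conda environment that already contains `tensorflow` with a tool such as `venv-pack` or `conda-pack`, and let Spark distribute it to the executors. This follows the approach in Spark's Python package management docs. It is a minimal sketch; the helper names `packed_env_settings` and `build_session` and the archive name `pyspark_venv.tar.gz` are hypothetical:

```python
import os


def packed_env_settings(archive_path, alias="environment"):
    """Return (spark_conf, env_vars) for shipping a packed Python env.

    `archive_path` is a tarball (e.g. built with venv-pack or conda-pack)
    that already contains tensorflow and sparkdl's other Python deps.
    Spark unpacks it on each executor into a directory named `alias`.
    """
    conf = {"spark.archives": f"{archive_path}#{alias}"}
    # Executors run Python from the unpacked archive.
    env = {"PYSPARK_PYTHON": f"./{alias}/bin/python"}
    return conf, env


def build_session(archive_path, app_name="sparkdl-app"):
    # Imported lazily so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    conf, env = packed_env_settings(archive_path)
    # Must be set before the SparkSession (and its JVM) is created.
    os.environ.update(env)
    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would look like `spark = build_session("pyspark_venv.tar.gz")`. On YARN, the Spark docs point to `spark.yarn.dist.archives` as the equivalent setting.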