databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

Not able to import sparkdl in jupyter notebook #185

Closed yashwanthmadaka24 closed 3 years ago

yashwanthmadaka24 commented 5 years ago

Hi,

I am trying to use this library in a Jupyter notebook, but I get the error "no module found".

When I run the command pyspark --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11, I am able to import sparkdl in the Spark shell.

How can I use it in jupyter notebook?
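
A quick way to narrow this down is to check, from inside the notebook, which Python interpreter the kernel runs and whether that interpreter can locate sparkdl at all. The sketch below uses only the standard library; the helper name module_visible is made up for illustration and is not part of sparkdl or PySpark.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if the current interpreter can locate the module `name`."""
    # find_spec returns None for a top-level module that cannot be found.
    return importlib.util.find_spec(name) is not None


# Show which interpreter the notebook kernel runs and whether it sees sparkdl.
print("kernel python:", sys.executable)
print("sparkdl importable:", module_visible("sparkdl"))
```

If this reports that sparkdl is not importable while the pyspark shell can import it, the notebook kernel was most likely started without the package that --packages supplies to the shell.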

zhujiesheng commented 5 years ago

I also have this problem.

yashwanthmadaka24 commented 5 years ago

Turns out we cannot use it in jupyter notebook. We need to use Azure Databricks to use that module in jupyter notebook.

innat commented 5 years ago

We can import sparkdl in a Jupyter notebook. If we launch with pyspark --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11, we don't need to worry about installing the other packages that Deep Learning Pipelines depends on, although this does require an internet connection.

However, we can also set everything up locally. Check this gist; I hope it helps. Cheers.
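
One common way to get the effect of --packages from inside a notebook is to set the PYSPARK_SUBMIT_ARGS environment variable before the first Spark session is created in that kernel. The following is a minimal sketch, assuming PySpark is launched from the notebook process itself; the helper name build_submit_args is hypothetical, and the package coordinate is the one quoted in this thread.

```python
import os

# Package coordinate taken from the command quoted in this thread.
SPARKDL_PACKAGE = "databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11"


def build_submit_args(package):
    """Return a PYSPARK_SUBMIT_ARGS value that requests one package."""
    # The value has to end with the "pyspark-shell" token.
    return "--packages {} pyspark-shell".format(package)


# This must run before the first SparkSession/SparkContext is created
# in the kernel, because the variable is read when the JVM is launched.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(SPARKDL_PACKAGE)
```

After this cell, creating the session as usual and then running import sparkdl should behave like the pyspark --packages shell, provided the kernel has not already started Spark.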

skeller88 commented 4 years ago

@innat the gist results in a 404. Can you update?

spark-water commented 4 years ago

I found a solution on Stack Overflow (https://stackoverflow.com/questions/55377712/not-able-to-import-sparkdl-in-jupyter-notebook).

Instead of using --packages, set the 'spark.jars.packages' key with .config when you create the Spark session inside the Jupyter notebook. This downloads the package and all its dependencies from Databricks. You can change the version to any release that suits your environment.

from pyspark.sql import SparkSession

spark = (SparkSession
    .builder
    .config('spark.jars.packages', 'databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11')
    .getOrCreate()
)

Note that this is very difficult to do with a local .jar, because neither the releases nor the current GitHub project bundle the parent dependencies. I guess Databricks wants you to use their package, preferably even on Azure, to run sparkdl.

Hope it helps others.

shekharmayank commented 3 years ago

I'm facing the same issue. The solution below didn't work for me.

spark = (SparkSession
    .builder
    .config('spark.jars.packages', 'databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11')
    .getOrCreate()
)

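
One possible cause, which the thread does not confirm, is that Spark was already running in the kernel when this code ran. In that case getOrCreate() reuses the existing context and the package setting never reaches the JVM launch. The sketch below checks whether the running context actually carries the coordinate; the helper name packages_requested is hypothetical, and it expects the list of (key, value) pairs that spark.sparkContext.getConf().getAll() returns.

```python
def packages_requested(conf_pairs, coordinate):
    """Return True if `coordinate` is listed under spark.jars.packages."""
    # conf_pairs is an iterable of (key, value) string pairs.
    value = dict(conf_pairs).get("spark.jars.packages", "")
    # The setting is a comma-separated list of Maven-style coordinates.
    listed = [item.strip() for item in value.split(",") if item.strip()]
    return coordinate in listed


# Example with hand-written pairs instead of a live Spark context.
example_pairs = [
    ("spark.app.name", "demo"),
    ("spark.jars.packages", "databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11"),
]
print(packages_requested(
    example_pairs, "databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11"))
```

In a notebook, the call would be packages_requested(spark.sparkContext.getConf().getAll(), 'databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11'). If it returns False, restarting the kernel and creating the session with the setting in the very first Spark cell is worth trying.
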
innat commented 3 years ago

skeller88 wrote: "@innat the gist results in a 404. Can you update?"

@skeller88 Sorry for the late reply; I didn't see the email notification until now. :(

Updated link: Set up of Deep-Learning-Pipelines in Linux based OS.

Please note that I worked on this a few years ago. I also did a small project using it: Multi-Class Image Classification With Transfer Learning In PySpark. However, the framework has probably changed in many ways since then, so please take that into account.