databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0
1.99k stars 494 forks

Feature request - documentation on how to use this package on dataproc or kubernetes #211

Open skeller88 opened 4 years ago

skeller88 commented 4 years ago

For example, H2O.ai has documentation for this.

Here's what I have tried so far. I copied the requirements from this repo's environment.yml file. The cluster initialization step fails, though.

gcloud dataproc clusters create spark-cluster \
--initialization-actions \
gs://dataproc-initialization-actions/python/conda-install.sh,gs://dataproc-initialization-actions/python/pip-install.sh \
--metadata 'CONDA_PACKAGES=six=1.11.0 h5py=2.8.0 pillow=4.1.1 nomkl cloudpickle=0.8.0 tensorflow=1.13.1 keras=2.2.4 paramiko=2.4.2 wrapt=1.10.11' \
--metadata 'PIP_PACKAGES=horovod==0.16.4' \
--num-masters=1 \
--num-workers=2 \
--num-preemptible-workers=2 \
--optional-components=ANACONDA \
--properties='spark:spark.jars.packages=databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11' \
--region=us-west1
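
A multi-line command like this one is easy to break: a single missing trailing backslash splits it into two commands. As a minimal sketch (not a fix for the initialization failure itself), the same flags and values can be assembled one argument at a time, so no line depends on a continuation character. The variable names and the `DRY_RUN` switch are hypothetical, introduced only for this illustration; every flag and value is copied from the command above.

```shell
#!/bin/sh
# Sketch only: same flags and values as the command in the post, assembled
# argument by argument so no line relies on a trailing backslash.

CONDA_PACKAGES='six=1.11.0 h5py=2.8.0 pillow=4.1.1 nomkl cloudpickle=0.8.0 tensorflow=1.13.1 keras=2.2.4 paramiko=2.4.2 wrapt=1.10.11'
PIP_PACKAGES='horovod==0.16.4'
INIT_ACTIONS='gs://dataproc-initialization-actions/python/conda-install.sh,gs://dataproc-initialization-actions/python/pip-install.sh'

# Build the argument list in the positional parameters ("$@").
set -- dataproc clusters create spark-cluster
set -- "$@" --initialization-actions "$INIT_ACTIONS"
set -- "$@" --metadata "CONDA_PACKAGES=$CONDA_PACKAGES"
set -- "$@" --metadata "PIP_PACKAGES=$PIP_PACKAGES"
set -- "$@" --num-masters=1 --num-workers=2 --num-preemptible-workers=2
set -- "$@" --optional-components=ANACONDA
set -- "$@" --properties='spark:spark.jars.packages=databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11'
set -- "$@" --region=us-west1

# Remember how many arguments were assembled (excluding "gcloud" itself).
ARG_COUNT=$#

# Dry run by default: print one argument per line for review.
# DRY_RUN is a hypothetical variable for this sketch; set DRY_RUN=0 to call gcloud.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' gcloud "$@"
else
  gcloud "$@"
fi
```

Running it with the default dry run prints each argument on its own line, which makes it easy to confirm that every flag reached the same command before creating a cluster.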