cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0

dist-keras not utilizing GPUs on Spark workers when running example notebook #47

Closed: smurching closed this issue 6 years ago

smurching commented 6 years ago

Hi, I'm trying to run the workflow.ipynb example notebook on a GPU-enabled cluster (3 single-GPU AWS p2.xlarge instances). The example runs fine, but when I run nvidia-smi on my machines I don't see any GPU utilization (no memory usage, no running processes).
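For anyone trying to reproduce the check, here is a minimal sketch of how the "no utilization, no memory usage" observation could be scripted instead of read off the nvidia-smi table by hand. The helper names (`parse_gpu_rows`, `is_idle`, `query_local_gpus`) and the sample output string are hypothetical and not part of dist-keras; the sketch also assumes nvidia-smi reports plain numeric values for every queried field.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU, e.g. "0, 0, 0".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Parse 'index, utilization %, memory MiB' rows into dicts.

    Assumes every field is numeric; a field such as "[Not Supported]"
    would raise ValueError here.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, util, mem = [field.strip() for field in line.split(",")]
        rows.append(
            {"index": int(index), "util_pct": int(util), "mem_mib": int(mem)}
        )
    return rows


def is_idle(row):
    """A GPU counts as idle when it reports no utilization and no memory use."""
    return row["util_pct"] == 0 and row["mem_mib"] == 0


def query_local_gpus():
    """Run nvidia-smi on this machine; return None if it cannot be run."""
    try:
        out = subprocess.check_output(QUERY, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gpu_rows(out)


# Hypothetical sample output for one idle single-GPU machine.
sample_rows = parse_gpu_rows("0, 0, 0\n")
print([is_idle(row) for row in sample_rows])
```

Calling `query_local_gpus()` on each worker while the notebook is training, and checking `is_idle` on the result, would give the same signal as watching nvidia-smi manually.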

Including the relevant parts of my pip freeze output below:

dist-keras==0.2.1
Keras==2.1.2
tensorflow==1.4.0
tensorflow-gpu==1.4.0
tensorflow-tensorboard==0.4.0rc3
Theano==1.0.0
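One detail in this list that may be worth ruling out is that both `tensorflow` and `tensorflow-gpu` appear. Whether that affects which build actually gets imported on the workers is an assumption here, not something established in this thread. A small sketch that flags the situation from pip freeze text follows; `parse_freeze` and `tensorflow_builds` are hypothetical helper names.

```python
def parse_freeze(freeze_text):
    """Map lowercased package name -> version for 'name==version' lines."""
    packages = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def tensorflow_builds(packages):
    """Return only the CPU and GPU TensorFlow distributions, if present."""
    return {
        name: version
        for name, version in packages.items()
        if name in ("tensorflow", "tensorflow-gpu")
    }


# The pip freeze excerpt from this report.
FREEZE = """\
dist-keras==0.2.1
Keras==2.1.2
tensorflow==1.4.0
tensorflow-gpu==1.4.0
tensorflow-tensorboard==0.4.0rc3
Theano==1.0.0
"""

builds = tensorflow_builds(parse_freeze(FREEZE))
both_installed = len(builds) == 2
print(builds, both_installed)
```

Running the same check against the pip freeze output of every worker, not just the driver, would show whether the environments match across the cluster.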

My machines are running Ubuntu 16.04 and Spark 2.2.

The discussion in #10 seemed to imply that dist-keras utilizes GPUs. Has anybody seen similar behavior, or does anyone know whether there are special config settings I need to specify for GPU utilization?

Let me know if there's any other information I can provide that'd help :)