cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0

Model runs in all GPUs #52

Open bachandr opened 6 years ago

bachandr commented 6 years ago

I have user-partitioned data (for now, 3 users) and want to train a separate model on each user's partition.

I used dist-keras in local[*] Spark mode with 3 executors (8 GB each), each with 1 core, i.e. one executor per user. When the script is triggered, I see the model running on all GPUs. Has anyone experienced a similar issue? I can provide more information if asked.

Versions: keras 2.1.3, tensorflow 1.4.0-rc0, spark 2.2.1

```
[1] Tesla K80 | 53'C, 0 % | 11439 / 11439 MB | br(10856M) br(208M) br(285M) br(60M)
[2] Tesla K80 | 49'C, 0 % | 11439 / 11439 MB | br(10856M) br(208M) br(285M) br(60M)
[3] Tesla K80 | 55'C, 0 % | 11439 / 11439 MB | br(10856M) br(208M) br(285M) br(60M)
[4] Tesla K80 | 42'C, 0 % | 11439 / 11439 MB | br(10854M) br(210M) br(285M) br(60M)
[5] Tesla K80 | 49'C, 0 % | 11439 / 11439 MB | br(10854M) br(210M) br(285M) br(60M)
[6] Tesla K80 | 37'C, 0 % | 11439 / 11439 MB | br(10854M) br(210M) br(285M) br(60M)
[7] Tesla K80 | 45'C, 0 % | 11439 / 11439 MB | br(10852M) br(212M) br(285M) br(60M)
```
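For what it's worth, the behaviour above (every K80 almost fully allocated) is what TensorFlow does by default: each worker process claims all memory on every visible GPU. One common workaround, not part of this report, is to restrict each worker to a single device via `CUDA_VISIBLE_DEVICES` before TensorFlow initializes. The `pin_gpu` helper and the idea of deriving the index from a partition ID are hypothetical, just a minimal sketch:

```python
import os

def pin_gpu(partition_id, num_gpus=3):
    # Hypothetical helper: map a Spark partition index to one physical GPU,
    # so TensorFlow (which by default grabs every visible device) sees only it.
    # This must run in the worker process BEFORE TensorFlow is imported or
    # a session is created, or the setting has no effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(partition_id % num_gpus)

pin_gpu(0)  # e.g. the worker handling the first user's partition sees only GPU 0
```

In a Spark job this would typically be called at the top of the function passed to `mapPartitionsWithIndex`, using the partition index as `partition_id`, so each of the 3 executors ends up pinned to a different GPU.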

JoeriHermans commented 6 years ago

How many partitions does your DataFrame or RDD consist of?