I have user-partitioned data (currently 3 users) and want to train one model per partition.
I used dist-keras with Spark in local[*] mode, with 3 executors (8 GB each), each with 1 core, i.e. one executor per user. When the script is triggered, I see the model allocating memory on all GPUs instead of one GPU per executor. Has anyone experienced a similar issue? I can provide more information if asked.
Versions:
keras - 2.1.3
tensorflow - 1.4.0-rc0
spark - 2.2.1
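For context, TensorFlow by default maps every GPU it can see into each process and reserves nearly all of their memory, which would match the symptom above. A minimal sketch of restricting each worker process to a single GPU via CUDA_VISIBLE_DEVICES; the gpu_index value and how it is derived (e.g. from the Spark partition id) are assumptions for illustration:

```python
import os

def pin_to_gpu(gpu_index):
    """Restrict this process to one GPU before TensorFlow is imported.

    TensorFlow claims every visible GPU at session creation, so the
    environment variable must be set first. gpu_index is hypothetical
    here; in practice it could be derived from the Spark partition id
    so that each executor sees a different device.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)

pin_to_gpu(0)
# import tensorflow as tf  # must happen *after* the env var is set
```

With this in place, each worker would see only its own device as GPU 0, rather than all seven K80s.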
[1] Tesla K80 | 53'C, 0 % | 11439 / 11439 MB | br(10856M) br(208M) br(285M) br(60M)
[2] Tesla K80 | 49'C, 0 % | 11439 / 11439 MB | br(10856M) br(208M) br(285M) br(60M)
[3] Tesla K80 | 55'C, 0 % | 11439 / 11439 MB | br(10856M) br(208M) br(285M) br(60M)
[4] Tesla K80 | 42'C, 0 % | 11439 / 11439 MB | br(10854M) br(210M) br(285M) br(60M)
[5] Tesla K80 | 49'C, 0 % | 11439 / 11439 MB | br(10854M) br(210M) br(285M) br(60M)
[6] Tesla K80 | 37'C, 0 % | 11439 / 11439 MB | br(10854M) br(210M) br(285M) br(60M)
[7] Tesla K80 | 45'C, 0 % | 11439 / 11439 MB | br(10852M) br(212M) br(285M) br(60M)