ParikhKadam / bidaf-keras

Bidirectional Attention Flow for Machine Comprehension implemented in Keras 2
GNU General Public License v3.0
64 stars 21 forks

Unable to train this on multiple GPU #24

Open fliptrail opened 4 years ago

fliptrail commented 4 years ago

Hello. As the title suggests, I am unable to train this model on a multi-GPU configuration. I am trying to train it on 4 RTX 2080 Ti cards. The model is loaded only on the 1st GPU, using around 10.5 GB of its 11 GB of memory, while each of the remaining GPUs uses only about 155 MB/11 GB. The training speed is also independent of the number of GPUs I select with CUDA_VISIBLE_DEVICES, so apparently only the 1st GPU is actually being used. I tried diving into the code to find the exact multi_gpu_model call, but everything seemed fine to me. Can you confirm this, or tell me how to train this implementation on multiple GPUs?
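For context, this is roughly how keras.utils.multi_gpu_model is normally used for data parallelism in Keras 2. The tiny Sequential model below is only a stand-in for the BiDAF model this repository builds, not the project's actual code:

```python
# Minimal sketch of data parallelism with keras.utils.multi_gpu_model (Keras 2.x).
# The Sequential model here is a placeholder, not the real BiDAF network.
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

base_model = Sequential([Dense(10, activation='softmax', input_shape=(100,))])

# Replicates the model onto each GPU and splits every input batch across the
# replicas, e.g. with gpus=4 a batch of 64 becomes 4 sub-batches of 16.
parallel_model = multi_gpu_model(base_model, gpus=4)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')
```

If this wrapping never happens (or silently falls back to a single device), the model stays on one GPU regardless of how many devices CUDA_VISIBLE_DEVICES exposes.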

fliptrail commented 4 years ago

I am encountering this exact issue on TensorFlow 2.0.0: https://github.com/tensorflow/tensorflow/issues/30321. A possible solution is given there.

ParikhKadam commented 4 years ago

Yes, the possible solution is in the above-mentioned link. Read more about "model parallelism vs. data parallelism".
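For anyone landing here later: the route suggested in the linked TensorFlow issue is data parallelism via tf.distribute.MirroredStrategy, which supersedes multi_gpu_model in TF 2.x. A minimal sketch, assuming the BiDAF model would be built and compiled inside the strategy scope (the small Dense model below is just a placeholder):

```python
# Hedged sketch of data parallelism with tf.distribute.MirroredStrategy (TF 2.x).
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created here are mirrored onto every GPU; gradients are
    # all-reduced across replicas on each training step.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='softmax', input_shape=(100,))
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')

# model.fit(...) then splits each global batch across the replicas.
```

This is data parallelism (same model copied to every GPU, batches split across them), as opposed to model parallelism (different parts of one model placed on different GPUs), which is what the discussion above refers to.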