ParikhKadam / bidaf-keras

Bidirectional Attention Flow for Machine Comprehension implemented in Keras 2
GNU General Public License v3.0
64 stars 21 forks

Unable to train this on multiple GPU #24

Open fliptrail opened 4 years ago

fliptrail commented 4 years ago

Hello. As the title suggests, I am unable to train this model on a multi-GPU configuration. I am trying to train it on 4 RTX 2080 Ti cards. The model is loaded only on the 1st GPU, using around 10.5 GB of its 11 GB of memory, while each of the remaining GPUs uses only about 155 MB/11 GB. The training speed is also independent of the number of GPUs I select with CUDA_VISIBLE_DEVICES, so apparently only the 1st GPU is actually being used. I tried diving into the code to find the exact multi_gpu_model call, but everything seemed fine to me. Can you confirm this, or tell me how to train this implementation on multiple GPUs?
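For context, this is roughly how keras.utils.multi_gpu_model is normally used for data parallelism in Keras 2. The tiny Sequential model below is only a stand-in for the BiDAF model this repository builds, not the project's actual code:

```python
# Minimal sketch of data parallelism with keras.utils.multi_gpu_model (Keras 2.x).
# The Sequential model here is a placeholder, not the real BiDAF network.
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

base_model = Sequential([Dense(10, activation='softmax', input_shape=(100,))])

# Replicates the model onto each GPU and splits every input batch across the
# replicas, e.g. with gpus=4 a batch of 64 becomes 4 sub-batches of 16.
parallel_model = multi_gpu_model(base_model, gpus=4)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')
```

If this wrapping never happens (or silently falls back to a single device), the model stays on one GPU regardless of how many devices CUDA_VISIBLE_DEVICES exposes.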

fliptrail commented 4 years ago

I am encountering this exact issue on TensorFlow 2.0.0: https://github.com/tensorflow/tensorflow/issues/30321. A possible solution is given there.

ParikhKadam commented 4 years ago

Yes, the possible solution is in the above-mentioned link. Read more about "model parallelism vs. data parallelism".
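For anyone landing here later: the route suggested in the linked TensorFlow issue is data parallelism via tf.distribute.MirroredStrategy, which supersedes multi_gpu_model in TF 2.x. A minimal sketch, assuming the BiDAF model would be built and compiled inside the strategy scope (the small Dense model below is just a placeholder):

```python
# Hedged sketch of data parallelism with tf.distribute.MirroredStrategy (TF 2.x).
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created here are mirrored onto every GPU; gradients are
    # all-reduced across replicas on each training step.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='softmax', input_shape=(100,))
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')

# model.fit(...) then splits each global batch across the replicas.
```

This is data parallelism (same model copied to every GPU, batches split across them), as opposed to model parallelism (different parts of one model placed on different GPUs), which is what the discussion above refers to.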