codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0

Imbalanced GPU memory usage #10

Closed: WencongXiao closed this issue 5 years ago

WencongXiao commented 5 years ago

Hi,

Nice work on the BERT implementation.

I tried running your code on 4 V100s and found that the memory usage is imbalanced: the first GPU consumes about 2x the memory of the others. Any idea what causes this?
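For reference, the imbalance can be confirmed from inside PyTorch as well as from nvidia-smi; a minimal sketch using torch.cuda.memory_allocated (illustrative only, not code from this repo):

```python
import torch

# Print the current allocation on each visible GPU to confirm the imbalance.
for i in range(torch.cuda.device_count()):
    mib = torch.cuda.memory_allocated(i) / 1024 ** 2
    print(f"cuda:{i} allocated: {mib:.0f} MiB")
```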

Btw, I think the parameter order in train.py line 64 is incorrect.

codertimo commented 5 years ago

@WencongXiao Thank you for your interest in my project.

The reason the first GPU takes double the memory is that it holds both the model parameters and its batch of input data; the others hold only their input batch for computation.

first GPU = model_parameters + batch_data
other GPUs = batch_data
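As a rough illustration (a minimal sketch, not this repository's exact code), with torch.nn.DataParallel the base parameters, the staged input batch, and the gathered outputs all sit on the first device:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).to("cuda:0")   # base parameters live on cuda:0
model = nn.DataParallel(model)               # replicas are created on the other GPUs each forward pass

x = torch.randn(256, 1024, device="cuda:0")  # the full batch is first staged on cuda:0
y = model(x)                                 # scattered across GPUs; outputs gathered back onto cuda:0
```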

If you get any significant results from 4-GPU training, please let me know the results and your dataset. It would be very helpful for me!! (This code is not verified yet because of a lack of computation power.)

Thanks!

thomwolf commented 5 years ago

Hi,

Here is more information on this question of GPU imbalance: https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255

Best, Thom
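One of the tricks described in the post, sketched below: compute the loss inside the module that gets wrapped in DataParallel, so each replica returns only a scalar loss instead of its full logits, and far less memory accumulates on cuda:0. The wrapper here is illustrative, not code from this repository:

```python
import torch
import torch.nn as nn

class ModelWithLoss(nn.Module):
    """Wrap model + criterion so the loss is computed on each replica's GPU."""
    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, x, target):
        logits = self.model(x)
        # Each replica returns a scalar loss; DataParallel gathers
        # these scalars (as a small vector) on cuda:0 instead of
        # gathering the full logits tensor.
        return self.criterion(logits, target)

base = nn.Linear(1024, 10)
wrapped = nn.DataParallel(ModelWithLoss(base, nn.CrossEntropyLoss())).to("cuda:0")

x = torch.randn(64, 1024, device="cuda:0")
target = torch.randint(0, 10, (64,), device="cuda:0")
loss = wrapped(x, target).mean()  # average the per-replica losses
loss.backward()
```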

codertimo commented 5 years ago

@thomwolf Oh my god, thank you for letting us know about that great post. It was so helpful! I'll try applying multi-GPU training in 0.0.1a4. Thanks again 👍 by junseong