codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0

Maybe Bugs? #45

Closed sq6ra closed 4 years ago

sq6ra commented 5 years ago

In pretrain.py's save() method, I guess self.bert.to(self.device) should be removed... right?

codertimo commented 5 years ago

@sq6ra No, it's not a bug. If we save the model while its weight tensors are CUDA tensors, the checkpoint can't be loaded directly on a CPU-only machine. So save() temporarily moves the weights to CPU tensors, saves, and then moves the model back to its original device (if it was on a GPU, it goes back to that GPU).

https://github.com/codertimo/BERT-pytorch/blob/d10dc4f9d5a6f2ca74380f62039526eb7277c671/bert_pytorch/trainer/pretrain.py#L148-L149
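For reference, the linked lines amount to this pattern (a minimal sketch; the surrounding method body is paraphrased from the repo, not quoted verbatim):

```python
import torch

def save(self, epoch, file_path="output/bert_trained.model"):
    output_path = file_path + ".ep%d" % epoch
    # Move the weights to CPU so the checkpoint can be loaded on machines
    # without a GPU, then put the model back on its training device.
    torch.save(self.bert.cpu(), output_path)
    self.bert.to(self.device)
    return output_path
```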

sq6ra commented 5 years ago

@codertimo When running the code in a multi-GPU environment, I guess to(self.device) only puts the model back on cuda:0, which might fail to restore the tensors on all the other devices.

codertimo commented 5 years ago

Yes, the base model lives on the cuda:0 device, and the parallelized replicas are kept on the other GPUs. We only pull the model off cuda:0 to save it, and it goes back to cuda:0 right after the save finishes. So I don't think this is a bug.
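To illustrate why this is sufficient (a minimal sketch with hypothetical stand-in names, not code from this repo): nn.DataParallel keeps only the base module on cuda:0 and re-replicates it across the other GPUs on every forward pass, so restoring the base module restores everything.

```python
import torch
import torch.nn as nn

bert = nn.Linear(768, 768)          # stand-in for the BERT module
device = torch.device("cuda:0")
bert = bert.to(device)
model = nn.DataParallel(bert)       # replicas are rebuilt each forward pass

# Saving: only the base copy on cuda:0 needs to move.
torch.save(bert.cpu(), "bert.ep0")  # checkpoint is now CPU-loadable
bert.to(device)                     # restore the base module to cuda:0

# The next forward pass re-replicates the restored base module across all
# visible GPUs, so nothing on the other devices needs explicit restoration.
out = model(torch.randn(8, 768, device=device))
```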