Closed sq6ra closed 4 years ago
@sq6ra No, it's not a bug. If we save a model whose weight tensors are CUDA tensors (torch.cuda.FloatTensor),
the checkpoint can't be loaded directly on a CPU-only machine. So save() temporarily moves the weights to plain CPU tensors, saves, and then moves the model back to its original device (if that was a GPU, it goes back to the GPU).
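The pattern described above can be sketched roughly as follows (a minimal, hypothetical version of such a save() method, not the repo's exact code; the function name and signature here are assumptions):

```python
import torch
import torch.nn as nn


def save_checkpoint(model: nn.Module, path: str, device: torch.device) -> str:
    """Save a model so the checkpoint is loadable on CPU-only machines."""
    model.cpu()                            # weights become plain CPU tensors
    torch.save(model.state_dict(), path)   # checkpoint now holds CPU tensors
    model.to(device)                       # restore model to its original device
    return path
```

After this call, `torch.load(path)` works on a machine without CUDA, because the stored storages are CPU storages.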
@codertimo when running the code in a multi-GPU environment, I guess to(self.device)
only puts the model back on cuda:0, which might fail to restore the tensors to all devices.
Yes, the default model is on the cuda:0 device, and the parallelized model is kept on the other GPUs; we only fetch the model from cuda:0 when saving, and move it back to cuda:0 right after the save is finished. So I don't think this is a bug.
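A common alternative to moving the model to CPU before saving (not what this repo does, just a standard PyTorch option) is to leave the checkpoint as-is and remap storages at load time with `map_location`, so a checkpoint written from CUDA tensors can still be opened on a CPU-only machine:

```python
import torch


def load_checkpoint_on_cpu(path: str) -> dict:
    """Load a checkpoint onto CPU regardless of which device it was saved from."""
    # map_location="cpu" redirects every stored storage to CPU memory,
    # so no CUDA runtime is required to deserialize the file.
    return torch.load(path, map_location="cpu")
```

With this approach, `save()` would not need to move the model off its device at all; the remapping happens entirely on the loading side.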
In pretrain.py's save() method, I guess
self.bert.to(self.device)
should be removed... right?