NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Is it possible to enable apex O1 for inference on a non-apex FP32-trained model? #750

Open · Damiox opened this issue 4 years ago

Damiox commented 4 years ago

Can I use apex at inference time on a pure FP32 model that was not trained with apex? Or does apex strictly require, for inference, a model that was originally trained with apex enabled? It's not clear to me yet.

Could I get some explanation about that? I can't find the answer in the docs.

Lornatang commented 4 years ago

@Damiox First, make sure you have an optimizer to pass to amp.initialize alongside your model. If the following code runs without any problems, you are set.

# Initialization
import torch
from apex import amp

opt_level = 'O2'  # 'O2' is "almost FP16" mixed precision; pure FP32 would be 'O0'
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)

# Restore (amp.load_state_dict requires a checkpoint saved with amp.state_dict())
model = ...
optimizer = ...
checkpoint = torch.load('checkpoint.pth')

model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
amp.load_state_dict(checkpoint['amp'])

Damiox commented 4 years ago

hey @Lornatang . This is my current code:

                try:
                    from apex import amp
                    from transformers.optimization import AdamW
                    optimizer = AdamW(self.model.parameters())
                    self.model, optimizer = amp.initialize(self.model, optimizer, opt_level='O1')
                except ImportError:
                    print("NVIDIA's apex library is not installed. Automatic Mixed Precision cannot be enabled.")

The optimizer I'm using is this https://huggingface.co/transformers/main_classes/optimizer_schedules.html

My questions below:

Lornatang commented 4 years ago

@Damiox

Damiox commented 4 years ago

@Lornatang Actually my problem is that I'm using a model that was not trained with mixed precision; it's FP32. I'm running inference faster by using apex with the O1 level at inference time for this model. I don't see many discrepancies, but I'm not sure whether what I'm doing is right. I can't find in the documentation whether that's OK. Do you know where I can confirm that? Based on what you say, training with mixed precision is not a requirement for using apex later for inference? I can grab any FP32 model and run inference with apex, right? Thanks

Lornatang commented 4 years ago

@Damiox Yes, you can

Damiox commented 4 years ago

@Lornatang Thanks for helping me out with this. Could you please elaborate on why this should work? What's the theory behind it? Thanks

Lornatang commented 4 years ago

@Damiox Apex's key features: mixed precision training + dynamic loss scaling.

1. The essence of mixed precision training is to use FP16 for storage and multiplication in memory to speed up computation, while using FP32 for accumulation to avoid rounding error. This strategy effectively mitigates the rounding-error problem.

2. Loss scaling is needed because mixed precision training can otherwise fail to converge: activation gradient values are too small and underflow in FP16. The idea of loss scaling is to multiply the loss before the backward pass so the gradients stay representable in FP16, then unscale them again before the weight update. Both points are sketched below.
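
A minimal runnable sketch of both points, assuming a CUDA device with apex installed; the toy Linear model is only for illustration, and amp.scale_loss is apex's documented loss-scaling context manager:

import torch
from apex import amp

# Point 1: FP16 rounding error that FP32 accumulation avoids.
x = torch.tensor(2048.0, dtype=torch.float16)
print(x + 1)          # tensor(2048., dtype=torch.float16) -- the +1 rounds away
print(x.float() + 1)  # tensor(2049.) -- FP32 accumulation keeps it

# Point 2: dynamic loss scaling in one training step.
model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

inputs = torch.randn(8, 16).cuda()
loss = model(inputs).pow(2).mean()

# The loss is multiplied by a scale factor before backward() so tiny
# gradients don't underflow in FP16; apex unscales before optimizer.step().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()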

Damiox commented 4 years ago

Just to be 100% sure we're on the same page here: I am using an existing model for inference that was originally trained in FP32 and without apex. I am using that model for inference, not re-training it. I am using apex at inference time only, to speed things up. I am not interested in anything about apex + training, because I cannot re-train this model. Thanks

Damiox commented 4 years ago

@Lornatang I'm sorry to ping you again, but I just wanted to make sure you got my point clearly. Is it wrong to initialize apex for inference on an existing FP32 model that I haven't re-trained with apex? Everywhere in the documentation it seems to be assumed that the model is trained with apex, but I'm not re-training my model with apex; I'm just using apex when running predictions with my model. I just wanted to clarify that and get some feedback from you. Thanks!

Lornatang commented 4 years ago

@Damiox Sorry. Apex inference can be run on any FP32-precision model. You can try loading PyTorch's pretrained VGG19 model, which was trained in FP32; in the same way, you can initialize it and run inference with the code I gave earlier.
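
For instance, a minimal inference-only sketch along those lines, assuming torchvision is available (apex's amp.initialize accepts a model without an optimizer for inference-only use):

import torch
import torchvision.models as models
from apex import amp

# An FP32 pretrained model that was never trained with apex.
model = models.vgg19(pretrained=True).cuda().eval()

# Optimizer-free initialization is enough for inference.
model = amp.initialize(model, opt_level='O1')

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224).cuda()
    output = model(dummy)  # forward pass runs under O1 casting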

kwanUm commented 4 years ago

@Damiox I'm trying the same thing (trained FP32, inference with apex). Unfortunately, I see inference time get slower... How much of a speed increase have you observed using apex only at inference time?

Damiox commented 4 years ago

> @Damiox I'm trying the same thing (trained FP32, inference with apex). Unfortunately, I see inference time get slower... How much of a speed increase have you observed using apex only at inference time?

A considerable amount of time indeed. Take a look at your GPU: not all GPUs take advantage of FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.
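
One quick way to check whether a GPU has Tensor Cores at all is its CUDA compute capability: 7.0 (Volta) or higher, with the T4 reporting 7.5. A sketch:

import torch

# Tensor Cores ship with compute capability >= 7.0 (Volta/Turing and later);
# a Tesla T4 reports (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), (major, minor))
print("Tensor Cores available:", (major, minor) >= (7, 0))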

BuaaAlban commented 4 years ago

> @Damiox I'm trying the same thing (trained FP32, inference with apex). Unfortunately, I see inference time get slower... How much of a speed increase have you observed using apex only at inference time?

> A considerable amount of time indeed. Take a look at your GPU: not all GPUs take advantage of FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.

I have run inference for an FP32 model on a Tesla T4, but I haven't gotten any speedup. How can I confirm that Tensor Cores are being used? Or can you help me? I changed the model with

model = amp.initialize(model, opt_level='O3')

and changed the input of the model with

t_audio_signal_e = t_audio_signal_e.to(torch.half).cuda()

Thanks
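
A minimal timing sketch for checking whether O3 actually speeds up a model, assuming a model and a matching inputs tensor already exist; CUDA events are used because kernel launches are asynchronous:

import torch

def time_inference(model, inputs, iters=100):
    # Warm up so lazy initialization and cuDNN autotuning don't skew timing.
    with torch.no_grad():
        for _ in range(10):
            model(inputs)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    with torch.no_grad():
        for _ in range(iters):
            model(inputs)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per forward pass

Comparing the number before amp.initialize (FP32 inputs) against after (opt_level='O3', half inputs) shows whether FP16 pays off; one common culprit when it doesn't is layer dimensions that aren't multiples of 8, which Tensor Cores prefer.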