NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Is it possible to enable apex O1 for inference on a non-apex FP32-trained model? #750

Open · Damiox opened this issue 4 years ago

Damiox commented 4 years ago

Can I use apex at inference time on a pure FP32 model that was not trained with apex? Or does apex strictly require, for inference, a model that was originally trained with apex enabled? It's not clear to me yet.

Could I get some explanation about that? I can't find the answer in the docs.

Lornatang commented 4 years ago

@Damiox First, make sure you have an optimizer to pass to amp.initialize alongside your model. If the following code runs without any problems, you are set.

# Initialization
import torch
from apex import amp

opt_level = 'O2'  # 'O2' is "almost FP16" mixed precision; pure FP32 would be 'O0'
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)

# Restore (amp.load_state_dict requires a checkpoint saved with amp.state_dict())
model = ...
optimizer = ...
checkpoint = torch.load('checkpoint.pth')

model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
amp.load_state_dict(checkpoint['amp'])

Damiox commented 4 years ago

hey @Lornatang . This is my current code:

                try:
                    from apex import amp
                    from transformers.optimization import AdamW
                    optimizer = AdamW(self.model.parameters())
                    self.model, optimizer = amp.initialize(self.model, optimizer, opt_level='O1')
                except ImportError:
                    print("NVIDIA's apex library is not installed. Automatic Mixed Precision cannot be enabled.")

The optimizer I'm using is this https://huggingface.co/transformers/main_classes/optimizer_schedules.html

My questions below:

Lornatang commented 4 years ago

@Damiox

Damiox commented 4 years ago

@Lornatang Actually my problem is that I'm using a model that was not trained with mixed precision; it's FP32. I'm running inference faster by using apex with the O1 level at inference time for this model. I don't see many discrepancies, but I'm not sure whether what I'm doing is right. I can't find in the documentation whether that's OK. Do you know where I can confirm that? Based on what you say, training with mixed precision is not a requirement for using apex later for inference? I can grab any FP32 model and run inference with apex, right? Thanks

Lornatang commented 4 years ago

@Damiox Yes, you can

Damiox commented 4 years ago

@Lornatang Thanks for helping me out with this. Could you please elaborate on why this should work? What's the theory behind it? Thanks

Lornatang commented 4 years ago

@Damiox Apex's key features: mixed precision training + dynamic loss scaling.

1. The essence of mixed precision training is to use FP16 for storage and multiplication in memory to speed up computation, while using FP32 for accumulation to avoid rounding error. This strategy effectively mitigates the rounding-error problem.

2. Loss scaling is needed because mixed precision training can otherwise fail to converge: activation gradient values are too small and underflow in FP16. The idea of loss scaling is to multiply the loss before the backward pass so the gradients stay representable in FP16, then unscale them again before the weight update. Both points are sketched below.
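
A minimal runnable sketch of both points, assuming a CUDA device with apex installed; the toy Linear model is only for illustration, and amp.scale_loss is apex's documented loss-scaling context manager:

import torch
from apex import amp

# Point 1: FP16 rounding error that FP32 accumulation avoids.
x = torch.tensor(2048.0, dtype=torch.float16)
print(x + 1)          # tensor(2048., dtype=torch.float16) -- the +1 rounds away
print(x.float() + 1)  # tensor(2049.) -- FP32 accumulation keeps it

# Point 2: dynamic loss scaling in one training step.
model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

inputs = torch.randn(8, 16).cuda()
loss = model(inputs).pow(2).mean()

# The loss is multiplied by a scale factor before backward() so tiny
# gradients don't underflow in FP16; apex unscales before optimizer.step().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()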

Damiox commented 4 years ago

Just to be 100% sure we're on the same page here: I am using an existing model for inference that was originally trained in FP32 and without apex. I am using that model for inference, not re-training it. I am using apex at inference time only, to speed things up. I am not interested in anything about apex + training, because I cannot re-train this model. Thanks

Damiox commented 4 years ago

@Lornatang I'm sorry to ping you again, but I just wanted to make sure you got my point clearly. Is it wrong to initialize apex for inference on an existing FP32 model that I haven't re-trained with apex? Everywhere in the documentation it seems to be assumed that the model is trained with apex, but I'm not re-training my model with apex; I'm just using apex when running predictions with my model. I just wanted to clarify that and get some feedback from you. Thanks!

Lornatang commented 4 years ago

@Damiox Sorry. Apex inference can be run on any FP32-precision model. You can try loading PyTorch's pretrained VGG19 model, which was trained in FP32; in the same way, you can initialize it and run inference with the code I gave earlier.
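
For instance, a minimal inference-only sketch along those lines, assuming torchvision is available (apex's amp.initialize accepts a model without an optimizer for inference-only use):

import torch
import torchvision.models as models
from apex import amp

# An FP32 pretrained model that was never trained with apex.
model = models.vgg19(pretrained=True).cuda().eval()

# Optimizer-free initialization is enough for inference.
model = amp.initialize(model, opt_level='O1')

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224).cuda()
    output = model(dummy)  # forward pass runs under O1 casting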

kwanUm commented 4 years ago

@Damiox I'm trying the same thing (trained FP32, inference with apex). Unfortunately, I see inference time get slower... How much of a speed increase have you observed using apex only at inference time?

Damiox commented 4 years ago

> @Damiox I'm trying the same thing (trained FP32, inference with apex). Unfortunately, I see inference time get slower... How much of a speed increase have you observed using apex only at inference time?

A considerable amount of time indeed. Take a look at your GPU: not all GPUs take advantage of FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.
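
One quick way to check whether a GPU has Tensor Cores at all is its CUDA compute capability: 7.0 (Volta) or higher, with the T4 reporting 7.5. A sketch:

import torch

# Tensor Cores ship with compute capability >= 7.0 (Volta/Turing and later);
# a Tesla T4 reports (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), (major, minor))
print("Tensor Cores available:", (major, minor) >= (7, 0))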

BuaaAlban commented 4 years ago

> @Damiox I'm trying the same thing (trained FP32, inference with apex). Unfortunately, I see inference time get slower... How much of a speed increase have you observed using apex only at inference time?

> A considerable amount of time indeed. Take a look at your GPU: not all GPUs take advantage of FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.

I have run inference for an FP32 model on a Tesla T4, but I haven't gotten any speedup. How can I confirm that Tensor Cores are being used? Or can you help me? I changed the model with

model = amp.initialize(model, opt_level='O3')

and changed the input of the model with

t_audio_signal_e = t_audio_signal_e.to(torch.half).cuda()

Thanks
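
A minimal timing sketch for checking whether O3 actually speeds up a model, assuming a model and a matching inputs tensor already exist; CUDA events are used because kernel launches are asynchronous:

import torch

def time_inference(model, inputs, iters=100):
    # Warm up so lazy initialization and cuDNN autotuning don't skew timing.
    with torch.no_grad():
        for _ in range(10):
            model(inputs)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    with torch.no_grad():
        for _ in range(iters):
            model(inputs)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per forward pass

Comparing the number before amp.initialize (FP32 inputs) against after (opt_level='O3', half inputs) shows whether FP16 pays off; one common culprit when it doesn't is layer dimensions that aren't multiples of 8, which Tensor Cores prefer.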