Closed — albertofernandezvillan closed this issue 2 years ago
That’s correct. The `Linear()` layer is a custom layer provided by the OpenAI Whisper authors to handle different model data types (fp32, fp16), since the model can be run at different precisions. However, the dynamic quantization code traverses the model's layers and quantizes only those whose types appear in the set of layer types it is told to quantize, and that set must consist of builtin `torch.nn` layers. Because the custom `Linear()` is a different type, the quantizer does not recognize it and leaves it untouched. Once it is changed to `torch.nn.Linear`, the model is in a format the quantization module recognizes, and no further modifications are necessary.
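A minimal sketch of the type-matching behavior described above, using a hypothetical `CustomLinear` subclass (stand-in for Whisper's `Linear`, which casts its weights to the input dtype) — `quantize_dynamic` matches module types exactly, so the subclass is skipped while the plain `nn.Linear` is converted:

```python
import torch
import torch.nn as nn

# Hypothetical subclass, similar in spirit to Whisper's custom Linear:
# it casts weights/bias to the input dtype so the model can run at
# different precisions (fp16/fp32).
class CustomLinear(nn.Linear):
    def forward(self, x):
        return nn.functional.linear(
            x,
            self.weight.to(x.dtype),
            None if self.bias is None else self.bias.to(x.dtype),
        )

model = nn.Sequential(CustomLinear(4, 4), nn.Linear(4, 4))

# Ask for dynamic quantization of nn.Linear layers only.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The subclass keeps its original type; the builtin nn.Linear is
# swapped for a dynamically quantized replacement.
print(type(quantized[0]).__name__, type(quantized[1]).__module__)
```

This is why renaming the layer to the builtin type is enough: the quantizer's lookup is by exact class, not by `isinstance`.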
So if I have understood correctly, the only necessary change is to replace the custom `Linear()` layer in the OpenAI Whisper model with `nn.Linear()` in `whisper/whisper/model.py`, and then perform the dynamic quantization. Is that correct, or are there additional changes that should be made to improve the transcription performance of OpenAI Whisper for CPU-based deployment?
Thanks