Hello Philipp,

This is my fault. I only implied this in the test results, but the `Linear()` layer in the OpenAI Whisper model needs to be changed to `nn.Linear()`. If you look at the README now, there are updated instructions to get this working with minimal effort. This should shrink the model and improve run-time performance, with gains similar to those in the results table. Good luck!
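For reference, the quantization call the README describes boils down to something like this (a minimal sketch: it assumes the `openai-whisper` package is installed, that a real audio path is substituted for the placeholder `audio.mp3`, and that Whisper's custom `Linear` subclass has been replaced with plain `nn.Linear`, since `quantize_dynamic` only swaps modules whose type matches its mapping exactly):

```python
import torch
import whisper

# Load on CPU; dynamic quantization targets CPU inference.
model = whisper.load_model("large", device="cpu")

# Swap each nn.Linear for a dynamically quantized version:
# weights are stored as int8 and dequantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# fp16=False since decoding runs on CPU.
result = quantized_model.transcribe("audio.mp3", language="en", fp16=False)
print(result["text"])
```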
Thanks a lot! It is working now.
```
Audio: audio/Byron_Katie_Podcast/Byron_Katie_KICK_OFF_FINAL_MIX.mp3
Language (--language): English
Model (--model): large
PyTorch version: 1.12.1+cu102

Size (MB): 6173.662482   (original)
Size (MB): 1770.022958   (quantized)
```

Original model:

```
Hi, this is Byron Katie and welcome to the At Home Podcast. It is my privilege and passion to bring inquiry, self-inquiry, to those of you suffering from relationship with yourself, with others, with the world. At first, what you hear may seem radical and I invite you to stay with it. Don't give up. Just hang in with us. So sit back, relax, and enjoy the wisdom of this week's guests,
Evaluate total time (seconds): 49.7
```

Quantized model:

```
Hi, this is Byron Katie and welcome to the At Home podcast. It is my privilege and passion to bring inquiry, self-inquiry, to those of you suffering from relationship with yourself, with others, with the world. At first what you hear may seem radical and I invite you to stay with it. Don't give up. Just hang in with us. So sit back, relax and enjoy the
Evaluate total time (seconds): 15.2
```
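The `Size (MB)` lines can be produced with a helper along the lines of the one in the PyTorch dynamic-quantization tutorial (a sketch; the name `print_size_of_model` is taken from that tutorial):

```python
import os
import torch

def print_size_of_model(model: torch.nn.Module) -> None:
    # Serialize the state dict to a temporary file and report its size.
    torch.save(model.state_dict(), "temp.p")
    print(f"Size (MB): {os.path.getsize('temp.p') / 1e6}")
    os.remove("temp.p")
```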
The speed is much improved, roughly 3.25x. The accuracy seems to be somewhat affected, but the results are still very good.
In the results table you did not include the large model. Is there a reason for that, or do you recommend using a smaller .en model (for English transcription) in combination with the quantized weights instead?
Edit: After a few more tests, it turns out that for the smaller models the quantization actually improves the accuracy; only for the large model does this not seem to be the case. So I am leaning towards the medium.en model for English transcription.
Hello again Philipp,
The only reason I didn't include test results for the larger models is that the run time grows considerably with model size, so running and documenting the tests was starting to take a long time. I suspect you would still see large performance increases for the bigger models; from your results, it looks like the speed-up continues to scale with model size.
It's very interesting that you found improved performance with the smaller quantized models. There is a PyTorch tutorial which applies dynamic quantization to a BERT model of roughly the same size as the Base OpenAI Whisper model, and they also found very slightly increased performance after quantization, so it's interesting that you have observed the same on another Transformer-based model.
My guess is that for the large model the quantization reduces the size too much, such that important weight information gets lost. That would explain why the performance is slightly worse than with the original model. (Maybe a different quantization scheme would also increase the performance for the large model.) For the smaller models, on the other hand, the quantization seems to improve the weights by removing overhead.
It would be an interesting thing to study in a more quantitative analysis.
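One cheap variant for probing that guess is float16 dynamic quantization, which halves the weights instead of quartering them (a sketch; `model` is the loaded Whisper model as above, and whether this actually helps the large model is untested here):

```python
import torch

# float16 keeps more weight precision than qint8,
# at the cost of a smaller size/speed win.
quantized_fp16 = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16
)
```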
Hi,

I tried to reproduce your results on a CPU-only system; after `quantize_dynamic` the model size is still the same. Am I missing something?

Edit: I guess my CPU does not support `qint8`.
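One thing worth checking in that situation is which quantization backends your PyTorch build supports on your CPU (a sketch using standard PyTorch APIs; `fbgemm` is the x86 backend and requires AVX2, while `qnnpack` targets ARM):

```python
import torch

# List the quantized engines available in this build, e.g. ['none', 'fbgemm'].
print(torch.backends.quantized.supported_engines)

# Select a supported engine explicitly before quantizing.
torch.backends.quantized.engine = "fbgemm"  # use "qnnpack" on ARM
```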