Slower results from quantized model potentially due to warning prints

MiscellaneousStuff / openai-whisper-cpu

Improving transcription performance of OpenAI Whisper for CPU based deployment

MIT License


Slower results from quantized model potentially due to warning prints #9

Open Mijawel opened 1 year ago

Mijawel commented 1 year ago

I'm getting the following warning which prints probably 1000 times during execution: [W qlinear_dynamic.cpp:239] Warning: Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release. (function apply_dynamic_impl)

My results are also maybe 30% slower than with the non-quantized model, and I think the warning output might be the cause.

Any idea how to fix?

(And just filtering warnings in Python is not suppressing them, for some reason.)

MiscellaneousStuff commented 1 year ago

Are you getting this using the code from this repository or are you running the dynamic quantisation in your own script?

Mijawel commented 1 year ago

Getting it from the repo (although I have since moved over to the faster_whisper repo based on CTranslate2, which seems to have working quantization).

MiscellaneousStuff commented 1 year ago

That repo looks more developed than this one, so that is probably a good idea, especially if they have verified that their method has the same performance after optimisation. Good to see others also using quantisation. As for the warnings, I've found similar issues when using other Nvidia libraries like TensorRT, so I will still need to look into this one.

cehongwang commented 10 months ago

Hello, how do you fix the problem? I am using torch.quantization and also want to disable the printing.
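One possible workaround, offered as a sketch and not a confirmed fix: the `[W qlinear_dynamic.cpp:239]` prefix suggests the message is written to stderr by native code and does not pass through Python's `warnings` module, which would explain why filtering warnings has no effect. If that assumption holds, redirecting file descriptor 2 at the OS level around the inference call should silence it. The helper name `suppress_native_stderr` is hypothetical and not part of PyTorch or this repository.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily point file descriptor 2 (stderr) at os.devnull.

    Unlike warnings.filterwarnings, this also hides text that native
    (C/C++) code writes directly to stderr.
    """
    sys.stderr.flush()                      # push out anything already buffered
    saved_fd = os.dup(2)                    # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)              # fd 2 now discards everything
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)                # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)


if __name__ == "__main__":
    with suppress_native_stderr():
        # Stand-in for the quantized model call that emits the warning.
        os.write(2, b"[W] this line is hidden\n")
    os.write(2, b"stderr works again\n")
```

Note that this hides everything written to stderr inside the `with` block, including genuine errors, so it is best wrapped tightly around the transcription call. Timing the call with and without the wrapper would also show whether the printing accounts for the slowdown at all.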