Open Mijawel opened 1 year ago
Are you getting this using the code from this repository or are you running the dynamic quantisation in your own script?
Getting it from the repo (although I have since moved over to the faster_whisper repo, which is based on CTranslate2 and seems to have working quantization).
That repo looks more developed than this one, so that is probably a good idea, especially if they have verified that their method keeps the same performance after optimisation. Good to see others also using quantisation. As for the warnings, I’ve found similar issues when using other Nvidia libraries like TensorRT, so I will still need to look into this one.
I'm getting the following warning, which is printed probably 1000 times during execution: [W qlinear_dynamic.cpp:239] Warning: Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release. (function apply_dynamic_impl)
I'm also getting results maybe 30% slower than when running the non-quantized model, and I think the warning might be the cause.
Any idea how to fix this?
(Just using filter warnings in Python is not suppressing them, for some reason.)
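One possible explanation, offered as an assumption rather than a confirmed diagnosis: the `[W qlinear_dynamic.cpp:239]` prefix looks like a message written straight to stderr by native code, and messages of that kind never pass through Python's `warnings` module, so a Python-level filter would not see them. If that is what is happening here, a rough workaround is to point file descriptor 2 at the null device while the model runs. The helper name `suppress_native_stderr` below is made up for illustration and is not part of PyTorch or this repository.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily redirect file descriptor 2 (stderr) to the null device.

    Hypothetical helper: it silences anything written to stderr at the OS
    level, including output from native (C/C++) code.
    """
    sys.stderr.flush()                 # push out anything Python has buffered
    saved_fd = os.dup(2)               # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)         # fd 2 now points at the null device
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)           # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

Usage would be to wrap only the inference call in `with suppress_native_stderr():`. Note that this hides every stderr message inside the block, including genuine errors, and it only silences the output; it does nothing about the slowdown itself.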
Hello, how did you fix the problem? I am using torch.quantization and also want to disable the printing.