Open paulxin001 opened 10 months ago
It's getting worked on.
Yeah, you could try the int8 weight-only quantization branch, which greatly reduces memory usage. That said, the memory usage itself shouldn't be a big concern here: GPU utilization is already high, so any memory you freed up couldn't really be used for other tasks anyway. @paulxin001
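To put rough numbers on the weight savings (this assumes the large-v2 checkpoint at ~1.55B parameters; the rest of the 17 GB is likely activation buffers, KV caches, and TensorRT workspace, which weight-only quantization does not shrink):

```python
# Back-of-the-envelope weight memory for Whisper large-v2 (~1.55B parameters).
# This covers only the weights; activation buffers, cross/self-attention KV
# caches, and TensorRT workspace are allocated on top of this (assumption,
# not measured from the 17 GB report).

N_PARAMS = 1.55e9  # approximate parameter count of Whisper large-v2

def weight_memory_gib(bytes_per_param: float) -> float:
    """Return weight storage in GiB for a given per-parameter width."""
    return N_PARAMS * bytes_per_param / (1024 ** 3)

print(f"fp16 weights: {weight_memory_gib(2):.1f} GiB")  # ~2.9 GiB
print(f"int8 weights: {weight_memory_gib(1):.1f} GiB")  # ~1.4 GiB
```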
@paulxin001 Would you mind removing the layernorm plugin and trying again? Thank you.
Why does the Whisper model need 17GB of GPU memory, while faster-whisper only needs about 4GB? I also haven't found a way to quantize Whisper to INT8. Is that not supported yet? The memory footprint is far too large; is there any way to optimize it?
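For reference, faster-whisper's low footprint comes from CTranslate2's int8 inference path; a minimal sketch of how it is typically loaded (the model size, compute type, and audio path below are placeholders, not taken from this issue):

```python
from faster_whisper import WhisperModel

# Load weights in int8 with float16 compute; this is what keeps the
# GPU memory usage in the ~4 GB range for the large model.
model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")

# Transcribe a sample file and print timestamped segments.
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```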