
Vaibhavs10 / insanely-fast-whisper

Apache License 2.0


Use Nvidia optimum to speed up inference #111

Closed. SKocur closed this issue 9 months ago.

SKocur commented 10 months ago

Hello! On NVIDIA-based workstations, is it possible to use Optimum-NVIDIA pipelines instead of the default Hugging Face ones to speed up Whisper token generation? I have not tested it yet. The blog post reporting speed-ups on LLaMA-based models is https://huggingface.co/blog/optimum-nvidia, and the repository is https://github.com/huggingface/optimum-nvidia
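Since the speed-up is untested here, one backend-agnostic way to check it is to run the same inputs through both pipelines and compare token throughput. The sketch below is a minimal timing harness, assuming each backend is wrapped in a hypothetical `generate` callable that returns the generated tokens for one input. It does not call any Optimum-NVIDIA or Transformers API, and the wrapper functions are illustrative only.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark_throughput(
    generate: Callable[[str], Sequence[str]],
    inputs: Sequence[str],
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Return the median tokens/second of `generate` over `inputs`.

    `generate` is a hypothetical wrapper around any backend (for example a
    default Hugging Face pipeline or an Optimum-NVIDIA one) that maps one
    input to the sequence of tokens it generated.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")

    # Warm-up passes keep one-off costs (engine build, CUDA init, caches)
    # out of the timed runs.
    for _ in range(warmup):
        for x in inputs:
            generate(x)

    rates = []
    for _ in range(runs):
        n_tokens = 0
        start = time.perf_counter()
        for x in inputs:
            n_tokens += len(generate(x))
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed if elapsed > 0 else float("inf"))

    # The median is less sensitive to a single noisy run than the mean.
    return statistics.median(rates)


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Throughput ratio; a value > 1 means the candidate backend is faster."""
    return candidate_tps / baseline_tps
```

Usage: call `benchmark_throughput` once with a wrapper around the default pipeline and once with a wrapper around the candidate pipeline on the same audio inputs, then pass both results to `speedup`. Measuring on identical inputs matters because generated token counts, and therefore throughput, vary with the audio.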

Vaibhavs10 commented 9 months ago

Let's keep the conversation in the PR (when optimum-nvidia makes a release). I am closing this issue.