SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License
12.7k stars · 1.06k forks

Run with onnxruntime-gpu not working for faster_whisper #493

Closed guilhermehge closed 1 year ago

guilhermehge commented 1 year ago

I am trying to use faster_whisper with pyannote for speech overlap detection and speaker diarization, but pyannote's new 3.0.0 release requires onnxruntime-gpu to run the diarization pipeline with the new embedding model.

Installing both onnxruntime (from faster_whisper) and onnxruntime-gpu (from pyannote) causes a conflict, and onnx falls back to CPU only.

I tried uninstalling onnxruntime and force-reinstalling onnxruntime-gpu, but then faster_whisper no longer works.

Is it possible to use onnxruntime-gpu for faster_whisper?
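For reference, the usual way to end up with only the GPU wheel is to remove both packages first and then install onnxruntime-gpu alone (a sketch of the setup steps, not specific to faster_whisper):

```shell
# If onnxruntime and onnxruntime-gpu are both installed, the CPU build
# shadows the GPU one, so remove both before reinstalling.
pip uninstall -y onnxruntime onnxruntime-gpu
pip install onnxruntime-gpu

# Sanity check: CUDAExecutionProvider should now be listed.
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"
```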

phineas-pta commented 1 year ago

You should not have both onnxruntime and onnxruntime-gpu installed; with both present, it always defaults to CPU.

Installing onnxruntime-gpu alone should be enough. faster_whisper uses it for the Silero VAD, but always on CPU: https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py#L260
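The point here is that the onnxruntime build determines which execution providers are even available, while the code decides which one a session actually uses. A small illustrative helper (not part of faster-whisper's API; `pick_providers` is a hypothetical name) makes the selection logic explicit:

```python
def pick_providers(available):
    """Prefer CUDA when the onnxruntime build exposes it, else fall back to CPU.

    `available` is the list onnxruntime.get_available_providers() would return.
    This helper is illustrative only: faster-whisper's vad.py hard-codes
    CPUExecutionProvider regardless of what the installed wheel supports.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# With onnxruntime-gpu installed and CUDA set up:
print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# With the CPU-only wheel:
print(pick_providers(["CPUExecutionProvider"]))
```

The returned list would then be passed as the `providers` argument to `onnxruntime.InferenceSession`.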

The caveat with onnxruntime-gpu is that you must properly install CUDA and cuDNN at the system level.

guilhermehge commented 1 year ago

But can Silero VAD run with onnxruntime-gpu? To do that, I believe I would need to change faster_whisper's requirements so it does not install onnxruntime, right?

I'm running the application in Docker with the image nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04, so CUDA and cuDNN are properly installed.

phineas-pta commented 1 year ago

It's possible to run Silero VAD with onnxruntime-gpu; see my comment: https://github.com/guillaumekln/faster-whisper/issues/364#issuecomment-1645272083

I don't know which version of onnxruntime you're using, but for the latest version it's better to use CUDA 11.8: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

guilhermehge commented 1 year ago

Thanks for that, phineas!

Let me ask you something else. Faster_whisper's transcribe already takes up 99% of my GPU; if I run VAD on the GPU as well, would that be a problem, or would it take longer because of it? I read through transcribe.py and see that Silero VAD is only used within the transcribe function, and the segments are a generator, so it should not overload the GPU. Am I correct?

guilhermehge commented 1 year ago

I implemented this code of yours from #364 (comment)

and it actually increased the transcribe time, going from 2 to 7 seconds for the audio I'm testing. Do you know why that happened?

Analyzing it further, I believe that happens because a session is created every time the transcribe function is called, so, since it is using the GPU, session creation time adds up.
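If per-call session creation is indeed the bottleneck, the standard fix is to build the session once and reuse it across calls. A minimal sketch with a stand-in constructor (so it runs without onnxruntime; `get_vad_session` and `_build_session` are hypothetical names, not faster-whisper API):

```python
import functools

# Stand-in for onnxruntime.InferenceSession; counts how many times a
# session is actually constructed.
construction_count = {"n": 0}

def _build_session(model_path, providers):
    construction_count["n"] += 1
    return ("session", model_path, providers)  # placeholder for the real session

@functools.lru_cache(maxsize=None)
def get_vad_session(model_path, providers=("CUDAExecutionProvider",)):
    # Pay the (expensive, especially on GPU) session-creation cost once,
    # then reuse the cached session on every subsequent transcribe() call.
    return _build_session(model_path, providers)

s1 = get_vad_session("silero_vad.onnx")
s2 = get_vad_session("silero_vad.onnx")
assert s1 is s2            # same cached object
assert construction_count["n"] == 1  # built only once
```

With caching like this, the GPU session cost is amortized over many calls instead of being paid per audio file.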

phineas-pta commented 1 year ago

Hmm, it seems I misread your previous comment. Silero VAD should work with onnxruntime-gpu installed but defaults to CPU; my code is just a tweak to make it run on the GPU, not an absolute necessity.

It always creates a new ONNX session, GPU or CPU alike, but I guess loading to the GPU takes more time (loading time > processing time). You may need a longer audio to measure an actual speedup.

guilhermehge commented 1 year ago

Yes, at first I did want to run with the onnxruntime-gpu library while keeping Silero VAD on the CPU. Since you posted the code, I tried running it on the GPU, but session creation increases the time too much for short audios, so it's not worth it in most cases; better to use the CPU with more threads active.

I'm trying to run this code along with pyannote's 3.0 diarization pipeline, which requires onnxruntime-gpu, so faster_whisper's requirements were causing a conflict.

I'm using a Docker container in a GPU pod orchestrated by Kubernetes, where I build an image based on nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04. I created this issue because I was testing onnxruntime-gpu in this environment inside a Jupyter notebook, and the kernel kept dying when running inference with Whisper; I couldn't figure out why. Later, I ran the complete code as a .py script outside Jupyter and it worked fine. I still don't know why the Jupyter notebook kernel keeps dying with this library.

phineas-pta commented 1 year ago

You should have shared the config info from the beginning, to avoid us talking past each other 😅

So the actual problem is a Jupyter kernel crash. Do you have logs?

guilhermehge commented 1 year ago

I'll run some tests and come back here with the results. For now, I don't have any logs; I killed the pod before accessing them.

Edit: I'll only be able to get back to this issue next week. When I have the results, I'll post them here.

thomasmol commented 1 year ago

Hi, I am having the same issue: I need to run onnxruntime-gpu, but I can't easily uninstall the CPU version since I am using Cog and pushing to Replicate. That means I can't change the code as per https://github.com/guillaumekln/faster-whisper/issues/364#issuecomment-1645272083, or at least I don't know how. Any ideas on how to force my build to use onnxruntime-gpu and remove onnxruntime?

thomasmol commented 1 year ago

I created a pull request that fixes this issue: https://github.com/guillaumekln/faster-whisper/pull/499. You can try it by installing git+https://github.com/thomasmol/faster-whisper.git@master.

phineas-pta commented 1 year ago

Your PR is very likely to be rejected: it only works with NVIDIA GPUs, while faster-whisper is cross-platform. That's why my code snippet stays a snippet instead of becoming a PR.

thomasmol commented 1 year ago

Thanks for the heads up

guilhermehge commented 1 year ago

I don't recommend running Silero VAD on the GPU either, since instantiating a session takes longer there than with the CPU version. For shorter audios, it increases the overall time significantly: I've seen 2s on CPU versus 7s on GPU for certain audios.

Perhaps we could add an option for the user to select GPU or CPU for Silero VAD via the parameters class.

guilhermehge commented 1 year ago

So, for this issue, @phineas-pta, I fixed it by installing only onnxruntime-gpu; the Jupyter notebook is working properly and everything is running as it should.

To do this, I cloned the faster-whisper repo, created a build with only the onnxruntime-gpu dependency, and installed it; now everything runs normally. Thanks for the help.
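A rough sketch of that workaround (the exact file holding the onnxruntime pin may differ between faster-whisper versions, so check the repo before running the sed line):

```shell
# Clone the repo, swap the CPU onnxruntime requirement for the GPU wheel,
# then build and install the modified package.
git clone https://github.com/guillaumekln/faster-whisper.git
cd faster-whisper
sed -i 's/^onnxruntime/onnxruntime-gpu/' requirements.txt
pip install .
```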

thomasmol commented 1 year ago

@guilhermehge yes, I did the same and it works! Maybe we could create a faster-whisper-gpu fork with a GPU-only version?

remic33 commented 1 year ago

It seems that the current pyannote version (3.0.1) is not working with the current faster_whisper version. Any ideas for a solution?

guilhermehge commented 1 year ago

It is working; I'm using it at the moment. How are you running it? Docker? Colab? Locally without Docker?

remic33 commented 1 year ago

Locally, which is probably the problem, I guess. I wanted to update whisperX on that matter.

guilhermehge commented 1 year ago

Did you create a virtual environment to do that?

Can you further explain your problem so we can debug it?

remic33 commented 1 year ago

It is a local env made with conda, on M2 silicon. The env had whisperX installed previously; trying to rebuild it with the new pyannote version gives me an error:

```
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu>=1.16.0 (from pyannote-audio) (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu>=1.16.0
```

phineas-pta commented 1 year ago

@remic33 pyannote doesn't officially support macOS; there are already many issues about that on the pyannote repo.

remic33 commented 1 year ago

It worked previously; I know because I was using it and was part of those discussions. You just needed to add some packages. But maybe with onnxruntime-gpu it no longer does. Thanks for your help!