guilhermehge closed this issue 1 year ago
You should not have both `onnxruntime` and `onnxruntime-gpu` installed; with both present, it always defaults to CPU.
Installing `onnxruntime-gpu` alone should be enough. `faster_whisper` uses it for Silero VAD, but always on CPU: https://github.com/guillaumekln/faster-whisper/blob/master/faster_whisper/vad.py#L260
The caveat with `onnxruntime-gpu` is that you must properly install CUDA + cuDNN at the system level.
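A quick way to sanity-check the system-level install is to ask onnxruntime which execution providers it can actually use (this assumes `onnxruntime-gpu` is installed; with a working CUDA + cuDNN setup, `CUDAExecutionProvider` should appear in the list):

```shell
# diagnostic one-liner: list the execution providers onnxruntime reports
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"
```

If only `CPUExecutionProvider` shows up, the GPU wheel or the CUDA/cuDNN install is not set up correctly.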
But can Silero VAD run with `onnxruntime-gpu`? To do that, I believe I'd need to change faster_whisper's requirements so it doesn't install `onnxruntime`, right?
I'm running the application in Docker with the image nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04, so CUDA + cuDNN are properly installed.
It's possible to run Silero VAD with `onnxruntime-gpu`; see my comment: https://github.com/guillaumekln/faster-whisper/issues/364#issuecomment-1645272083
I don't know which version of onnxruntime you're using, but for the latest version it's better to use CUDA 11.8: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
Thanks for that, phineas!
Let me ask you something else. faster_whisper's transcribe is already taking up 99% of my GPU; if I run VAD on the GPU as well, would that be a problem, or would it take longer because of that? I read through transcribe.py and see that SileroVAD is only used within the transcribe function, and the segments are a generator, so it should not overload the GPU. Am I correct?
I implemented this code of yours from #364 (comment), and it actually increased the transcribe time, going from 2 to 7 seconds for an audio I'm testing. Do you know why that happened?
Analyzing it further, I believe that happens because a new session is created every time we call the transcribe function, so, since it's using the GPU, session creation time adds up.
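If the per-call session creation is the bottleneck, one workaround would be to cache the session across transcribe calls. A minimal sketch of the idea (this is not faster_whisper's actual code; `load_session` is a hypothetical stand-in for `onnxruntime.InferenceSession`):

```python
import functools

_load_count = 0  # track how many times the expensive load actually runs

def load_session(model_path, providers):
    """Stand-in for onnxruntime.InferenceSession(model_path, providers=list(providers))."""
    global _load_count
    _load_count += 1
    return {"path": model_path, "providers": providers}

@functools.lru_cache(maxsize=None)
def get_session(model_path, providers):
    # providers is passed as a tuple so the arguments are hashable;
    # the expensive session creation then happens only on the first call
    return load_session(model_path, providers)

s1 = get_session("silero_vad.onnx", ("CUDAExecutionProvider",))
s2 = get_session("silero_vad.onnx", ("CUDAExecutionProvider",))
assert s1 is s2 and _load_count == 1  # second call reuses the cached session
```

With the session cached, the GPU loading cost is paid once per process instead of once per transcribe call.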
Hmm, seems like I misread your previous comment. Silero VAD should work with `onnxruntime-gpu` (it defaults to CPU); my code is just a tweak to make it run on the GPU, not an absolute necessity.
It always creates a new ONNX session, GPU or CPU, but I guess loading onto the GPU takes more time (loading time > processing time); maybe you need a longer audio to see an actual speedup.
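The gist of the tweak is just which execution providers get passed when the session is created. A rough sketch of that selection logic (the helper name is mine, not faster_whisper's API):

```python
def vad_providers(use_gpu, available):
    """Pick onnxruntime execution providers for the Silero VAD session.

    Request CUDA only when it is actually available; otherwise onnxruntime
    would silently fall back to CPU and hide a misconfigured install.
    """
    if use_gpu and "CUDAExecutionProvider" in available:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

# usage sketch (assumes onnxruntime-gpu is installed):
# import onnxruntime
# providers = vad_providers(True, onnxruntime.get_available_providers())
# session = onnxruntime.InferenceSession("silero_vad.onnx", providers=providers)
```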
Yes, at first I wanted to run it with the onnxruntime-gpu library while keeping Silero VAD on CPU, but since you posted the code, I tried running it on the GPU. Session creation increases the time too much for short audios, so in most cases it's not worth it; it's better to use the CPU with more threads active.
I'm trying to run this code alongside pyannote's 3.0 diarization pipeline, which requires `onnxruntime-gpu`, so faster_whisper's requirements were causing a conflict.
I'm using a Docker container in a GPU pod orchestrated by Kubernetes, where I build an image based on nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04. I created this issue because I was testing onnxruntime-gpu in this environment inside a Jupyter notebook, and the kernel kept dying when trying to run inference with Whisper; I couldn't figure out why. Later, I ran the complete code as a .py file outside Jupyter and it worked fine. I still don't know why the Jupyter kernel keeps dying with this library.
You should have shared the config info from the beginning to avoid talking into the void 😅. So the actual problem is the Jupyter kernel crash; do you have logs?
I'll run some tests and come back here with the results. For now, I don't have any logs; I killed the pod before accessing them.
Edit: I'll only be able to get back to this issue next week. When I have the results, I'll post them here.
Hi, I'm having the same issue: I need to run `onnxruntime-gpu`, but I can't easily uninstall the CPU version since I'm using Cog and pushing to Replicate, which means I can't change the code as per https://github.com/guillaumekln/faster-whisper/issues/364#issuecomment-1645272083, or at least I don't know how. Any ideas on how to force my build to use `onnxruntime-gpu` and remove `onnxruntime`?
I created a pull request that fixes this issue: https://github.com/guillaumekln/faster-whisper/pull/499. You can try it by installing `git+https://github.com/thomasmol/faster-whisper.git@master`.
Your PR is very likely to be rejected: it only works with NVIDIA GPUs, while `faster-whisper` is cross-platform. That's why my code stays a snippet instead of being sent as a PR.
Thanks for the heads up
I don't recommend running Silero VAD on the GPU either, since instantiating a session there takes longer than on CPU. For shorter audios this increases the overall time significantly: I've had 2 s on CPU versus 7 s on GPU for certain audios.
Perhaps we could add an option for the user to select GPU or CPU for Silero VAD via the parameters class.
So, for this issue, @phineas-pta, I fixed it by installing only `onnxruntime-gpu`; the Jupyter notebook is working properly and everything is running as it should.
To do this, I cloned the faster-whisper repo, created a build that depends only on `onnxruntime-gpu`, and installed it. Everything is running normally now. Thanks for the help.
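A sketch of that workaround in shell terms (the exact commands and the edit to requirements.txt are my assumption, not necessarily what was run):

```shell
# assumed workaround: swap faster-whisper's onnxruntime dep for onnxruntime-gpu
git clone https://github.com/guillaumekln/faster-whisper.git
cd faster-whisper
# replace the onnxruntime requirement with the GPU package
sed -i 's/^onnxruntime.*/onnxruntime-gpu/' requirements.txt
pip uninstall -y onnxruntime   # make sure the CPU wheel is gone
pip install .
```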
@guilhermehge yes, I did the same and it works! Maybe we could create a fork, `faster-whisper-gpu`, with a GPU-only version?
It seems that the current pyannote version (3.0.1) is not working with the current faster_whisper version. Any idea how to solve that?
It is; I'm using it at the moment. How are you running it? Docker? Colab? Locally without Docker?
Locally. That should be the problem, I guess. I wanted to update whisperX on that matter.
Did you create a virtual environment to do that?
Can you further explain your problem so we can debug it?
It's a local env made with conda, on M2 silicon. The env had whisperX installed previously; trying to rebuild it with the new pyannote version gives me this error:

```
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu>=1.16.0 (from pyannote-audio) (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu>=1.16.0
```
@remic33 pyannote doesn't officially support macOS; there are already many issues about that on the pyannote repo.
It worked previously; I know because I was using it and was part of those discussions. You just needed to add some packages, but maybe with onnxruntime-gpu it no longer works. Thanks for your help!
I'm trying to use faster_whisper with pyannote for speech-overlap detection and speaker diarization, but pyannote's new 3.0.0 update requires `onnxruntime-gpu` to run the diarization pipeline with the new embedding model.
Installing both `onnxruntime` (from faster_whisper) and `onnxruntime-gpu` (from pyannote) causes a conflict, and ONNX falls back to CPU only.
I tried uninstalling `onnxruntime` and forcing a reinstall of `onnxruntime-gpu`, but then faster_whisper no longer works.
Is it possible to use `onnxruntime-gpu` with faster_whisper?