Appfinity-development opened this issue 6 days ago
Can you limit the number of threads here and try again? https://github.com/SYSTRAN/faster-whisper/blob/97a4785fa13d067c300f8b6e40c4381ad0381c02/faster_whisper/vad.py#L263:L264
Which API is available to set `SileroVADModel` `SessionOptions` parameters?
Just change it in vad.py to:

```python
opts.inter_op_num_threads = 1
opts.intra_op_num_threads = 1
```
I'm running the code in a Docker environment that just pulls in the faster_whisper package from PyPI, so local changes I make in PyCharm to the package won't propagate to the Replicate server. The only two options I see are monkey patching or forking the whole lib, neither of which I'm really keen on doing.
Or am I missing a third option?
No third option currently; I just want you to test the fix first before we actually take any steps to fix it.
Tried monkey patching; this does remove the onnxruntime error, but the OOM error still persisted. It turned out that ctranslate2 version 4.5.0 was incompatible with the cog Docker env of Replicate. After downgrading to 4.4.0 it worked again. I did keep the monkey patch, however, since the logs won't be polluted then, and the error seems like something that should be addressed in 1.1.1.
I'm now using `large-v2` with the `BatchedInferencePipeline`, which speeds up processing time around 2x. Very nice for the same model.
These are my current packages, in case someone else runs into the issue:
- "torch==2.3.0"
- "torchaudio==2.3.0"
- "faster-whisper==1.1.0"
- "pyannote-audio==3.3.2"
- "ctranslate2==4.4.0"
monkey patch:

```python
import faster_whisper.vad
from faster_whisper.vad import SileroVADModel


# to prevent "Invalid argument. Specify the number of threads explicitly
# so the affinity is not set" onnxruntime error
class PatchedSileroVADModel(SileroVADModel):
    def __init__(self, encoder_path, decoder_path):
        try:
            import onnxruntime
        except ImportError as e:
            raise RuntimeError(
                "Applying the VAD filter requires the onnxruntime package"
            ) from e

        # Custom modification for SessionOptions
        opts = onnxruntime.SessionOptions()
        opts.inter_op_num_threads = 4
        opts.intra_op_num_threads = 4
        opts.log_severity_level = 3

        # Initialize sessions with modified options
        self.encoder_session = onnxruntime.InferenceSession(
            encoder_path,
            providers=["CPUExecutionProvider"],
            sess_options=opts,
        )
        self.decoder_session = onnxruntime.InferenceSession(
            decoder_path,
            providers=["CPUExecutionProvider"],
            sess_options=opts,
        )


faster_whisper.vad.SileroVADModel = PatchedSileroVADModel
```
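As an aside on why the reassignment on the last line is enough: patching works when callers look the class up through the module attribute at call time. A minimal self-contained sketch of that pattern, with toy stand-in names (the real target is `faster_whisper.vad.SileroVADModel`):

```python
import types

# Toy stand-in for a third-party module we cannot edit in the deployed image
# (hypothetical names; only illustrates the monkey-patching pattern).
vad = types.ModuleType("vad")

class OriginalModel:
    def num_threads(self):
        return 0  # the problematic default we want to override

def load_model(module):
    # Third-party code typically looks the class up on the module at call
    # time, which is why reassigning the attribute is enough to patch it.
    return module.SileroVADModel()

vad.SileroVADModel = OriginalModel

class PatchedModel(OriginalModel):
    def num_threads(self):
        return 1  # explicit thread count instead of the inferred one

# Apply the patch *before* anything constructs the model.
vad.SileroVADModel = PatchedModel

model = load_model(vad)
print(model.num_threads())  # → 1
```

The same caveat applies to the real patch: it must run before `WhisperModel` (or anything else) instantiates the VAD model, or the original class will already be in use.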
I think it should be

```python
opts.inter_op_num_threads = 1
opts.intra_op_num_threads = 1
```
The error he's mentioning is only caused when the value is 0, since that means ONNX Runtime must infer the actual number of threads, and it fails to do so. Any fixed number should fix the error; setting it to 1 is the safest but not the fastest.
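A toy sketch of the semantics described above — this is not ONNX Runtime's actual code, just an illustration of why 0 is special and why any explicit value sidesteps the failing detection step:

```python
import os

def resolve_num_threads(requested: int) -> int:
    """Toy illustration of '0 means detect' thread-count semantics."""
    if requested < 0:
        raise ValueError("thread count must be >= 0")
    if requested == 0:
        # The detection step: inside constrained containers this is the part
        # that can fail, producing an error like the one quoted in the thread.
        detected = os.cpu_count()
        if detected is None:
            raise RuntimeError(
                "Invalid argument. Specify the number of threads explicitly "
                "so the affinity is not set"
            )
        return detected
    # Any explicit value bypasses detection entirely.
    return requested

print(resolve_num_threads(1))  # → 1 (safest, single-threaded)
print(resolve_num_threads(4))  # → 4 (faster if the cores are available)
```

This is why both `= 1` and the `= 4` used in the monkey patch above silence the error; the choice only affects how much CPU parallelism the VAD sessions get.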
Also, the VAD encoder now benefits from GPU acceleration, if anyone needs it.
Updated from 1.0.3 to 1.1.0. Now an onnxruntime thread affinity crash occurs every time. Both versions run on an Nvidia A40 with 4 CPU cores, 48GB VRAM and 16GB RAM (on a private Replicate server), so it shouldn't be a hardware issue. Our model config:
Also tried this:
But to no avail. Any suggestions? The crash log is below.
The cog.yaml with dependencies looks like this:
Also tried removing the onnxruntime dependency, or pinning it to a specific GPU version, but nothing fixes the issue. Anyone with ideas (@MahmoudAshraf97)?
If `cpu` is used as `device` on `WhisperModel`, the onnxruntime error still shows in the logs, but there is no crash and transcribing finishes successfully.