collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License

TensorRT backend with base.en not working #207

Closed: kalradivyanshu closed this issue 6 months ago

kalradivyanshu commented 6 months ago

I am trying to use Whisper base.en with TensorRT. I followed the steps and built the weights inside the Docker container, and it all worked. But when I try to connect using:

from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
  "IP",
  9091,
  lang="en",
  translate=False,
  model="base.en",
  use_vad=False,
)
client()

It just fails with:

root@57510116e650:/home/WhisperLive# python3 run_server.py --port 9091 --backend tensorrt --trt_model_path ./whisper_base_en/
/home/WhisperLive/whisper_live/vad.py:141: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
  speech_prob = self.model(torch.from_numpy(audio_frame), self.frame_rate).item()
[05/05/2024-23:58:41] Unexpected error: Input audio chunk is too short

I am using the latest whisper_live. Here is the GPU info: [nvidia-smi screenshot]

It's an NVIDIA L4.

I think it's an issue with the VAD, which is weird because I disabled the VAD in the client. Any help would be much appreciated, thanks!

kalradivyanshu commented 6 months ago

Just to add: running the server with the faster_whisper backend from inside the same Docker container, with the same client, works as expected.
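
For reference, the working run was started roughly like this (the exact --backend value is my assumption, mirroring the tensorrt invocation above):

python3 run_server.py --port 9091 --backend faster_whisper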

makaveli10 commented 6 months ago

With the TensorRT backend we take a longer audio chunk to start with, because we haven't added timestamp support there.
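
To illustrate the idea only (this is a minimal sketch, not the actual WhisperLive code; the class name and the one-second threshold are made up for the example), the server-side logic would look something like accumulating incoming frames until a minimum chunk length is reached before transcribing:

import numpy as np

SAMPLE_RATE = 16000              # WhisperLive streams 16 kHz float32 audio
MIN_CHUNK_SECONDS = 1.0          # hypothetical minimum duration for the TensorRT path

class ChunkBuffer:
    """Accumulate incoming frames until at least MIN_CHUNK_SECONDS of audio is available."""

    def __init__(self, min_seconds=MIN_CHUNK_SECONDS):
        self.min_samples = int(min_seconds * SAMPLE_RATE)
        self.frames = []
        self.num_samples = 0

    def add(self, frame: np.ndarray):
        # Store the raw frame and track how many samples we have so far.
        self.frames.append(frame)
        self.num_samples += len(frame)

    def pop_if_ready(self):
        """Return one concatenated chunk once enough audio has arrived, else None."""
        if self.num_samples < self.min_samples:
            return None
        chunk = np.concatenate(self.frames)
        self.frames, self.num_samples = [], 0
        return chunk

In this reading, "Input audio chunk is too short" would be raised when transcription is attempted before enough audio has accumulated; whether WhisperLive buffers exactly this way is an assumption.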

kalradivyanshu commented 6 months ago

@makaveli10 So the microphone client doesn't work with TensorRT? Why is it crashing?