NVIDIA-AI-IOT / whisper_trt

A project that optimizes Whisper for low latency inference using NVIDIA TensorRT

long form inference #11

Open eschmidbauer opened 1 month ago

eschmidbauer commented 1 month ago

Is long-form inference possible with whisper_trt? I tried inference on a 4m16s audio clip and it appeared to transcribe only the first 30s. Here is my script:

from whisper_trt import load_trt_model

model = load_trt_model("small.en")
result = model.transcribe("test.wav")
jaybdub commented 1 month ago

Hi @eschmidbauer ,

It should be possible, but it seems we'll need to make some modifications to the transcribe function:

https://github.com/NVIDIA-AI-IOT/whisper_trt/blob/268eff10a1e38118a2734745b9db14f7419a08a5/whisper_trt/model.py#L162

Currently, it runs on a single 30s window.

John

eschmidbauer commented 1 month ago

It would be great to demonstrate long-form transcription here, perhaps by using a sliding window.
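A sliding-window approach along those lines could be sketched as follows. This is a minimal sketch, not the project's implementation: the `sliding_windows` helper is hypothetical, and the commented usage assumes `model.transcribe` can accept a raw audio array (if it only accepts a file path, each chunk would need to be written to a temporary WAV first).

```python
SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio

def sliding_windows(num_samples: int, window: int, stride: int):
    """Return (start, end) sample ranges covering the full clip.

    Consecutive windows overlap by (window - stride) samples so that
    words cut off at one window boundary appear intact in the next.
    """
    windows = []
    start = 0
    while True:
        end = min(start + window, num_samples)
        windows.append((start, end))
        if end == num_samples:
            break
        start += stride
    return windows

# Hypothetical usage (names and signatures are assumptions, not whisper_trt API):
#
# audio = load_audio("test.wav")  # 1-D float32 array at 16 kHz
# parts = []
# for start, end in sliding_windows(len(audio), 30 * SAMPLE_RATE, 25 * SAMPLE_RATE):
#     parts.append(model.transcribe(audio[start:end])["text"])
# text = " ".join(parts)  # naive join; overlapping text is not deduplicated
```

A 5 s overlap (30 s window, 25 s stride) is a common starting point; a robust version would also merge the overlapping text between adjacent windows rather than naively concatenating it.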