NVIDIA-AI-IOT / whisper_trt

A project that optimizes Whisper for low latency inference using NVIDIA TensorRT

long form inference #11

Open eschmidbauer opened 1 month ago

eschmidbauer commented 1 month ago

Is long-form inference possible with whisper_trt? I tried inference on a 4m16s audio clip and it appeared to transcribe only the first 30s. Here is my script:

from whisper_trt import load_trt_model

model = load_trt_model("small.en")
result = model.transcribe("test.wav")
jaybdub commented 1 month ago

Hi @eschmidbauer ,

It should be possible, but it seems we'll need to make some modifications to the transcribe function:

https://github.com/NVIDIA-AI-IOT/whisper_trt/blob/268eff10a1e38118a2734745b9db14f7419a08a5/whisper_trt/model.py#L162

Currently, it runs on a single 30s window.

John

eschmidbauer commented 1 month ago

It would be great to demonstrate long-form transcription here, perhaps by using a sliding window.
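A sliding-window approach along those lines could be sketched as follows. This is a minimal sketch, not the project's implementation: the `sliding_windows` helper is hypothetical, and the commented usage assumes `model.transcribe` can accept a raw audio array (if it only accepts a file path, each chunk would need to be written to a temporary WAV first).

```python
SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio

def sliding_windows(num_samples: int, window: int, stride: int):
    """Return (start, end) sample ranges covering the full clip.

    Consecutive windows overlap by (window - stride) samples so that
    words cut off at one window boundary appear intact in the next.
    """
    windows = []
    start = 0
    while True:
        end = min(start + window, num_samples)
        windows.append((start, end))
        if end == num_samples:
            break
        start += stride
    return windows

# Hypothetical usage (names and signatures are assumptions, not whisper_trt API):
#
# audio = load_audio("test.wav")  # 1-D float32 array at 16 kHz
# parts = []
# for start, end in sliding_windows(len(audio), 30 * SAMPLE_RATE, 25 * SAMPLE_RATE):
#     parts.append(model.transcribe(audio[start:end])["text"])
# text = " ".join(parts)  # naive join; overlapping text is not deduplicated
```

A 5 s overlap (30 s window, 25 s stride) is a common starting point; a robust version would also merge the overlapping text between adjacent windows rather than naively concatenating it.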