SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Automatic Inference Batching Support #295

Open · SinanAkkoyun opened this issue 1 year ago

SinanAkkoyun commented 1 year ago

Hey! The HuggingFace text-generation-inference server can, when it receives concurrent HTTP requests, automatically batch the generations. (Say a request is in progress and another one arrives: it adapts the in-flight batch to include it.)

I want to build an inference solution based on faster-whisper. Is manual batching supported? I am not proficient enough to safely implement it on my own, but I would like to build on top of it, if possible.
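For context, here is the single-request baseline I would be wrapping; as far as I can tell, faster-whisper's public API transcribes one audio at a time:

```python
from faster_whisper import WhisperModel

# Single-request baseline: one call, one audio file.
model = WhisperModel("small", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```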

arnavmehta7 commented 1 year ago

Faster-whisper performs batching internally for various vad_segments. I would be curious to know whether we can get further speedups by batching multiple audios.

Afaik, the HuggingFace one uses a small delay window (a delta): if multiple requests arrive within that window, they are batched and run together; otherwise each runs on its own. See the sketch below.
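Roughly, I imagine that delta-window idea looks like this (a minimal asyncio sketch, not anything faster-whisper ships; `transcribe_batch` is a placeholder for whatever batched call ends up existing, and `DELTA` is an arbitrary value):

```python
import asyncio

DELTA = 0.05  # hypothetical batching window, in seconds


async def batch_worker(queue: asyncio.Queue, transcribe_batch) -> None:
    """Block for the first request, collect more for up to DELTA seconds,
    then serve everything that arrived with one batched call."""
    loop = asyncio.get_running_loop()
    while True:
        requests = [await queue.get()]
        deadline = loop.time() + DELTA
        while (remaining := deadline - loop.time()) > 0:
            try:
                requests.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        audios = [audio for audio, _ in requests]
        # One batched call instead of len(requests) sequential ones.
        texts = await asyncio.to_thread(transcribe_batch, audios)
        for (_, future), text in zip(requests, texts):
            future.set_result(text)


async def submit(queue: asyncio.Queue, audio) -> str:
    """What each HTTP handler would call; resolves once the batch runs."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((audio, future))
    return await future


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    # Placeholder "model": returns one dummy transcript per input.
    worker = asyncio.create_task(
        batch_worker(queue, lambda batch: [f"transcript {i}" for i in range(len(batch))])
    )
    # Four concurrent requests land within DELTA and share one batch.
    print(await asyncio.gather(*(submit(queue, b"audio") for _ in range(4))))
    worker.cancel()


asyncio.run(main())
```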

guillaumekln commented 1 year ago

> Faster-whisper performs batching internally for various vad_segments.

No, there is no batching at this time. See #59, which is the main issue tracking batching support.

arnavmehta7 commented 1 year ago

@guillaumekln Oh, very sorry for the oversight. I feel this could be done fairly easily using this as a reference: https://github.com/m-bain/whisperX/blob/main/whisperx/asr.py#L210-L230
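The core trick there is that the underlying CTranslate2 Whisper model already accepts batched features, so several VAD segments can go through one `generate` call. A rough sketch of that, assuming a converted model directory (the `whisper-small-ct2` path and the hard-coded English task tokens below are placeholders):

```python
import ctranslate2
import numpy as np
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = ctranslate2.models.Whisper("whisper-small-ct2")  # placeholder path


def transcribe_batch(segments: list[np.ndarray]) -> list[str]:
    """Transcribe several 16 kHz audio segments with a single generate call."""
    # The processor pads each segment to 30 s and stacks the mel features
    # into one [batch, n_mels, frames] array.
    inputs = processor(segments, return_tensors="np", sampling_rate=16000)
    features = ctranslate2.StorageView.from_array(inputs.input_features)

    # Same prompt for every item; a real server would detect language first.
    prompt = processor.tokenizer.convert_tokens_to_ids(
        ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|notimestamps|>"]
    )
    results = model.generate(features, [prompt] * len(segments))
    return [processor.decode(r.sequences_ids[0]) for r in results]
```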