Blair-Johnson / batch-whisper

Batch Support for OpenAI Whisper
MIT License

Batch processing #1

Closed · Blair-Johnson closed this 1 year ago

Blair-Johnson commented 1 year ago

Opening this PR for merging and conflict resolution. The batch-processing branch introduces a single major change to the behavior of the model.transcribe() method, which can now accept a list of audio file paths rather than a single path. These files are packed into the batch dimension of the model for transcription, allowing users to achieve better GPU utilization. Audio clips may have different lengths; the internal batch size shrinks as the transcription of shorter files completes.
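
For reference, a minimal usage sketch of the extended interface. The file names and model size are placeholders, and the assumption that a list input returns one result dict per file, in input order, follows from the description above rather than from the PR diff itself:

```python
# Minimal usage sketch, assuming the batch-whisper fork is installed as
# `whisper` and exposes the same load_model() entry point as upstream.
import whisper

model = whisper.load_model("base")

# Upstream behavior: a single path transcribes one file.
result = model.transcribe("episode_001.mp3")
print(result["text"][:80])

# batch-whisper extension: a list of paths is packed into the batch
# dimension and transcribed in parallel on the GPU. Assumed to return
# one result dict per input file, in input order.
results = model.transcribe(["episode_001.mp3", "episode_002.mp3", "interview.wav"])
for r in results:
    print(r["text"][:80])
```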

Remaining issues to address:

Blair-Johnson commented 1 year ago

Initial benchmarking indicates that batching achieves significantly sub-linear scaling at least up to batch_size=16 on an NVIDIA A100 80GB. Scaling remains sub-linear relative to the batch_size=1 case beyond that point, but the time required for a set of batched audio clips begins to grow linearly with further increases in batch size as the GPU becomes saturated. In the figure below, a 214-minute podcast was batched together with itself at batch sizes in {1, 2, 4, 8, 16} and transcribed in parallel. The linear reference assumes linear scaling relative to the batch_size=1 case and is analogous to transcribing the clips serially.

[Figure: transcription time vs. batch size for batch_size ∈ {1, 2, 4, 8, 16}, with a linear-scaling reference]
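
A hypothetical sketch of the benchmark setup described above: the same long audio file is duplicated batch_size times and transcribed in parallel, with wall-clock time compared against a linear projection of the batch_size=1 run. The file name and model size are placeholders, not values taken from the PR:

```python
import time
import whisper

model = whisper.load_model("large")

baseline = None
for batch_size in (1, 2, 4, 8, 16):
    # Batch the same clip with itself, as in the benchmark above.
    batch = ["podcast_214min.mp3"] * batch_size
    start = time.perf_counter()
    model.transcribe(batch)
    elapsed = time.perf_counter() - start
    if baseline is None:
        baseline = elapsed  # batch_size=1 reference time
    # Linear reference: what serial transcription of the clips would cost.
    linear = baseline * batch_size
    print(f"batch_size={batch_size}: {elapsed:.1f}s (linear ref: {linear:.1f}s)")
```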