SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Batch size impact on speed and VRAM consumption #1143

Closed: vidalfer closed this issue 2 days ago

vidalfer commented 1 week ago

I ran some tests with different batch sizes and noticed that neither VRAM usage nor transcription speed changed significantly. I tested with batch sizes of 64, 128, and 512 on an A100 GPU. Is this normal? Theoretically, VRAM usage should increase along with speed as the batch size grows. I'm using the batched pipeline in my tests.

MahmoudAshraf97 commented 1 week ago

Transcription speed will eventually plateau once you are no longer bottlenecked by memory bandwidth, but VRAM usage should still increase. Another bottleneck is the inefficiency introduced by regular batching: some segments finish decoding before others, leaving part of the batch idle. Solving this requires implementing continuous batching for the decoder.

Start with batch_size=1 and increase gradually; you'll usually reach maximum speed around batch_size=32.
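If it helps anyone reproduce this, here is a minimal sketch of such a sweep using the batched pipeline. It assumes faster-whisper is installed and that you supply your own `audio.wav` (the file name, model size, and audio duration below are placeholders):

```python
# Sketch of a batch-size sweep with the faster-whisper batched pipeline.
# "audio.wav", the model size, and audio_seconds are assumptions you must
# replace with your own values.
import time


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Speedup over real time: seconds of audio transcribed per second of compute."""
    return audio_seconds / wall_seconds


def sweep(pipeline, audio_path: str, audio_seconds: float,
          batch_sizes=(1, 2, 4, 8, 16, 32)):
    """Time one transcription per batch size and report the real-time factor."""
    results = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        segments, _info = pipeline.transcribe(audio_path, batch_size=bs)
        list(segments)  # segments are generated lazily; consume to run decoding
        results[bs] = realtime_factor(audio_seconds, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    from faster_whisper import WhisperModel, BatchedInferencePipeline

    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    pipeline = BatchedInferencePipeline(model=model)
    for bs, rtf in sweep(pipeline, "audio.wav", audio_seconds=600.0).items():
        print(f"batch_size={bs:3d}  ~{rtf:.1f}x real time")
```

If the reported factor stops improving past some batch size while VRAM keeps growing, that matches the plateau described above.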

pablopla commented 1 week ago

How does batching work on CPU? Without batching, I don't see any improvement in transcription speed above 4 threads. Does batching allow me to use more threads? Is batching also supposed to improve transcription speed on 4 threads compared to no batching?

MahmoudAshraf97 commented 6 days ago

Unfortunately I don't use it on CPUs; feel free to test it and report the results.
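For anyone who wants to run that test, a rough comparison sketch. It assumes faster-whisper is installed; the "tiny" model, `audio.wav`, and the thread/batch values are placeholders to adjust:

```python
# Sketch: compare plain vs. batched CPU transcription wall time.
# Model size, audio file, thread count, and batch size are assumptions.
import time


def speedup(plain_seconds: float, batched_seconds: float) -> float:
    """How many times faster the batched run was than the plain run."""
    return plain_seconds / batched_seconds


def compare_cpu(model_size="tiny", audio="audio.wav", threads=4, batch_size=8):
    """Return (plain_seconds, batched_seconds) for one file on CPU."""
    from faster_whisper import WhisperModel, BatchedInferencePipeline

    model = WhisperModel(model_size, device="cpu",
                         compute_type="int8", cpu_threads=threads)

    start = time.perf_counter()
    segments, _ = model.transcribe(audio)
    list(segments)  # consume the lazy generator so decoding actually runs
    plain = time.perf_counter() - start

    pipeline = BatchedInferencePipeline(model=model)
    start = time.perf_counter()
    segments, _ = pipeline.transcribe(audio, batch_size=batch_size)
    list(segments)
    batched = time.perf_counter() - start
    return plain, batched


if __name__ == "__main__":
    plain, batched = compare_cpu()
    print(f"plain: {plain:.1f}s  batched: {batched:.1f}s  "
          f"speedup: {speedup(plain, batched):.2f}x")
```

Repeating this across different `cpu_threads` values would also answer whether batching lets the CPU scale past 4 threads.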

vidalfer commented 4 days ago

> Transcription speed will eventually plateau once you are no longer bottlenecked by memory bandwidth, but VRAM usage should still increase. Another bottleneck is the inefficiency introduced by regular batching: some segments finish decoding before others, leaving part of the batch idle. Solving this requires implementing continuous batching for the decoder.
>
> Start with batch_size=1 and increase gradually; you'll usually reach maximum speed around batch_size=32.

Thank you for the explanation!