Closed · vidalfer closed 2 days ago
Transcription speed will plateau eventually once you are no longer bottlenecked by memory bandwidth, but VRAM usage should keep increasing nonetheless. Another bottleneck is the inefficiency introduced by regular (static) batching: some segments finish decoding before the others, and their slots sit idle until the whole batch is done. To solve this, continuous batching must be implemented for the decoder.
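To make the idle-slot cost concrete, here is a minimal sketch (my own illustration, not code from this project) comparing decoder steps under static batching, where the whole batch waits for its longest segment, against continuous batching, where a finished slot is refilled immediately:

```python
import heapq

def static_batching_steps(lengths, batch_size):
    # Each batch runs until its longest member finishes decoding;
    # shorter segments sit idle in their slots until then.
    return sum(max(lengths[i:i + batch_size])
               for i in range(0, len(lengths), batch_size))

def continuous_batching_steps(lengths, batch_size):
    # Greedy scheduling on `batch_size` parallel decoder slots:
    # whenever a slot frees up, the next segment starts in it.
    finish = [0] * batch_size
    heapq.heapify(finish)
    for n in lengths:
        # refill the earliest-free slot with the next segment
        heapq.heapreplace(finish, finish[0] + n)
    return max(finish)

# Hypothetical decode lengths (tokens per segment), 4 decoder slots.
lengths = [5, 50, 7, 48, 6, 52, 9, 47]
print(static_batching_steps(lengths, 4))      # 102 steps
print(continuous_batching_steps(lengths, 4))  # 67 steps
```

The more the segment lengths vary within a batch, the larger the gap between the two numbers, which is exactly the inefficiency described above.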
Start with `batch_size=1` and increase gradually; you will usually reach maximum speed around `batch_size=32`.
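A simple way to follow this advice is to time the same audio at increasing batch sizes and watch where throughput plateaus. A sketch of such a sweep harness (the `transcribe` callable is a placeholder for whatever batched pipeline you are using):

```python
import time

def sweep_batch_sizes(transcribe, batch_sizes=(1, 2, 4, 8, 16, 32)):
    # transcribe(batch_size) should run one full transcription; we
    # record wall-clock seconds per batch size so the plateau is visible.
    timings = {}
    for bs in batch_sizes:
        t0 = time.perf_counter()
        transcribe(bs)
        timings[bs] = time.perf_counter() - t0
    return timings
```

For example, with faster-whisper's batched pipeline you might pass something like `lambda bs: list(pipe.transcribe("audio.wav", batch_size=bs)[0])` (names assumed; adjust to your setup). Note that if your pipeline returns a generator of segments, you must consume it fully for the timing to be meaningful.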
How does batching work on CPU? Without batching I see no improvement in transcription speed above 4 threads. Does batching allow me to use more threads? Is batching also supposed to improve transcription speed at 4 threads compared to no batching?
I don't use it on CPUs, unfortunately; feel free to test it and report the results.
Thank you for the explanation!
I ran some tests with different batch sizes and noticed that neither VRAM usage nor transcription speed changed significantly. I tested with batch sizes of 64, 128, and 512 on an A100 GPU. Is this normal? Theoretically, VRAM usage should increase along with speed as the batch size grows. I'm using the batched pipeline in my tests.
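One way to check whether the batch size is actually taking effect is to watch GPU memory while the job runs. A minimal sketch using `nvidia-smi` (assumes an NVIDIA driver is present; returns `None` when the tool is unavailable):

```python
import shutil
import subprocess

def gpu_memory_used_mib():
    # Query per-GPU used memory in MiB via nvidia-smi; returns a list
    # of ints (one entry per GPU), or None if nvidia-smi is missing.
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.split() if line]
```

Polling this in a background thread while sweeping batch sizes should show VRAM stepping up with the batch size; if it stays flat, it is worth verifying that the parameter is actually reaching the batched decoder.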