Closed junchen6072 closed 1 year ago
Tried to use a thread pool in python to submit jobs for audios,
That's a good approach. Did you also increase num_workers
when doing that? Normally this should overlap kernel executions on the GPU and increase the usage.
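A minimal sketch of that pattern, assuming a pool size matched to the model's worker count. `transcribe_audio` here is a stand-in for a real call such as `model.transcribe(...)` on a faster-whisper `WhisperModel` created with, e.g., `num_workers=2`; the point is that each blocking call releases the GIL while CTranslate2 runs on the GPU, so two in-flight jobs can overlap kernel executions:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real call like faster_whisper.WhisperModel.transcribe;
# in real code the model would be created once, e.g.
#   model = WhisperModel("large-v2", device="cuda", num_workers=2)
def transcribe_audio(path):
    return f"transcript of {path}"

audio_files = ["a.wav", "b.wav", "c.wav", "d.wav"]

# Two worker threads: while one thread blocks inside the model call,
# the other can keep the GPU busy with its own request.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(transcribe_audio, audio_files))
```

Keeping `max_workers` equal to the model's `num_workers` avoids queuing more requests than the model can actually serve in parallel.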
Yes, I did. I think the bottleneck may be more in the Python code, since we block waiting on self.model.generate.
Another observation: using 2 threads in the pool seems to work better than using more.
Are you using word_timestamps=True?
Yes. Is that slow?
Yes, it's slower than the default transcription mode (see #45).
And some operations are indeed running on the CPU in this mode which explains the lower GPU usage. There could be further improvements in the future.
I see, thanks! Is the CPU part mostly in faster-whisper or in CTranslate2?
It's probably a contribution of both, but I don't know exactly.
Taking the OpenAI implementation as a reference, the following lines are run on CPU in CTranslate2:
https://github.com/openai/whisper/blob/v20230314/whisper/timing.py#L208-L214
These steps could benefit from a GPU implementation but I would need some time to come up with an efficient implementation. My first attempt had worse performance than the CPU version!
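One likely reason a naive GPU port loses to the CPU version: the dynamic time warping step used for word timings fills a cost matrix with a recurrence where each cell depends on its left, upper, and diagonal neighbors, so the cells cannot all be computed in parallel (only anti-diagonals are independent). A pure-Python sketch of that recurrence, for illustration only (not the actual OpenAI or CTranslate2 code):

```python
import math

def dtw_cost(dist):
    """Fill a DTW cost matrix over a pairwise distance matrix.

    cost[i][j] depends on cost[i-1][j], cost[i][j-1] and cost[i-1][j-1],
    which is why this step parallelizes poorly on a GPU.
    """
    n, m = len(dist), len(dist[0])
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist[i - 1][j - 1] + min(
                cost[i - 1][j],      # insertion
                cost[i][j - 1],      # deletion
                cost[i - 1][j - 1],  # match
            )
    return cost[n][m]

# Identical sequences align perfectly, so the total cost is zero.
a = [1.0, 2.0, 3.0]
b = [1.0, 2.0, 3.0]
dist = [[abs(x - y) for y in b] for x in a]
print(dtw_cost(dist))  # -> 0.0
```

A GPU version has to process the matrix one anti-diagonal at a time, which for the small matrices involved here can cost more in kernel launches than the CPU loop takes overall.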
Higher GPU usage would probably come from some form of batch execution. This is discussed in #59.
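To sketch what batch execution means here (the shape of the idea, not the design discussed in #59): several audio segments are grouped and fed through the model in one call instead of one at a time, amortizing kernel-launch and transfer overhead. `run_batch` below is a hypothetical batched inference call standing in for a single forward pass on the GPU:

```python
def run_batch(segments):
    # Hypothetical batched inference call: a real implementation would
    # pad the segments and run one forward pass over the whole batch.
    return [f"result:{s}" for s in segments]

def transcribe_all(segments, batch_size=8):
    results = []
    # Group segments so the GPU sees batch_size items per call.
    for start in range(0, len(segments), batch_size):
        results.extend(run_batch(segments[start:start + batch_size]))
    return results

print(transcribe_all(["s1", "s2", "s3"], batch_size=2))
# -> ['result:s1', 'result:s2', 'result:s3']
```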
First, thank you for this awesome work; it really does improve transcription time a lot! But I'm wondering whether it's possible to push GPU usage even higher so it can be even faster. In my tests transcribing a few audios between 2 and 15 minutes long, GPU usage jumps between 70-90% and occasionally drops quite low. I tried instantiating WhisperModel with higher cpu_threads and num_workers, but that doesn't seem to help. I guess there is some non-trivial blocking CPU computation, so the GPU is not fully utilized. I also tried using a thread pool in Python to submit jobs for the audios; it helps a bit, since peak GPU usage goes higher, but on average usage didn't increase much.
Any ideas? Thanks!