Open jpgard opened 1 year ago
Here are some more numbers for this same audio file:
- (8 threads, 1 worker, 5 beams): 1037.374998s
- (4 threads, 1 worker, 5 beams): 768.986466s
- (8 threads, 8 workers, 1 beam): 742.796928s
- (4 threads, 1 worker, 1 beam): 704.093647s
- (4 threads, 4 workers, 1 beam): 665.327561s
- (4 threads, 1 worker, 1 beam, 500ms vad filter): 647.515117s
- (1 thread, 1 worker, 5 beams): 787.795953s
- (1 thread, 4 workers, 5 beams): 846.496325s
- (1 thread, 4 workers, 1 beam): 797.238037s
You mention two CPUs: a dual-core i5 and a 10-core Xeon. Which one are these results from? A dual-core i5 is presumably something like an Intel i5-2520M laptop CPU, which has about 12x lower multi-thread performance than your Xeon 6230. I would expect vastly different numbers from those two CPUs.
Have you pre-downloaded the models, so that you aren't measuring download time?
The example video is in French, and I only use the English models, so I can't compare directly. The README benchmark headline result for CPU is 13 minutes of audio in 2 minutes, so 6.5x realtime, for the Xeon on int8 with beam size 5.
Some comparison benchmarks from my setup: Ryzen 5600G, small.en model, int8, beam size 5, 4 threads. This English YouTube video, https://www.youtube.com/watch?v=GFu64hnqzVo, takes 54 seconds, so 7.2x realtime. Variations on the above: the Ryzen 5600G takes 118 seconds with 1 thread and 74 seconds with 2 threads. A Ryzen 4500U with 4 threads takes 78 seconds.
So yes, my numbers are broadly consistent with the README benchmark for CPU performance.
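For anyone wanting to check the arithmetic, here is a small sketch of how the realtime factor and the thread scaling can be computed from the timings quoted above. The helper names `realtime_factor` and `speedup` are made up for illustration, and the README figure uses the rounded "13 minutes in 2 minutes" from this comment rather than exact timings.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return audio_seconds / wall_seconds


def speedup(baseline_seconds: float, seconds: float) -> float:
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / seconds


# README headline as quoted above: 13 minutes of audio in about 2 minutes.
readme_rtf = realtime_factor(13 * 60, 2 * 60)
print(f"README (Xeon, int8, beam 5): {readme_rtf:.1f}x realtime")  # 6.5x

# Ryzen 5600G wall-clock times from this comment, keyed by thread count.
ryzen_5600g = {1: 118.0, 2: 74.0, 4: 54.0}
scaling = {t: speedup(ryzen_5600g[1], s) for t, s in ryzen_5600g.items()}
for threads, factor in scaling.items():
    print(f"{threads} thread(s): {factor:.2f}x vs 1 thread")
# 1 thread(s): 1.00x, 2 thread(s): 1.59x, 4 thread(s): 2.19x
```

On these numbers, going from 1 to 4 threads gives roughly a 2.2x speedup, which is well short of linear scaling.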
I'm attempting to reproduce the benchmark numbers listed on the README, using the same audio.
The README indicates that I should be able to transcribe an MP3 file of the audio from this video using the small model, with fp32 and beam size 5, in around 2m44s (164s). However, when I transcribe that audio (using the script below, and the mp3 file I extracted from the video here) it takes 1037s, around 6x slower.
It's hard to know the exact details of how that benchmark was computed, though, because there is no script or audio file provided to reproduce it. I'm also not sure if any other hyperparameters/configurations were changed to achieve that result. However, it is concerning that I'm not able to reproduce anything close to this number, despite trying it on two different CPUs (2.3 GHz Dual-Core Intel Core i5, Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz).
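To make the comparison easier to reproduce, here is a minimal timing sketch. It is not the script referred to above and not how the README numbers were produced. It assumes the library under discussion is faster-whisper and that `WhisperModel` and `transcribe` accept the arguments shown; the `timed` and `benchmark` helpers are hypothetical names. It times model loading separately from transcription, and it consumes the segment generator inside the timed region, since `transcribe` returns segments lazily.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def benchmark(audio_path, model_size="small", compute_type="float32",
              cpu_threads=4, num_workers=1, beam_size=5):
    """Time model load and transcription separately (hypothetical helper)."""
    # Assumption: faster-whisper is installed and is the library in question.
    from faster_whisper import WhisperModel

    # Load time includes any one-off model download, so keep it out of
    # the transcription figure.
    model, load_s = timed(lambda: WhisperModel(
        model_size, device="cpu", compute_type=compute_type,
        cpu_threads=cpu_threads, num_workers=num_workers))

    def run():
        segments, info = model.transcribe(audio_path, beam_size=beam_size)
        # The segments object is a generator: decoding only happens while
        # iterating, so it must be exhausted inside the timed region.
        for _ in segments:
            pass
        return info.duration

    audio_s, transcribe_s = timed(run)
    return {
        "load_s": load_s,
        "transcribe_s": transcribe_s,
        "realtime_factor": audio_s / transcribe_s,
    }
```

Calling `benchmark("audio.mp3", cpu_threads=8)` (with a placeholder file name) would then report the load time and the transcription time as separate figures, which makes it easier to see whether a slow run is dominated by decoding or by setup.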
Thank you!