SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Fair Benchmarking of Faster-Whisper - Parameter equivalents to Hugging Face #993

asusdisciple commented 2 months ago

I want to benchmark faster-whisper against some pipeline implementations of Whisper in Hugging Face. For the sake of fairness, I would like to parametrize the models as equally as possible.

In HF you have different generation strategies, which are (a rough sketch of the corresponding calls follows the list):

- greedy decoding if num_beams=1 and do_sample=False
- contrastive search if penalty_alpha>0 and top_k>1
- multinomial sampling if num_beams=1 and do_sample=True
- beam-search decoding if num_beams>1 and do_sample=False
- beam-search multinomial sampling if num_beams>1 and do_sample=True
- diverse beam-search decoding if num_beams>1 and num_beam_groups>1
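
For concreteness, this is roughly what I mean on the HF side (untested sketch; openai/whisper-tiny and the one-second silent clip are just placeholders for illustration):

```python
import numpy as np
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-tiny"  # placeholder checkpoint for a quick test
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)

# one second of silence stands in for real benchmark audio
audio = np.zeros(16000, dtype=np.float32)
features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# greedy decoding: num_beams=1, do_sample=False
ids = model.generate(features, num_beams=1, do_sample=False)
# multinomial sampling: num_beams=1, do_sample=True
ids = model.generate(features, num_beams=1, do_sample=True, temperature=1.0)
# beam-search decoding: num_beams>1, do_sample=False
ids = model.generate(features, num_beams=5, do_sample=False)

print(processor.batch_decode(ids, skip_special_tokens=True))
```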

How would I, for example, reproduce greedy decoding in faster-whisper? Is there a do_sample parameter? Should I set best_of = 1 and beam_size = 1? Also, if I set do_sample = True in HF, would that be equal to setting best_of = 5? Maybe you can share some insights with me; ideally I would like to reproduce all of the above strategies.
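
For context, this is my current guess for the faster-whisper side, assuming that beam_size=1, best_of=1 and a single temperature of 0.0 approximate HF greedy decoding, and that a temperature above 0 with best_of=5 approximates multinomial sampling (please correct me if that mapping is wrong):

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Guess at greedy decoding (HF: num_beams=1, do_sample=False):
# a single temperature of 0.0 should disable sampling and the fallback schedule,
# and beam_size=1 should disable beam search.
segments, info = model.transcribe(
    "audio.wav",  # placeholder path
    beam_size=1,
    best_of=1,
    temperature=0.0,
)
print("".join(segment.text for segment in segments))

# Guess at multinomial sampling (HF: num_beams=1, do_sample=True):
# temperature > 0 with several sampled candidates, keeping beam_size=1.
segments, info = model.transcribe(
    "audio.wav",
    beam_size=1,
    best_of=5,
    temperature=1.0,
)
```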

Best regards

MahmoudAshraf97 commented 1 month ago

@BBC-Esq and I are currently working on this, check #974

BBC-Esq commented 1 month ago

I'll send an invite to the repo if he wants to help out or just kibitz. Like @MahmoudAshraf97, I've been inundated with other stuff, but I do plan to get back to the benchmarking in the very near future.