futo-org / whisper-acft

MIT License

Quality suffers on earnings22 dataset #5

Open soupslurpr opened 2 months ago

soupslurpr commented 2 months ago

whisper-tiny.en gets a WER of 18 on https://huggingface.co/datasets/distil-whisper/earnings22 (chunked, test) without dynamic audio context, using evaluation.ipynb, while acft-whisper-tiny.en with dynamic audio context gets a WER of 318. This suggests that the ACFT fine-tuned model with dynamic audio context may not work well in real-world conditions, which include diverse accents and varying speech conditions.
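For readers wondering how a WER can exceed 100: WER counts substitutions, deletions, and insertions against the number of reference words, so a hypothesis that repeats or hallucinates extra text can score far above 100. The sketch below is a minimal standalone illustration, not the code from evaluation.ipynb; the function name `word_error_rate` is made up for this example, and it does no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # A looping hypothesis adds six extra words to a two-word reference.
    print(word_error_rate("thank you", "thank you thank you thank you thank you"))
```

A score in the hundreds therefore usually points at heavy insertion (for example repetition loops) rather than ordinary misrecognition, which may be worth checking in the transcripts.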

soupslurpr commented 2 months ago

Not sure why, but changing ADD_AUDIO_CTX to 64 makes acft-whisper-tiny.en achieve a WER of 19 on earnings22.
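As a rough sketch of what such a setting could control: Whisper's encoder uses 1500 positions for a 30-second window, i.e. 50 positions per second of audio, and a dynamic audio context shrinks that to fit the clip. The code below assumes ADD_AUDIO_CTX is a constant margin added on top of the length-derived context; how evaluation.ipynb actually applies it is not shown in this thread, and `dynamic_audio_ctx` is a hypothetical helper name.

```python
import math

FRAMES_PER_SECOND = 50  # 1500 encoder positions / 30 seconds
MAX_AUDIO_CTX = 1500    # full 30-second Whisper window


def dynamic_audio_ctx(num_samples: int, sample_rate: int = 16000,
                      add_audio_ctx: int = 0) -> int:
    """Estimate an encoder context size for a clip (assumed scheme).

    add_audio_ctx is an assumed extra margin, playing the role that
    ADD_AUDIO_CTX is presumed to play in the notebook.
    """
    if num_samples < 0 or sample_rate <= 0:
        raise ValueError("num_samples must be >= 0 and sample_rate > 0")
    duration_s = num_samples / sample_rate
    ctx = math.ceil(duration_s * FRAMES_PER_SECOND) + add_audio_ctx
    # Clamp to the range the encoder supports.
    return max(1, min(MAX_AUDIO_CTX, ctx))


if __name__ == "__main__":
    five_seconds = 5 * 16000
    print(dynamic_audio_ctx(five_seconds))                    # no margin
    print(dynamic_audio_ctx(five_seconds, add_audio_ctx=64))  # with margin
```

Under this assumption, a margin of 64 positions corresponds to a little over one second of extra context past the end of the clip.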

stopthinking102 commented 3 weeks ago

Can you share which parameter needs to be set in whisper to use this model? For example, should it be wparams.audio_ctx = 1500;?