YuanGongND / whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
BSD 2-Clause "Simplified" License

Support for whisper-large-v3 #21

Open spaghettiSystems opened 6 months ago

spaghettiSystems commented 6 months ago

Hello,

First of all, nice work!

Is it possible to release a checkpoint trained with whisper-large-v3? The reason I'm interested in this is that large-v3 is trained on a new dataset with 5 million hours of audio. I'm interested to see how that scaling will impact whisper-at.

Thank you.

YuanGongND commented 6 months ago

Thanks for the suggestion, it is a very nice one.

I am currently busy with something else, so I cannot do it immediately. What I can tell you is:

1/ We provide code to train the model. Extracting features from Whisper and running linear probing on ESC-50 should be relatively easy and fast.

2/ Comparing v1 and v2, v1 is better at general (non-speech) audio classification.

-Yuan
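
For readers who want to try the linear-probing idea from point 1/, here is a minimal, dependency-free sketch of what such a probe might look like. It is not the repository's training code. It assumes the encoder stays frozen and that each clip has already been turned into a sequence of frame-level feature vectors. The Whisper feature extraction itself is not shown, so synthetic features stand in for real encoder output, and every name here (`mean_pool`, `train_linear_probe`, `make_fake_clip`) is hypothetical.

```python
import math
import random


def mean_pool(frames):
    """Average a [time][dim] list of frame features into one [dim] clip vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(W, b, x):
    """Linear layer: one dot product plus bias per class."""
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(b))]


def train_linear_probe(feats, labels, num_classes, epochs=50, lr=0.1, seed=0):
    """Fit only a linear layer (W, b) with SGD on cross-entropy; features are fixed."""
    rng = random.Random(seed)
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(feats)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = feats[i], labels[i]
            probs = softmax(scores_for(W, b, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - 1[c == y].
                g = probs[c] - (1.0 if c == y else 0.0)
                b[c] -= lr * g
                W[c] = [w - lr * g * v for w, v in zip(W[c], x)]
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class."""
    s = scores_for(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


def make_fake_clip(label, rng, num_frames=10, dim=8):
    """ASSUMPTION: synthetic stand-in for frozen encoder output (noise + class offset)."""
    return [
        [rng.gauss(0.0, 0.3) + (2.0 if d == label else 0.0) for d in range(dim)]
        for _ in range(num_frames)
    ]


rng = random.Random(0)
num_classes = 4  # toy value; ESC-50 itself has 50 classes
train = [(mean_pool(make_fake_clip(y, rng)), y) for y in range(num_classes) for _ in range(20)]
test = [(mean_pool(make_fake_clip(y, rng)), y) for y in range(num_classes) for _ in range(10)]

W, b = train_linear_probe([x for x, _ in train], [y for _, y in train], num_classes)
accuracy = sum(predict(W, b, x) == y for x, y in test) / len(test)
print(f"probe accuracy on synthetic features: {accuracy:.2f}")
```

To compare Whisper versions, one would presumably replace `make_fake_clip` with features extracted from each checkpoint's encoder and evaluate with ESC-50's five predefined folds. In practice the probe would more likely be written as a single linear layer in PyTorch; plain Python is used here only to keep the sketch self-contained.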