futo-org / whisper-acft

MIT License
62 stars, 2 forks

large models #4

Closed by eschmidbauer 6 hours ago

eschmidbauer commented 3 weeks ago

Hello! Thank you for sharing this work. I've noticed how well these models work with shorter audio files. Are there any plans to release the large models? Thanks!

eschmidbauer commented 2 weeks ago

Follow-up: I fine-tuned Whisper large-v3 using the script in this repo, and short audio clips still suffer from the same issue. Is it possible that large-v3 needs more training? If so, can you share some details on how to do that? Thanks again.

thiswillbeyourgithub commented 2 weeks ago

Hi, do you plan on sharing how you did it? In particular the GPU requirements, scripts, time needed, etc.? Also, what's your opinion on quantization-aware fine-tuning? IIRC, it can potentially reduce the model size and increase the speed considerably, at no additional computing cost.
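As a rough illustration of what quantization-aware fine-tuning does (this is not taken from whisper-acft or the linked script), here is a minimal pure-Python sketch of symmetric per-tensor int8 "fake quantization": in the forward pass, weights are rounded to the int8 grid and mapped back to floats, so the model is trained against the precision it will later be stored at. The helper names `int8_scale`, `fake_quantize` and `model_size_mb` are hypothetical. Real QAT toolkits also pass gradients through the rounding step (typically with a straight-through estimator) and often use per-channel scales; this sketch omits both. The size arithmetic assumes about 1.55 billion parameters, roughly the size reported for Whisper large.

```python
# Minimal sketch of symmetric per-tensor int8 fake quantization, the
# quantize-then-dequantize step that quantization-aware training inserts
# into the forward pass. Illustrative only; hypothetical helper names.


def int8_scale(weights):
    """Per-tensor symmetric scale that maps max |w| onto 127."""
    max_abs = max(abs(w) for w in weights)
    return max_abs / 127 if max_abs > 0 else 1.0


def fake_quantize(weights):
    """Round weights to the int8 grid, then map them back to floats.

    During QAT the model trains against these rounded values, so it can
    adapt to the error introduced by the later conversion to real int8.
    """
    scale = int8_scale(weights)
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


def model_size_mb(num_params, bits_per_param):
    """Approximate weight storage in megabytes (10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


if __name__ == "__main__":
    w = [0.5, -1.27, 0.003, 0.9]
    # Each value moves by at most half a quantization step; 0.003 rounds to 0.
    print(fake_quantize(w))
    # Assumed ~1.55e9 parameters: fp32 vs int8 weight storage in MB.
    print(model_size_mb(1_550_000_000, 32), model_size_mb(1_550_000_000, 8))
```

Storing weights as int8 instead of fp32 cuts weight storage by 4x (about 6200 MB down to 1550 MB under the assumed parameter count); any speedup depends on the inference runtime having int8 kernels.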

eschmidbauer commented 6 hours ago

Here is my slightly modified version of the script provided in this repo: https://gist.github.com/eschmidbauer/c1bb441028a61db19d833a289688e8f6

thiswillbeyourgithub commented 6 hours ago

Thanks! Do you plan on sharing the model? I'm also still interested in the GPU requirements and the time needed.

And what's your opinion on quantization-aware fine-tuning? IIRC, it can potentially reduce the model size and increase the speed considerably, at no additional computing cost.