Closed: Gautam-Rajeev closed this 5 months ago
@Jiya126
Hey, I'm working on this issue.
Fine-tuning is done; converting to TFLite remains, so I'm raising that as a separate ticket.
Whisper currently has a small issue with translating into the input language:
https://huggingface.co/openai/whisper-large-v3/discussions/71
https://github.com/huggingface/transformers/pull/28687
Note: this is just a bug we might hit later on; calling it out now.
We can fine-tune Whisper:
https://huggingface.co/blog/fine-tune-whisper
Medium
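For checking a fine-tuned checkpoint, the linked blog post reports word error rate (WER). A minimal pure-Python sketch of that metric follows; the function name is just for illustration, and a real run would more likely use an existing metrics library plus some text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Comparing this number before and after tuning on the same held-out clips gives a quick sanity check that the tuning helped.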
Dataset:
https://ai4bharat.iitm.ac.in/shrutilipi/
https://ai4bharat.iitm.ac.in/indicsuperb/
https://ai4bharat.iitm.ac.in/indicvoices/#:~:text=INDICVOICES%20is%20a%20dataset%20of,Indian%20districts%20and%2022%20languages
Collating IndicVoices: https://colab.research.google.com/drive/1HeRjs7MxQngZRybnxldm1Yq5lqR6Qqvr#scrollTo=q82yn54YCD_e
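A rough sketch of what collating could look like, independent of the notebook above. It assumes a hypothetical layout of `<root>/<language>/<clip>.wav` with a sibling `<clip>.txt` transcript; the real datasets ship in their own formats, so the paths and field names here are placeholders.

```python
import json
from pathlib import Path


def build_manifest(root: Path, out_path: Path) -> int:
    """Write one JSON line per (audio, transcript) pair and return the count.

    Assumes the hypothetical layout <root>/<language>/<clip>.wav with a
    matching <clip>.txt next to it.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for wav in sorted(root.glob("*/*.wav")):
            txt = wav.with_suffix(".txt")
            if not txt.exists():
                continue  # skip clips that have no transcript
            record = {
                "audio": str(wav),
                "language": wav.parent.name,  # folder name used as language tag
                "text": txt.read_text(encoding="utf-8").strip(),
            }
            # ensure_ascii=False keeps Indic scripts readable in the file
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

A single JSON-lines manifest like this keeps the three sources in one place with a language tag per clip.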
We can convert it to a TFLite model:
https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/generate_tflite_from_whisper.ipynb
It's just a 40 MB quantized model - https://github.com/openai/whisper/discussions/506
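A back-of-the-envelope check on the size, as a sketch only. It assumes the 40 MB figure is for the tiny checkpoint (worth confirming in the linked discussion) and int8 weights at one byte each. The parameter counts are the approximate ones from the openai/whisper README.

```python
# Approximate parameter counts from the openai/whisper README.
WHISPER_PARAMS = {"tiny": 39e6, "base": 74e6, "small": 244e6, "medium": 769e6}


def approx_size_mb(params: float, bytes_per_weight: int = 1) -> float:
    """Weights-only size in MB (10**6 bytes); ignores file-format overhead."""
    return params * bytes_per_weight / 1e6


for name, params in WHISPER_PARAMS.items():
    print(f"{name}: ~{approx_size_mb(params):.0f} MB at int8")
```

Under the same assumptions, a fine-tuned Medium checkpoint would come out far larger than 40 MB, which is worth keeping in mind for the TFLite ticket.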