Hugging Face pretrained models integration

abdeladim-s / subsai

🎞️ Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants 🎞️

https://abdeladim-s.github.io/subsai/

GNU General Public License v3.0

Hugging Face pretrained models integration #128

Open mkatic007 opened 2 months ago

mkatic007 commented 2 months ago

Could you please explain how to add a Hugging Face pretrained model to work with your solution?

abdeladim-s commented 2 months ago

@mkatic007, I've added a Hugging Face implementation to the supported models. You can use any pretrained model from the Hub, as long as it is compatible with the Automatic Speech Recognition task. Please give it a try and let me know if you run into any issues.

mkatic007 commented 2 months ago

Thank you! I tried with:

subsai D:/TranSource/03.mp3 --model japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large --model-configs "{\"model_type\": \"large-v3\"}" --format srt -tm mbart50 -tsl japanese -ttl english

But it gives the error:

return AVAILABLE_MODELS[model_name]['class'](model_config)
KeyError: 'japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large'

I did not download the model from HF, so I am not sure if I am missing any steps :) Please be so kind as to instruct me on what to do.

abdeladim-s commented 2 months ago

The command should look like:

subsai D:/TranSource/03.mp3 --model HuggingFaceModel  --model-configs "{"model_id": "japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large"}" --format srt -tm mbart50 -tsl japanese -ttl english
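
The KeyError from the earlier attempt is consistent with `--model` being looked up as a key in a registry of backends rather than being treated as a Hub id. Here is a minimal sketch, assuming a simplified registry shaped like the `AVAILABLE_MODELS[model_name]` lookup in the traceback. The `FakeHuggingFaceModel` class and the registry contents are hypothetical stand-ins, not subsai's actual code:

```python
# Hypothetical stand-in for a backend class; not subsai's real implementation.
class FakeHuggingFaceModel:
    def __init__(self, model_config):
        # The Hub id travels inside the config, under "model_id".
        self.model_id = model_config["model_id"]


# Simplified registry keyed by backend name (assumed shape).
AVAILABLE_MODELS = {
    "HuggingFaceModel": {"class": FakeHuggingFaceModel},
}


def create_model(model_name, model_config):
    # Same lookup pattern as the line shown in the traceback.
    return AVAILABLE_MODELS[model_name]["class"](model_config)


hub_id = "japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large"

# Passing the Hub id as the model name fails: it is not a registry key.
try:
    create_model(hub_id, {})
except KeyError as err:
    print("KeyError:", err)

# Passing the backend name, with the Hub id in the config, succeeds.
model = create_model("HuggingFaceModel", {"model_id": hub_id})
print(model.model_id)
```

In this sketch the backend name selects the class and the Hub id is only read from the config, which matches the split between `--model` and `--model-configs` in the command above.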

mkatic007 commented 2 months ago

Thank you, I tried but now I am getting this error: "json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)".
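
This JSONDecodeError is what Python's `json.loads` raises when the inner double quotes never reach the program. That is likely the case here: the `model_id` command above does not escape them, whereas the first command in this thread did (`\"`) and got past JSON parsing. Here is a sketch under that assumption. The exact string the shell passes depends on the shell, so `stripped` below is a guess at it:

```python
import json

hub_id = "japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large"

# Assumed result of the shell consuming the unescaped inner quotes.
stripped = "{model_id: " + hub_id + "}"
error_message = None
try:
    json.loads(stripped)
except json.JSONDecodeError as err:
    error_message = str(err)
    print(error_message)

# Building the JSON with json.dumps guarantees valid quoting.
config = json.dumps({"model_id": hub_id})

# Escape the inner quotes for a double-quoted shell argument,
# following the \" style used in the first command of this thread.
escaped = config.replace('"', '\\"')
print(f'--model-configs "{escaped}"')

# Round-trip check: the unescaped JSON parses back to the same dict.
parsed = json.loads(config)
```

The second printed line is the `--model-configs` argument with escaped inner quotes. Combining it with `--model HuggingFaceModel` and the remaining flags from the command above should at least get past the JSON parsing step. Whether the model then loads is a separate question.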