Thiagohgl / ai-pronunciation-trainer

This tool uses AI to evaluate your pronunciation.

Support for other models #7

Open lorisgir opened 10 months ago

lorisgir commented 10 months ago

Hi, great work! I just want to know if it's possible to use other models, like Whisper.

Thiagohgl commented 7 months ago

Hey, thanks!

Yes, as long as the model provides word-level timestamps, you just need to make sure it follows the interface in IASRModel. Whisper in particular could be really interesting, since it can tackle most mainstream languages with a single model.

I think I will actually implement it when I get some spare time (also in the online version), since many people seem to be interested in the tool but may want to learn other languages.
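
As a rough sketch of what that could look like: openai-whisper supports `word_timestamps=True`, which attaches per-word start/end times to each segment. The method names and word-location dict keys below are assumptions; check ModelInterfaces.py for the exact contract the app expects.

```python
# Rough sketch of a Whisper-backed ASR model following an IASRModel-style
# interface. Method names and the word-location dict keys are assumptions;
# check ModelInterfaces.py for the exact contract.
import whisper


class WhisperASRModel:
    def __init__(self, model_name: str = "base"):
        self.model = whisper.load_model(model_name)
        self._transcript = ""
        self._word_locations = []

    def processAudio(self, audio):
        # word_timestamps=True makes openai-whisper attach per-word
        # start/end times to each segment.
        result = self.model.transcribe(audio, word_timestamps=True)
        self._transcript = result["text"]
        self._word_locations = [
            {"word": w["word"].strip(), "start_ts": w["start"], "end_ts": w["end"]}
            for segment in result["segments"]
            for w in segment["words"]
        ]

    def getTranscript(self) -> str:
        return self._transcript

    def getWordLocations(self) -> list:
        return self._word_locations
```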

jvel07 commented 3 months ago

Hi! Congrats on the repo!

What would the pipeline for the timestamps look like with Whisper? I assume this is not enough for this repo, right?

```python
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
# [{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
#   'timestamp': (0.0, 5.44)}]
```

Taken from https://huggingface.co/openai/whisper-base
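
For what it's worth, the same pipeline also seems to accept `return_timestamps="word"`, which returns one chunk per word (with a start/end tuple) instead of one per segment; a minimal sketch, where the model choice and audio file are just examples:

```python
# Minimal sketch: requesting word-level timestamps from the Hugging Face
# ASR pipeline. return_timestamps="word" yields one chunk per word with a
# (start, end) tuple, instead of one chunk per segment.
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-base")

out = pipe("sample.wav", return_timestamps="word")
for chunk in out["chunks"]:
    print(chunk["text"], chunk["timestamp"])
```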

jvel07 commented 3 months ago

As for the TTS, I just found out that it does not seem to actually be used?

In `models.py/getTTSModel()` and in `AIModels.py/NeuralTTS().getAudioFromSentence()` I changed the TTS model to Tacotron2. However, I didn't notice any change when playing the sample audio; I still got the same somewhat robotic female voice.

Then, to double-check whether it is being used, I returned None from `models.py/getTTSModel()`. This resulted in no error, and the app continued to work. Hence, I assume the TTS part is not actually being used?

I found the speech generation part in callbacks.js with `var synth = window.speechSynthesis;`. Is this what generates the voice when playing the sample audio? How could that be changed to a custom TTS model?
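
To frame the question: one way to swap in a custom TTS would be to synthesize on the server and have callbacks.js fetch and play the returned audio instead of calling `window.speechSynthesis`. A rough sketch, assuming the app's Flask backend; the route name and the `synthesize()` helper are hypothetical placeholders, not part of the actual app:

```python
# Hypothetical sketch: a server-side TTS endpoint the frontend could fetch
# instead of using window.speechSynthesis. The route name and synthesize()
# helper are placeholders, not part of the actual app.
import io
from flask import Flask, request, send_file

app = Flask(__name__)


def synthesize(text: str) -> bytes:
    # Placeholder for a real TTS call (e.g. a Tacotron2 pipeline)
    # that returns WAV bytes.
    raise NotImplementedError


@app.route("/getSampleAudio", methods=["POST"])
def get_sample_audio():
    text = request.get_json()["text"]
    return send_file(io.BytesIO(synthesize(text)), mimetype="audio/wav")
```

On the browser side, callbacks.js would then play the returned blob through an Audio element rather than the Web Speech API.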

Thanks in advance for your help, @Thiagohgl !