lorisgir opened this issue 10 months ago
Hey, thanks!
Yes, as long as the model provides word-level timestamps, you just need to make sure it follows the interface in IASRModel. Whisper in particular could be really interesting, since it can tackle most mainstream languages with a single model.
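Roughly, the adapter would look something like this (a minimal sketch; the method names below are illustrative placeholders, the actual ones are defined in IASRModel):

```python
import whisper


class WhisperASRModel:
    """Sketch of a Whisper wrapper exposing word-level timestamps."""

    def __init__(self, model_name="base"):
        self.model = whisper.load_model(model_name)
        self._result = None

    def processAudio(self, audio):
        # word_timestamps=True makes Whisper emit per-word start/end times
        self._result = self.model.transcribe(audio, word_timestamps=True)

    def getTranscript(self):
        return self._result["text"]

    def getWordLocations(self):
        # Flatten the per-segment word lists into (word, start, end) tuples
        return [
            (w["word"].strip(), w["start"], w["end"])
            for segment in self._result["segments"]
            for w in segment["words"]
        ]
```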
I think I will actually implement it when I get some spare time (also on the online version), since many people seem to be interested in the tool but may want to learn other languages.
Hi! Congrats on the repo!!!
What would the pipeline for timestamps on Whisper be? I assume this is not enough for this repo, right?
```python
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
# [{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
#   'timestamp': (0.0, 5.44)}]
```
Taken from https://huggingface.co/openai/whisper-base
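Though I see the `transformers` pipeline also accepts `return_timestamps="word"`, which might be closer to what this repo needs. An untested sketch based on the transformers docs (the example output is illustrative, not a real run):

```python
from transformers import pipeline

# Whisper through the transformers ASR pipeline; return_timestamps="word"
# asks for per-word timestamps instead of per-chunk ones.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-base")
prediction = pipe(sample.copy(), batch_size=8, return_timestamps="word")["chunks"]
# Illustrative output shape:
# [{'text': ' Mr.', 'timestamp': (0.0, 0.46)},
#  {'text': ' Quilter', 'timestamp': (0.46, 0.94)}, ...]
```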
As for the TTS, I think I just found out it is actually not being used. In `models.py/getTTSModel()` and in `AIModels.py/NeuralTTS().getAudioFromSentence()` I changed the TTS model to Tacotron2, but I didn't notice any change when playing the sample audio: I still got the same somewhat robotic female voice. Then, to double-check whether that model is used at all, I returned None from `models.py/getTTSModel()`. This raised no error and the app continued to work, so I assume the TTS part is not actually being used?
I also found the speech generation part in `callbacks.js`, with `var synth = window.speechSynthesis;`. Is this what generates the voice when playing the sample audio? How would I change that to a custom TTS model?
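In case it helps, what I had in mind was exposing the Python TTS model through a small endpoint and having the frontend fetch and play the returned audio instead of calling `window.speechSynthesis`. A rough sketch; Flask, the route name, and the constructor call are just my guesses, not how the repo is actually wired:

```python
import io

import soundfile as sf
from flask import Flask, request, send_file

import AIModels  # repo module mentioned above
import models    # repo module mentioned above

app = Flask(__name__)
# Constructor arguments are a guess; NeuralTTS and getTTSModel() are the
# names referenced earlier in this thread.
tts = AIModels.NeuralTTS(models.getTTSModel())


@app.route("/getAudioFromText", methods=["POST"])
def get_audio_from_text():
    text = request.json["text"]
    # getAudioFromSentence() is assumed to return a 1-D waveform array
    audio = tts.getAudioFromSentence(text)
    buf = io.BytesIO()
    sf.write(buf, audio, samplerate=22050, format="WAV")  # sample rate is a guess
    buf.seek(0)
    return send_file(buf, mimetype="audio/wav")
```

On the frontend, `callbacks.js` would then fetch this endpoint and play the response with an `Audio` element instead of calling `synth.speak()`.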
Thanks in advance for your help, @Thiagohgl!
Hi, great work! I just want to know if it's possible to use other models like Whisper.