Open AnnCod opened 2 months ago
Hi @AnnCod, I think it can work on non-English languages. We tested this solution on Chinese speech before and got a good result, though the audio quality was not good. I suppose it could work because:
If you want decent audio quality, you could try a model pretrained on a multilingual corpus, such as XLS-R, and then train a vocoder on your target language.
Thanks for the reply. Is this demo working correctly? I get some errors when trying to run it on Colab.
Sorry, I accidentally misspelled a variable name. It is fixed in https://github.com/BakerBunker/SALT/commit/8060405da51996c0b8b47a5b8c2babad0838b14a
But there's still an error: "RuntimeError: The size of tensor a (23866) must match the size of tensor b (214) at non-singleton dimension 2"
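For anyone reading along: this message is what PyTorch raises when an elementwise operation gets two tensors whose shapes cannot be broadcast together. Two sizes in the same dimension are compatible only if they are equal or one of them is 1. Below is a minimal pure-Python sketch of that rule, not code from this repository. Only the sizes 23866 and 214 and the dimension index 2 come from the error above; the leading dimensions and the helper name `first_broadcast_mismatch` are hypothetical.

```python
def first_broadcast_mismatch(shape_a, shape_b):
    """Return (dim, size_a, size_b) for the first incompatible dimension,
    scanning from the trailing dimension, or None if the shapes broadcast."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Sizes are compatible when they are equal or either one is 1.
    for dim in range(ndim - 1, -1, -1):
        if a[dim] != b[dim] and a[dim] != 1 and b[dim] != 1:
            return dim, a[dim], b[dim]
    return None


# Hypothetical shapes: only the last-dimension sizes are taken from the error.
print(first_broadcast_mismatch((1, 1, 23866), (1, 1, 214)))
# A size-1 dimension broadcasts, so this pair is compatible.
print(first_broadcast_mismatch((1, 80, 214), (1, 1, 214)))
```

Printing `.shape` of both operands just before the failing line in the notebook would show which two tensors disagree along dimension 2.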
I can't reproduce this error. Could you share your Colab notebook with this output?
Hi,
Do you think this solution can be easily adapted to work on languages other than English?