NTT123 / vietTTS

Vietnamese Text to Speech library

MIT License

[Textgrid for dataset] #1

Closed frank269 closed 3 years ago

frank269 commented 3 years ago

I am creating TextGrid files for my dataset. Could you explain how to create them, or point me to more information? Thank you so much!

NTT123 commented 3 years ago

Hi @frank269, I used Montreal Forced Aligner (MFA) to create the TextGrid files. Visit https://montreal-forced-aligner.readthedocs.io/en/latest/ for more information.

frank269 commented 3 years ago

Thanks for your response.

frank269 commented 3 years ago

I used MFA 2.0 to align the text, but it threw an error. How can I generate a lexicon for Vietnamese? This is the error:

montreal_forced_aligner.exceptions.PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: a, e, i, u, y, ê, ề

NTT123 commented 3 years ago

The pretrained acoustic model does not include these phones. In this case, you have to train your own acoustic model. See https://montreal-forced-aligner.readthedocs.io/en/latest/aligning.html#align-using-only-the-data-set for more information.
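The mismatch reported in the error can be reproduced offline by comparing the phones used in the lexicon against the phones an acoustic model covers. The following is a minimal sketch, not part of MFA: the function names are hypothetical, the lexicon is assumed to be in the plain "word phone1 phone2 ..." format with no probability columns, and the model's phone set is assumed to be obtained separately.

```python
def lexicon_phones(lines):
    """Collect every phone used in a lexicon.

    Assumes each non-empty line is: word phone1 phone2 ...
    (whitespace-separated, no pronunciation probabilities).
    """
    phones = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a pronunciation
        phones.update(parts[1:])
    return phones


def missing_phones(lexicon_lines, model_phones):
    """Return the lexicon phones that the acoustic model does not cover."""
    return sorted(lexicon_phones(lexicon_lines) - set(model_phones))
```

For a real lexicon, something like `missing_phones(open("MFA/lexicon.txt", encoding="utf-8"), model_phones)` would list the phones that need a custom-trained model; an empty list suggests the pretrained model covers the lexicon.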

frank269 commented 3 years ago

I tried training and aligning with the first 6 files and the lexicon file from the InfoRe data, using the command:

./bin/mfa_train_and_align MFA/dataset MFA/lexicon.txt MFA/aligned

However, only the first file gets a correct TextGrid file; the other files get wrong data. Where did I go wrong?

NTT123 commented 3 years ago

To train an MFA model, you need a lexicon file and a wav/text data directory. The wav/text data directory contains all your audio clips and their transcript files. Each A.wav clip requires an A.txt transcript file in the same directory.
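The pairing requirement above can be checked before training. This is a minimal sketch, assuming a flat directory with matching file stems; the function name `find_unpaired` is hypothetical and not part of MFA.

```python
from pathlib import Path


def find_unpaired(data_dir):
    """Return (wavs_without_txt, txts_without_wav) as sorted lists of file stems.

    Only looks at *.wav and *.txt files directly inside data_dir.
    """
    root = Path(data_dir)
    wav_stems = {p.stem for p in root.glob("*.wav")}
    txt_stems = {p.stem for p in root.glob("*.txt")}
    return sorted(wav_stems - txt_stems), sorted(txt_stems - wav_stems)
```

Calling `find_unpaired("MFA/dataset")` should return two empty lists when every clip has a transcript and every transcript has a clip.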

frank269 commented 3 years ago

Yes, I used the first 6 files and the lexicon file from the InfoRe dataset you provided, and I manually created a transcript file for each of the 6 audio clips. But when I run the training command, the output for the first file is correct while the others are wrong. Here are the results for the first 6 files: 1.zip

NTT123 commented 3 years ago

It is possible that your dataset is too small, so MFA cannot learn a useful model from that little data.

NTT123 commented 3 years ago

This is a notebook that I used to align the InfoRe data with a slightly different phoneme set: https://colab.research.google.com/gist/NTT123/95b12ca42a4bdd1a856aba0fbb0f8936/infore-mfa-tutorial.ipynb

frank269 commented 3 years ago

Oh, thank you so much!