MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] training g2p takes endless time for japanese #498

Closed yw0nam closed 2 years ago

yw0nam commented 2 years ago

Debugging checklist

[ O ] Have you updated to latest MFA version?
[ O ] Have you tried rerunning the command with the --clean flag?

Describe the issue

I generated a custom dictionary for Japanese using pororo. After generating it, I tried to train a G2P model, but the training never finished.

I ran the command below.

mfa train_g2p ./mfa/data/dictionary/my_dictionary.txt ./mfa/data/g2p/ja_g2p.zip --clean

[Screenshot]

It has been 30 minutes since I started training, and there has been no visible progress.
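Before retrying, it may help to check how large the dictionary and its symbol inventories are. The sketch below is only an illustration: it assumes a plain "word phone phone ..." layout with whitespace separators and no probability columns, and `summarize_dictionary` is a hypothetical helper, not part of MFA. Whether a large grapheme inventory (for example, many kanji) is what slows training here is an assumption, not something confirmed in this thread.

```python
from collections import Counter


def summarize_dictionary(lines):
    """Summarize dictionary lines of the assumed form 'word phone phone ...'."""
    graphemes = Counter()  # characters seen in the word column
    phones = Counter()     # symbols seen in the pronunciation columns
    entries = 0
    longest_word = ""
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a pronunciation
        word, pronunciation = parts[0], parts[1:]
        entries += 1
        graphemes.update(word)
        phones.update(pronunciation)
        if len(word) > len(longest_word):
            longest_word = word
    return {
        "entries": entries,
        "distinct_graphemes": len(graphemes),
        "distinct_phones": len(phones),
        "longest_word": longest_word,
    }


# Tiny made-up sample; real entries would come from my_dictionary.txt.
sample = ["東京 t o o k y o o", "猫 n e k o", "", "東 h i g a s h i"]
summary = summarize_dictionary(sample)
print(summary)
```

To run it on the real file, pass an open file handle instead of the sample list, e.g. `summarize_dictionary(open(path, encoding="utf-8"))`.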

To reproduce your issue, please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Japanese
    • How many files/speakers? 58475 / 19
    • Are you using lab files or TextGrid files for input?
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? No
    • If it's a custom dictionary, what is the phoneset? alphabet
  3. Acoustic model
    • If you're using an acoustic model, is it one download through MFA? If so, which one?
    • If it's a model you've trained, what data was it trained on?

Log file (by default these are stored in ~/Documents/MFA): pynini_train_g2p.log

yw0nam commented 2 years ago

Following the official documentation, I added the --phonetisaurus flag to train_g2p and training worked well.
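For anyone scripting this, here is a minimal sketch of the rerun. The exact command that was used is not shown in the thread, so this assumes it was the original command with --phonetisaurus added and --clean kept. `build_train_g2p_command` is a hypothetical helper; it only assembles the argument list and does not call MFA.

```python
import shlex


def build_train_g2p_command(dictionary_path, output_model_path,
                            phonetisaurus=True, clean=True):
    """Assemble the argument list for `mfa train_g2p` (hypothetical helper)."""
    cmd = ["mfa", "train_g2p", dictionary_path, output_model_path]
    if phonetisaurus:
        cmd.append("--phonetisaurus")  # flag reported to work in this thread
    if clean:
        cmd.append("--clean")  # flag from the original command
    return cmd


# Paths are the ones from the original report.
cmd = build_train_g2p_command(
    "./mfa/data/dictionary/my_dictionary.txt",
    "./mfa/data/g2p/ja_g2p.zip",
)
print(shlex.join(cmd))
# To actually launch training (requires MFA to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids quoting problems if the paths ever contain spaces.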