MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] The Model is Not Recognizing Files/Extracting OOVs #523

Closed. NataliaShmueli closed this issue 1 year ago.

NataliaShmueli commented 1 year ago

Debugging checklist

[x] Have you updated to the latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

The model is not recognizing the .mp3 dataset and is not extracting OOVs.

For reproducing your issue

I ran mfa validate on a corpus with --ignore_acoustics, and it did not recognize the dataset, even though it is a standard .mp3 dataset.

  1. Corpus structure
    • What language is the corpus in? Arabic
    • How many files/speakers? Nine
    • Are you using lab files or TextGrid files for input? Lab
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? N/A
    • If it's a custom dictionary, what is the phoneset? X-SAMPA
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? N/A
    • If it's a model you've trained, what data was it trained on? Private self model

Log file (stored by default in ~/Documents/MFA):

validate_training.log


NataliaShmueli commented 1 year ago

I was able to fix it. The database had transcriptions in UTF-16, not UTF-8. Sorry about that!
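The thread does not say exactly how the files were fixed. A minimal sketch of one way to do it, assuming the .lab transcripts sit somewhere under a single corpus directory and the fix is simply to re-save each UTF-16 file as UTF-8, is shown below. The helper name convert_lab_files_to_utf8 is hypothetical and not part of MFA. It only detects UTF-16 files that begin with a byte-order mark, so UTF-16 files saved without a BOM would be left untouched.

```python
import codecs
from pathlib import Path


def convert_lab_files_to_utf8(corpus_dir):
    """Hypothetical helper: re-encode UTF-16 .lab transcripts under corpus_dir as UTF-8.

    Returns the list of paths that were rewritten.
    """
    converted = []
    for path in sorted(Path(corpus_dir).rglob("*.lab")):
        raw = path.read_bytes()
        # A UTF-32 LE BOM also starts with FF FE, so skip UTF-32 files explicitly.
        if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
            continue
        # Assumption: UTF-16 transcripts start with a byte-order mark (FF FE or FE FF).
        if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            text = raw.decode("utf-16")  # the "utf-16" codec reads and drops the BOM
            # Write bytes directly so line endings are kept exactly as they were.
            path.write_bytes(text.encode("utf-8"))
            converted.append(path)
    return converted
```

For example, convert_lab_files_to_utf8("path/to/corpus") (a placeholder path) would rewrite the affected transcripts in place, after which mfa validate can be rerun on the corpus. Back up the corpus first, since the files are overwritten.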