MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] mfa g2p considers hidden files and other non-corpus files #817

Closed lars76 closed 1 week ago

lars76 commented 3 weeks ago

Debugging checklist

[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[x] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

mfa g2p picks up hidden files and directories (e.g. .ipynb_checkpoints/) inside the corpus directory. As a result, the reported statistics show an additional speaker (Found X speakers ...). README files are also treated as additional data.
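
To illustrate the filtering I would expect, here is a minimal Python sketch. It is not MFA's own code; the function name `iter_corpus_files` and the default extension list are assumptions made for this example. It walks a corpus directory, never descends into hidden directories, and ignores hidden files and files whose extension is not in the allowed set (so a README is skipped).

```python
import os
from pathlib import Path


def iter_corpus_files(root, extensions=(".lab", ".wav")):
    """Yield corpus files under `root`, skipping hidden entries and other file types.

    Hypothetical helper for illustration only; the extension list is an assumption.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.startswith("."):
                continue  # hidden file
            if Path(name).suffix.lower() in extensions:
                yield Path(dirpath) / name


def count_speakers(root):
    """Count speakers, assuming one subdirectory per speaker."""
    return len({path.parent.name for path in iter_corpus_files(root)})
```

With this kind of filter, a stray .ipynb_checkpoints/ directory would not be counted as a speaker, and a README would not be counted as data.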

For Reproducing your issue

Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? AISHELL-3
    • How many files/speakers? 218
    • Are you using lab files or TextGrid files for input? lab

Desktop (please complete the following information):