[BUG] oovs files not generated correctly (and then deleted)

Debugging checklist

[X] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[X] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version? running version 3.1.1
[X] Have you tried rerunning the command with the --clean flag?

Describe the issue

OOV files are being deleted after validation is complete; also, they themselves seem incomplete. I have 2 mini corpora of mp3s in Romanian and Greek (from https://www.omniglot.com/language/phrases/romanian.php and https://www.omniglot.com/language/phrases/greek.php) consisting of 15 sound files and their transcriptions. The problem exists for both of them, but I'll just relate it for Romanian.

Running mfa validate "PATH\MFA\Miniromanian" romanian_cv romanian_cv, I get messages saying

INFO     Out of vocabulary words
 WARNING  15 OOV word types
 WARNING  37total OOV tokens
 WARNING  For a full list of the word types, please see: PATH\MFA\Miniromanian\oovs_found.txt.
          For a by-utterance breakdown of missing words, see:
          PATH\MFA\Miniromanian2\utterance_oovs.txt

These files aren't written as described. "oovs_found.txt" doesn't exist at all. There are three temp files that exist only as long as it takes for the model to complete its training, named "oov_counts_romanian_cv.txt", "oovs_found_romanian_cv.txt" and "utterance_oovs.txt". These files are deleted when training is complete.

I ran it again and copied these files to another directory to save them from deletion. When I opened them, they only had 5 lines each, rather than the 15 or 37 I'd've expected from the message.
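
In case it helps anyone reproduce this, here is a rough sketch of how the temporary files could be rescued automatically rather than copied by hand. It is not part of MFA: `snapshot_oov_files`, `count_nonempty_lines` and `watch` are hypothetical helpers, and the directory to watch is an assumption you have to supply yourself, since I only know the file names listed above.

```python
import shutil
import time
from pathlib import Path

# File names as reported in this issue. The directory they appear in is NOT
# known in general and must be passed in by the caller (assumption).
OOV_FILE_NAMES = (
    "oov_counts_romanian_cv.txt",
    "oovs_found_romanian_cv.txt",
    "utterance_oovs.txt",
)


def snapshot_oov_files(watch_dir, backup_dir, names=OOV_FILE_NAMES):
    """Copy any of the named files that currently exist into backup_dir."""
    watch_dir, backup_dir = Path(watch_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = watch_dir / name
        if src.is_file():
            try:
                shutil.copy2(src, backup_dir / name)
                copied.append(name)
            except OSError:
                # The file may vanish or be locked between the check and the copy.
                pass
    return copied


def count_nonempty_lines(path):
    """Count non-blank lines, to compare against the reported OOV totals."""
    with open(path, encoding="utf8") as f:
        return sum(1 for line in f if line.strip())


def watch(watch_dir, backup_dir, seconds=600, interval=0.5):
    """Poll watch_dir for a while, then report line counts of the saved copies."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        snapshot_oov_files(watch_dir, backup_dir)
        time.sleep(interval)
    return {p.name: count_nonempty_lines(p)
            for p in sorted(Path(backup_dir).glob("*.txt"))}
```

The idea would be to start `watch(...)` in a second terminal just before running mfa validate, then compare the returned line counts with the 15 word types and 37 tokens from the warning. Polling can still miss a file that exists for less than one interval, so this is only a workaround for inspecting the output.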

For Reproducing your issue

Please fill out the following:

Corpus structure
- What language is the corpus in? Romanian (applied to Greek too)
- How many files/speakers? 15 files, 1 speaker
- Are you using lab files or TextGrid files for input? .txt files ... is that okay?
Dictionary
- Are you using a dictionary from MFA? If so, which one? romanian_cv
- If it's a custom dictionary, what is the phoneset? N/A
Acoustic model
- If you're using an acoustic model, is it one downloaded through MFA? If so, which one? romanian_cv
- If it's a model you've trained, what data was it trained on? N/A

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Miniromanian2.log

Desktop (please complete the following information):

OS: Windows
Version: Windows 10
Any other details about the setup (Cloud, Docker, etc): none that I can think of. I installed MFA last week via conda.

MontrealCorpusTools / Montreal-Forced-Aligner

[BUG] oovs files not generated correctly (and then deleted) #819