MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] #687

Closed jeffmielke closed 10 months ago

jeffmielke commented 11 months ago

Debugging checklist

[x] Have you updated to latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

Thanks for adding align_one. I haven't been able to get it to work yet. My test files have been successfully aligned with mfa align lots of times, including with 3.0.0a4.

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? English
    • How many files/speakers? 1/1
    • Are you using lab files or TextGrid files for input? text file (I get the same result with a textgrid)
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? english_us_arpa
    • If it's a custom dictionary, what is the phoneset? n/a
  3. Acoustic model
    • If you're using an acoustic model, is it one download through MFA? If so, which one? english_us_arpa
    • If it's a model you've trained, what data was it trained on? n/a

Log file Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

I don't think a log file was produced, but here is the relevant part of command_history.yaml:

Desktop (please complete the following information):

Additional context

Here is the error message:

(aligner) jimielke@t-phon-4:/phon/ENG536/jimielke$ mfa align_one --clean ../files/jeff_vowelplot/jeff.wav ../files/jeff_vowelplot/jeff.txt english_us_arpa english_us_arpa beckymfa_one
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7eff83bb0f50>>
Traceback (most recent call last):
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 99, in history_save_handler
    raise self.exception
  File "/home/jimielke/.conda/envs/aligner/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align_one.py", line 158, in align_one_cli
    ctm = align_utterance_online(acoustic_model, utt, lexicon_compiler, **align_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/online/alignment.py", line 48, in align_utterance_online
    fst = graph_compiler.compile_fst(utterance.transcript)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/kalpy/decoder/training_graphs.py", line 279, in compile_fst
    fst = self.compiler.CompileGraphFromText(transcript_symbols)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: kaldi::KaldiFatalError

mmcauliffe commented 11 months ago

Couple of questions for debugging:

  1. What were the contents of the text file?
  2. In the folder ~/Documents/MFA/extracted_models/dictionary/english_us_arpa, do phones.txt and words.txt look right?
  3. Is L.fst bigger than 1KB?
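The file checks above can be scripted. This is only a rough sketch: `check_dictionary_dir` is a hypothetical helper, not part of MFA, and it can only confirm that phones.txt and words.txt exist and are non-empty, not that their contents "look right". The 1 KB threshold for L.fst is taken from the question above.

```python
from pathlib import Path


def check_dictionary_dir(dict_dir, min_fst_bytes=1024):
    """Return a list of problems found in an extracted dictionary directory.

    Hypothetical helper for illustration only; it is not an MFA function.
    """
    dict_dir = Path(dict_dir)
    problems = []

    # The symbol tables should exist and contain at least something.
    for name in ("phones.txt", "words.txt"):
        path = dict_dir / name
        if not path.is_file():
            problems.append(f"{name} is missing")
        elif path.stat().st_size == 0:
            problems.append(f"{name} is empty")

    # A lexicon FST of 1 KB or less suggests it was not compiled properly.
    fst = dict_dir / "L.fst"
    if not fst.is_file():
        problems.append("L.fst is missing")
    elif fst.stat().st_size <= min_fst_bytes:
        problems.append(f"L.fst is only {fst.stat().st_size} bytes")

    return problems


if __name__ == "__main__":
    # Default location mentioned in the thread; adjust if MFA uses another root.
    default_dir = (
        Path.home() / "Documents/MFA/extracted_models/dictionary/english_us_arpa"
    )
    for line in check_dictionary_dir(default_dir) or ["no obvious problems"]:
        print(line)
```

An empty list means the three files passed these basic checks; it does not rule out other causes of the Kaldi error.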

jeffmielke commented 11 months ago

  1. Here are the contents of the text file (Erik Thomas's OHDARE story with punctuation stripped out in case that was the problem, but that didn't change it):

Since it was too cold Saturday to soak in his pool and too foggy to shoot arrows Sam decided to tour the countryside His golf partner Pike was gone and his daughter was playing the fife which made his ears throb As a result he wanted to get out of the house so he dashed to the family car before his wife Joan caught him He still hadn't started washing the dishes from last night He took off as fast as he could and hit the culvert as he pulled out of his drive I can't stay cooped up inside on such a nice day he thought Watching the telephone poles zip by he passed both a school and a hospital After five minutes he drove around Hoover Dam where he saw a sight to behold there must've been a thousand seagulls eating dead fish On Friday all he'd seen were men pushing a boat through the water along the end of the dam Next he rode through some farms with soybean fields and the ragweed in the dust filled air was so bad it made him cough He'd be a fool if he didn't have the sense to avoid those plants They made his voice sound hoarse Not far away he heard a dog barking Maybe the ragweed bothered it too At least August was peaceful He enjoyed looking at a hawk above and the hogs horses cows and a bull this Saturday morning and now he felt refreshed Even a goat chewing on a tin can looked happy He headed back to town There he passed Cooper's Forks where he'd renewed his medical insurance on Tuesday Checking his pocket for cash he stopped at a Gulf station and bought some gas because Joan wanted a full tank You don't have any driveway salt do you he asked the manager No but Tharp's tool store ought to sell ten pound bags cheap the manager answered I only need five right now he nodded plus some hooks bolts and a bushel basket He rushed to the store which was having a special on light bulbs and first he bought those things and then second two cots on sale With the spare tire they were a tight fit in his small sized foreign auto On the way home he saw a guy fixing the roof of his 
house and thought I need to put a fall coat of paint on my own house it looks dull The tar on the road had gone from hard to soft since it had gotten hot and he recalled that Joan would be hostile if he didn't wash that big dark cooking pot on the stove Picturing the fire in her eyes he took a shortcut down Tuttle Street and got home just in time to hear his other daughter practice the violin She sounded as horrible as her sister Oh my poor ears he muttered as he poured out the dish soap I guess there's no cure for this

  2. I think phones.txt and words.txt do look right

  3. Yes, L.fst is 46MB

Jeff Mielke Professor Linguistics program Department of English North Carolina State University he, him, his


mmcauliffe commented 11 months ago

Ok I have 3.0.0a5 up, and I tested out the transcript with english_us_arpa and it was working locally. If 3.0.0a5 is throwing the same error, can you try redownloading the english_us_arpa dictionary/model to make sure they're at the latest version (and running with --clean)?

jeffmielke commented 11 months ago

Thanks. I don't get the error messages anymore, but I do get the message "Maximum timestamp in Textgrid changed from (167.87428571428572) to (167.874286)" and it completes without showing any other messages, and doesn't produce output. I used the clean flag, and I got the same result before and after re-downloading the dictionary and acoustic models.


mmcauliffe commented 11 months ago

By not producing output, do you mean the output TextGrid file doesn't exist? It does sound like it got to the TextGrid writing stage if it shows that warning. If the output TextGrid isn't there, can you try rerunning with "-" in place of the output path and see if it produces any output?

jeffmielke commented 11 months ago

Yes, the output TextGrid file doesn't exist. Here is what I get with "-":

mfa align_one --clean ../files/jeff_vowelplot/jeff.wav jeff.txt english_us_arpa english_us_arpa -
Please be aware that you are running an alpha version of MFA. If you would like to install a more stable version, please visit https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-older-versions-of-mfa
Maximum timestamp in Textgrid changed from (167.87428571428572) to (167.874286)
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7f29835eb050>>
Traceback (most recent call last):
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 100, in history_save_handler
    raise self.exception
  File "/home/jimielke/.conda/envs/aligner/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align_one.py", line 163, in align_one_cli
    file_ctm.export_textgrid(
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/kalpy/gmm/data.py", line 176, in export_textgrid
    textgridStr = tgio.textgrid_io.getTextgridAsStr(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/praatio/utilities/textgrid_io.py", line 211, in getTextgridAsStr
    tg = _prepTgForSaving(
         ^^^^^^^^^^^^^^^^^
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/praatio/utilities/textgrid_io.py", line 301, in _prepTgForSaving
    _fillInBlanks(tier, "", minTimestamp, maxTimestamp)
  File "/home/jimielke/.conda/envs/aligner/lib/python3.11/site-packages/praatio/utilities/textgrid_io.py", line 132, in _fillInBlanks
    raise errors.ParsingError(
praatio.utilities.errors.ParsingError: The entries are longer than the max time specified in the textgrid.


mmcauliffe commented 11 months ago

Can you try updating to latest kalpy via conda update -c conda-forge kalpy? 0.5.6 should fix the textgrid parsing error.

jeffmielke commented 11 months ago

I realize now that it started working with 3.0.0a5 and from then on the only problem was that I interpreted OUTPUT_PATH as the name of a directory. I thought it was creating an empty subdirectory, but it was creating a textgrid that I didn't recognize as a textgrid because I wasn't telling it to name it .TextGrid. It was making textgrids before and after I updated kalpy. I just didn't notice before. I'm sorry for making you do extra work.

Jeff
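For anyone who ends up with an output file that has no extension, a quick way to tell whether it is actually a TextGrid is to look at its first lines. A minimal sketch, assuming the file was written in Praat's text format (whose header contains `ooTextFile` and `TextGrid`); `looks_like_textgrid` is a hypothetical helper, not part of MFA or praatio.

```python
from pathlib import Path


def looks_like_textgrid(path):
    """Heuristically check whether a file is a Praat text-format TextGrid.

    Hypothetical helper for illustration; only the file header is inspected,
    so the file name and extension do not matter.
    """
    path = Path(path)
    if not path.is_file():
        return False

    # The header sits at the very start of the file, so a small prefix is enough.
    head = path.read_bytes()[:200]

    # TextGrids are usually UTF-8, but Praat can also write UTF-16.
    for encoding in ("utf-8-sig", "utf-16"):
        text = head.decode(encoding, errors="ignore")
        if "ooTextFile" in text and "TextGrid" in text:
            return True
    return False


if __name__ == "__main__":
    # "beckymfa_one" is the extensionless output name used earlier in the thread.
    print(looks_like_textgrid("beckymfa_one"))
```

If the check passes, renaming the file so that it ends in .TextGrid should let Praat and other tools recognize it.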


mmcauliffe commented 11 months ago

Ahhh that makes sense! No worries, I'll add some extra logic for detecting when OUTPUT_PATH is a directory and generate the output TextGrid file name from the input wav file name, since it's a logical way to transition from the mfa align command.
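The directory handling described above could look roughly like the following. This is a sketch of the idea only, not MFA's actual implementation: `resolve_output_path` is a hypothetical name, and it assumes the output name is simply the wav file's stem plus a .TextGrid extension.

```python
from pathlib import Path


def resolve_output_path(output_path, wav_path):
    """Pick the TextGrid path to write for a single-file alignment.

    Hypothetical sketch: if output_path is an existing directory, build the
    file name from the input wav file; otherwise use output_path unchanged.
    """
    output_path = Path(output_path)
    wav_path = Path(wav_path)

    if output_path.is_dir():
        # e.g. jeff.wav -> <output_path>/jeff.TextGrid
        return output_path / (wav_path.stem + ".TextGrid")
    return output_path


if __name__ == "__main__":
    # With an existing directory "." this prints jeff.TextGrid.
    print(resolve_output_path(".", "../files/jeff_vowelplot/jeff.wav"))
```

Note that this only covers an OUTPUT_PATH that already exists as a directory; a path that does not exist yet is still treated as a file name.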

jeffmielke commented 11 months ago

In my case it wasn't already a directory, I just had it in my head that I was telling it what directory to put the output in, so I didn't understand what I was looking at. Maybe not a lot of other people will make the same mistake.
