[BUG] --speaker_characters results in KeyError

thealk commented 1 year ago

Debugging checklist

[X] Have you updated to latest MFA version?
[X] Have you tried rerunning the command with the --clean flag?

Describe the issue

When running mfa align with the --speaker_characters (-s) flag for speaker adaptation, alignment fails with a KeyError. The key reported in the KeyError is the ID of a single speaker. The same error also occurs for mfa validate.

(aligner) thea MFA % mfa align -s 4 --clean /Users/thea/Documents/0_test english_us_arpa english_us_arpa /Users/thea/Documents/1_test_out

File name formats are, for example: oc01_[bla...].wav/.TextGrid, where the first 4 characters denote the speaker ID.
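To make the expected grouping concrete, here is a minimal Python sketch of what taking the first 4 characters of each file name as the speaker ID should produce. This is not MFA's implementation; the helper names `speaker_from_filename` and `group_by_speaker` and the `_a`/`_b` file name suffixes are hypothetical and only stand in for the real file names.

```python
from collections import defaultdict
from pathlib import Path


def speaker_from_filename(filename: str, n_chars: int) -> str:
    """Return the first n_chars characters of the file stem as the speaker ID."""
    return Path(filename).stem[:n_chars]


def group_by_speaker(filenames, n_chars):
    """Group file names by the speaker ID taken from their first n_chars characters."""
    groups = defaultdict(list)
    for name in filenames:
        groups[speaker_from_filename(name, n_chars)].append(name)
    return dict(groups)


# Hypothetical file names following the pattern described above.
files = ["oc01_a.wav", "oc01_b.wav", "pd01_a.wav", "DM91.wav"]
groups = group_by_speaker(files, 4)
for speaker, names in sorted(groups.items()):
    print(speaker, names)
```

If the printed keys are the speaker IDs you expect, the file naming itself is consistent with what -s 4 is asked to extract.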

mfa align and validate work as expected when --speaker_characters flag is omitted.

Command: mfa align --speaker_characters 4 --clean /Users/thea/Documents/0_test english_us_arpa english_us_arpa /Users/thea/Documents/1_test_out

Output:

 INFO     Setting up corpus information...                                      
 INFO     Loading corpus from source files...                                   
   1% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/100  [ 0:00:02 < -:--:-- , ? it/s ]
 INFO     Stopped parsing early (0.0847590000000018 seconds)                    
 ERROR    There was an error in the run, please see the log.                    
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x18bfbbbd0>>
Traceback (most recent call last):
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 98, in history_save_handler
    raise self.exception
  File "/Users/thea/miniconda3/envs/aligner/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align.py", line 113, in align_corpus_cli
    aligner.align()
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 412, in align
    self.setup()
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/alignment/pretrained.py", line 205, in setup
    self.load_corpus()
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/corpus/acoustic_corpus.py", line 1209, in load_corpus
    self._load_corpus()
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/corpus/base.py", line 1288, in _load_corpus
    self._load_corpus_from_source_mp()
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/corpus/acoustic_corpus.py", line 1023, in _load_corpus_from_source_mp
    import_data.add_objects(self.generate_import_objects(file))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/thea/miniconda3/envs/aligner/lib/python3.11/site-packages/montreal_forced_aligner/corpus/base.py", line 1018, in generate_import_objects
    "speaker_id": self._speaker_ids[u.speaker_name],
                  ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'oc01'

For Reproducing your issue

Please fill out the following:

Corpus structure
- What language is the corpus in? English
- How many files/speakers? Test case has 2 speakers and 5 files each (10 files in total)
- Are you using lab files or TextGrid files for input? TextGrids
Dictionary
- Are you using a dictionary from MFA? If so, which one? Yes, english_us_arpa
- If it's a custom dictionary, what is the phoneset?
Acoustic model
- If you're using an acoustic model, is it one downloaded through MFA? If so, which one? Yes, english_us_arpa
- If it's a model you've trained, what data was it trained on?

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

0_test.log

Desktop (please complete the following information):

OS: [e.g. Windows, OSX, Linux] OSX
Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc] MacOSX 12.6.1
Any other details about the setup (Cloud, Docker, etc)

Additional context

- I was messing around with Python installs earlier today, but miniconda3 seems to be properly installed and mfa align works as expected without the flag, which seems to rule that out.
- When I ran mfa align on the whole data set (all files in a single directory) without specifying speaker adaptation, it automatically detected 2 speakers. There are actually ~30 speakers (oc01, oc02, pd01, pd02, etc.), but 2 speaker "groups", oc## and pd##. Not sure if this is relevant. A test folder with a couple of oc01 and pd01 files only detected a single speaker, so it must have detected a different string for the speaker ID by default...
- Note that the error was identical on a set of files with longer names too (for example: original file names were caspd_v1_oc01..., where the first 13 characters denoted the unique speaker ID, but also contained project and version info in the name. I simplified the filenames thinking perhaps the extra underscores were causing the error).
- I ran into the same error on another small corpus with 20 speakers, where each file name was just the 4-character speaker ID (e.g., DM91.wav).
- I have attempted this on two computers. My current laptop (MacOSX 12.6.1) is an M1 Pro, which I know can sometimes cause problems, but I get the exact same output on an older laptop with updated miniconda3 and mfa installs (MacOSX 12.4 with an Intel Core i5).
Thank you so much as always for your insights!

thealk commented 1 year ago

Some additional things I've tried that result in the same errors now also include:

- renaming the transcription tier to match each of the speaker IDs
- -s flag with more mfa commands: mfa align, validate, adapt all have the same issue
- tested on another computer running Mac OSX 13.0 with an Apple M2 Pro chip with a brand new installation of MFA (and after running mfa validate)
- tested on previous mfa version 2.2.5

thealk commented 1 year ago

Another update: Another issue apparently was that speaker adaptation was just not detecting my distinct speakers correctly at all (or at least not in a way I understand). That is, when I ran mfa align (without specifying -s), it would detect 2 speakers consistently, no matter if there were 10 different filenames, or if the files were in distinct directories.

Revising my previous update because the original solution I came up with (concatenate files and assign one tier per speaker, named with the speaker ID) was more complicated than necessary. I erroneously thought my simpler solution, below, wasn't working (just rename the tiers in your original transcript TextGrids to match the speaker IDs), but I must have been doing something wrong in my initial tests because now it's working.

The ULTIMATE solution, which I am now happy with, is to make sure your transcription tier in EACH FILE has a name corresponding to the speaker. For a larger corpus with many files per speaker, this means every file for a given speaker has the same tier name, but no two speakers have the same tier name. For a smaller corpus with one file per speaker, each file just contains a tier name that matches the file name.

For anyone working in Praat, I wrote a tiny script to rename tiers according to speaker_chars in the filenames: https://github.com/thealk/PraatScripts/blob/master/mfa_prep/rename_tiers_to_speaker.praat
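Outside Praat, the same renaming can be sketched in plain Python. This is a rough sketch under stated assumptions, not a port of the Praat script: it assumes long-format TextGrids with exactly one tier, and the function name `rename_single_tier`, the sample TextGrid, and the tier name "words" are all hypothetical.

```python
import re

# Matches the `name = "..."` line that a tier has in a long-format TextGrid.
TIER_NAME = re.compile(r'^(\s*name\s*=\s*)"[^"]*"', re.MULTILINE)


def rename_single_tier(textgrid_text: str, speaker_id: str) -> str:
    """Return the TextGrid text with its only tier renamed to speaker_id."""
    matches = TIER_NAME.findall(textgrid_text)
    if len(matches) != 1:
        # Refuse to guess which tier is the transcription tier.
        raise ValueError(f"expected exactly one tier, found {len(matches)}")
    # A function replacement keeps backslashes in speaker_id literal.
    return TIER_NAME.sub(lambda m: f'{m.group(1)}"{speaker_id}"', textgrid_text)


# Hypothetical single-tier TextGrid in Praat's long text format.
sample = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1
            text = "hello"
'''

renamed = rename_single_tier(sample, "oc01")
print(renamed)
```

To apply this to real files, you would read each TextGrid, take the speaker ID from the first characters of its file name, and write the result back. Check the file encoding first, since Praat can save TextGrids as UTF-8 or UTF-16, and keep a backup of the originals.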

I am leaving this issue open because the original problem, namely that the -s flag consistently results in an error, is still an unresolved issue. However, this workaround is easy enough to implement.

MontrealCorpusTools / Montreal-Forced-Aligner

[BUG] --speaker_characters results in KeyError #669