Closed hailthedawn closed 1 year ago
My align.1.log file:
'C:\Users\Ketaki\anaconda3\envs\aligner\Library\bin\gmm-boost-silence.EXE' --boost=1.0 1 'temp\temp_mfa\alignment\final.mdl' -
'C:\Users\Ketaki\anaconda3\envs\aligner\Library\bin\gmm-align-compiled.EXE' --transition-scale=1.0 --acoustic-scale=0.083333 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false '--write-per-frame-acoustic-loglikes=ark:temp\temp_mfa\alignment\like.1.1.ark' - 'ark,s,cs:temp\temp_mfa\alignment\fsts.1.1.ark' 'ark,s,cs:add-deltas scp,s,cs:"temp\temp_mfa\temp_mfa\split2\feats.1.1.scp" ark:- |' 'ark:temp\temp_mfa\alignment\ali.1.1.ark' ark,t:-
WARNING (gmm-boost-silence.EXE[5.5.1016]:main():gmmbin\gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence.EXE[5.5.1016]:main():gmmbin\gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence.EXE[5.5.1016]:main():gmmbin\gmm-boost-silence.cc:103) Wrote model to -
add-deltas 'scp,s,cs:temp\temp_mfa\temp_mfa\split2\feats.1.1.scp' ark:-
LOG (gmm-align-compiled.EXE[5.5.1016]:main():gmmbin\gmm-align-compiled.cc:127) 1-1
WARNING (gmm-align-compiled.EXE[5.5.1016]:kaldi::AlignUtteranceWrapper():decoder\decoder-wrappers.cc:617) Retrying utterance 1-1 with beam 40
WARNING (gmm-align-compiled.EXE[5.5.1016]:kaldi::AlignUtteranceWrapper():decoder\decoder-wrappers.cc:626) Did not successfully decode file 1-1, len = 25547
LOG (gmm-align-compiled.EXE[5.5.1016]:main():gmmbin\gmm-align-compiled.cc:135) Overall log-likelihood per frame is -nan(ind) over 0 frames.
LOG (gmm-align-compiled.EXE[5.5.1016]:main():gmmbin\gmm-align-compiled.cc:137) Retried 1 out of 1 utterances.
LOG (gmm-align-compiled.EXE[5.5.1016]:main():gmmbin\gmm-align-compiled.cc:139) Done 0, errors on 1
Ah, right, can you try rerunning with a higher beam width, mfa align ... --beam 100, and see if it succeeds? 4 minutes is a bit on the long side, but a beam width of 100 should be enough (you can also boost it even higher; the default is 10, which is pretty strict but faster).
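For anyone triaging a similar align.1.log, the failure can be picked out programmatically. The following is a minimal sketch, not part of MFA: the two patterns are copied from the gmm-align-compiled messages in the log above, the sample text is abbreviated from it, and the 10 ms frame shift used to convert "len" into seconds is an assumption (under it, 25547 frames is about 255 seconds, which matches a file of around 4 minutes).

```python
import re

# Patterns copied from the gmm-align-compiled messages in the log above.
FAILED_RE = re.compile(r"Did not successfully decode file (\S+), len = (\d+)")
SUMMARY_RE = re.compile(r"Done (\d+), errors on (\d+)")

FRAME_SHIFT_SECONDS = 0.01  # assumption: 10 ms frame shift


def summarize_align_log(text):
    """Return failed utterances (id, frames, approx. seconds) and done/error counts."""
    failed = [
        (utt, int(frames), int(frames) * FRAME_SHIFT_SECONDS)
        for utt, frames in FAILED_RE.findall(text)
    ]
    match = SUMMARY_RE.search(text)
    if match:
        done, errors = int(match.group(1)), int(match.group(2))
    else:
        done, errors = None, None
    return {"failed": failed, "done": done, "errors": errors}


# Abbreviated sample; the "(...)" stands in for the Kaldi source-location prefix.
sample = """\
WARNING (...) Did not successfully decode file 1-1, len = 25547
LOG (...) Done 0, errors on 1
"""

summary = summarize_align_log(sample)
for utt, frames, seconds in summary["failed"]:
    print(f"{utt}: {frames} frames (~{seconds:.0f} s) failed to align")
```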
Thanks for the reply!
No, I'm getting the same issue when I run:
mfa align -t ./temp -j 2 ./temp_mfa modified_librispeech-lexicon.txt ./english.zip ./ljs_alignedcapstone --beam 100
I even tried 300 but no go.
Hmm, can you try running mfa download acoustic english_us_arpa and mfa download dictionary english_us_arpa, and then try running mfa align -t ./temp -j 2 ./temp_mfa english_us_arpa english_us_arpa ./ljs_alignedcapstone --beam 100? The MFA 1.0 model that the notebook uses is quite out of date, and there have been a number of improvements to the models and dictionaries since then.
Downloaded the model and dictionary, and am now getting this error:
$ mfa align -t ./temp -j 2 ./temp_mfa english_us_arpa english_us_arpa ./ljs_alignedcapstone --beam 100
WARNING The previous run had a different configuration than the current, which may cause issues. Please see the log for details or use --clean flag if issues
are encountered.
WARNING The previous run had a different configuration than the current, which may cause issues. Please see the log for details or use --clean flag if issues
are encountered.
ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x00000263365E6410>>
Traceback (most recent call last):
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\command_line\mfa.py", line 97, in history_save_handler
raise self.exception
File "C:\Users\Ketaki\anaconda3\envs\aligner\Scripts\mfa-script.py", line 10, in <module>
sys.exit(mfa_cli())
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\rich_click\rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\command_line\align.py", line 113, in align_corpus_cli
aligner.align()
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\alignment\pretrained.py", line 411, in align
self.setup()
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\alignment\pretrained.py", line 205, in setup
self.load_corpus()
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\corpus\acoustic_corpus.py", line 1205, in load_corpus
self.dictionary_setup()
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\dictionary\multispeaker.py", line 555, in dictionary_setup
conn.execute(sqlalchemy.insert(Word.__table__), word_objs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\engine\base.py", line 1414, in execute
return meth(
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\sql\elements.py", line 486, in _execute_on_connection
return connection._execute_clauseelement(
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\engine\base.py", line 1638, in _execute_clauseelement
ret = self._execute_context(
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\engine\base.py", line 1837, in _execute_context
return self._exec_insertmany_context(
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\engine\base.py", line 2103, in _exec_insertmany_context
self._handle_dbapi_exception(
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\engine\base.py", line 2326, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\engine\base.py", line 2100, in _exec_insertmany_context
dialect.do_execute(cursor, sub_stmt, sub_params, context)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\sqlalchemy\engine\default.py", line 748, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "word_pkey"
DETAIL: Key (id)=(1) already exists.
[SQL: INSERT INTO word (id, mapping_id, word, count, word_type, dictionary_id) VALUES (%(id__0)s, %(mapping_id__0)s, %(word__0)s, %(count__0)s, %(word_type__0)s, %(dictionary_id__0)s), (%(id__1)s, %(mapping_id__1)s, %(word__1)s, %(count__1)s, %(word_type__ ... 110068 characters truncated ... 9)s, %(mapping_id__999)s, %(word__999)s, %(count__999)s, %(word_type__999)s, %(dictionary_id__999)s)]
[parameters: {'word__0': '<eps>', 'word_type__0': 'silence', 'count__0': 0, 'id__0': 1, 'mapping_id__0': 0, 'dictionary_id__0': 2, 'word__1': "'d", 'word_type__1': 'clitic', 'count__1': 0, 'id__1': 2, 'mapping_id__1': 1, 'dictionary_id__1': 2, 'word__2': "'ll", 'word_type__2': 'clitic', 'count__2': 0, 'id__2': 3, 'mapping_id__2': 2, 'dictionary_id__2': 2, 'word__3': "'re", 'word_type__3': 'clitic', 'count__3': 0, 'id__3': 4, 'mapping_id__3': 3, 'dictionary_id__3': 2, 'word__4': "'s", 'word_type__4': 'clitic', 'count__4': 0, 'id__4': 5, 'mapping_id__4': 4, 'dictionary_id__4': 2, 'word__5': "'ve", 'word_type__5': 'clitic', 'count__5': 0, 'id__5': 6, 'mapping_id__5': 5, 'dictionary_id__5': 2, 'word__6': 'a', 'word_type__6': 'speech', 'count__6': 0, 'id__6': 7, 'mapping_id__6': 6, 'dictionary_id__6': 2, 'word__7': "a''s", 'word_type__7': 'speech', 'count__7': 0, 'id__7': 8, 'mapping_id__7': 7, 'dictionary_id__7': 2, 'word__8': "a'body", 'word_type__8': 'speech' ... 5900 parameters truncated ... 
'mapping_id__991': 991, 'dictionary_id__991': 2, 'word__992': 'achiever', 'word_type__992': 'speech', 'count__992': 0, 'id__992': 993, 'mapping_id__992': 992, 'dictionary_id__992': 2, 'word__993': 'achievers', 'word_type__993': 'speech', 'count__993': 0, 'id__993': 994, 'mapping_id__993': 993, 'dictionary_id__993': 2, 'word__994': 'achieves', 'word_type__994': 'speech', 'count__994': 0, 'id__994': 995, 'mapping_id__994': 994, 'dictionary_id__994': 2, 'word__995': 'achieving', 'word_type__995': 'speech', 'count__995': 0, 'id__995': 996, 'mapping_id__995': 995, 'dictionary_id__995': 2, 'word__996': 'achill', 'word_type__996': 'speech', 'count__996': 0, 'id__996': 997, 'mapping_id__996': 996, 'dictionary_id__996': 2, 'word__997': 'achillas', 'word_type__997': 'speech', 'count__997': 0, 'id__997': 998, 'mapping_id__997': 997, 'dictionary_id__997': 2, 'word__998': 'achille', 'word_type__998': 'speech', 'count__998': 0, 'id__998': 999, 'mapping_id__998': 998, 'dictionary_id__998': 2, 'word__999': "achille's", 'word_type__999': 'speech', 'count__999': 0, 'id__999': 1000, 'mapping_id__999': 999, 'dictionary_id__999': 2}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
Right, include the --clean flag via: mfa align -t ./temp -j 2 ./temp_mfa english_us_arpa english_us_arpa ./ljs_alignedcapstone --beam 100 --clean, since it's still expecting the same dictionary with that dataset.
Getting this now:
$ mfa align -t ./temp -j 2 ./temp_mfa english_us_arpa english_us_arpa ./ljs_alignedcapstone --beam 100 --clean
ERROR There was an error connecting to the global MFA database server.
ERROR Please ensure the server is initialized (mfa server init) or running (mfa server start)
ERROR There was an error in the run, please see the log.
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x000001AACCA46410>>
Traceback (most recent call last):
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\command_line\mfa.py", line 97, in history_save_handler
raise self.exception
File "C:\Users\Ketaki\anaconda3\envs\aligner\Scripts\mfa-script.py", line 10, in <module>
sys.exit(mfa_cli())
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\rich_click\rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\click\decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\command_line\align.py", line 113, in align_corpus_cli
aligner.align()
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\alignment\pretrained.py", line 405, in align
self.initialize_database()
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\site-packages\montreal_forced_aligner\abc.py", line 241, in initialize_database
subprocess.check_call(
File "C:\Users\Ketaki\anaconda3\envs\aligner\lib\subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['createdb', '--host=C:/Users/Ketaki/Documents/MFA/pg_mfa_global_socket', 'temp_mfa']' returned non-zero exit status 1.
Hmm, is this on the same machine where you ran mfa configure --enable_auto_server? Is there anything in the pg logs in the temp directory?
Ran the auto_server and it's still the same error. This is my align.1 log:
'C:\Users\Ketaki\anaconda3\envs\aligner\Library\bin\gmm-boost-silence.EXE' --boost=1.0 1 'temp\temp_mfa\alignment\final.alimdl' -
'C:\Users\Ketaki\anaconda3\envs\aligner\Library\bin\gmm-align-compiled.EXE' --transition-scale=1.0 --acoustic-scale=0.083333 --self-loop-scale=0.1 --beam=100 --retry-beam=40 --careful=false '--write-per-frame-acoustic-loglikes=ark:temp\temp_mfa\alignment\like.1.1.ark' - 'ark,s,cs:temp\temp_mfa\alignment\fsts.1.1.ark' 'ark,s,cs:splice-feats --left-context=3 --right-context=3 scp,s,cs:"temp\temp_mfa\temp_mfa\split2\feats.1.1.scp" ark:- | transform-feats "temp\temp_mfa\alignment\lda.mat" ark:- ark:- |' 'ark:temp\temp_mfa\alignment\ali.1.1.ark' ark,t:-
WARNING (gmm-boost-silence.EXE[5.5.1016]:main():gmmbin\gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence.EXE[5.5.1016]:main():gmmbin\gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence.EXE[5.5.1016]:main():gmmbin\gmm-boost-silence.cc:103) Wrote model to -
splice-feats --left-context=3 --right-context=3 'scp,s,cs:temp\temp_mfa\temp_mfa\split2\feats.1.1.scp' ark:-
transform-feats 'temp\temp_mfa\alignment\lda.mat' ark:- ark:-
LOG (transform-feats[5.5.1016]:main():featbin\transform-feats.cc:158) Overall average [pseudo-]logdet is -89.6349 over 25547 frames.
LOG (transform-feats[5.5.1016]:main():featbin\transform-feats.cc:161) Applied transform to 1 utterances; 0 had errors.
LOG (gmm-align-compiled.EXE[5.5.1016]:main():gmmbin\gmm-align-compiled.cc:127) 1-1
ERROR (gmm-align-compiled.EXE[5.5.1016]:kaldi::AlignUtteranceWrapper():decoder\decoder-wrappers.cc:594) Beams do not make sense: beam 100, retry-beam 40
kaldi::KaldiFatalError
This is my pg-log-global: https://gist.github.com/hailthedawn/886ffcdee7cd8593c7c48dcb48e2ac7f
pg_init_log_global reports no errors.
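A note on the "Beams do not make sense: beam 100, retry-beam 40" line in the log above: it suggests that raising --beam to 100 left the retry beam at its default of 40, and Kaldi rejects a retry beam that is not wider than the first-pass beam. The sketch below builds the align command with both values and checks them first. The build_align_command helper is hypothetical, and the --retry_beam option name is an assumption based on MFA's retry_beam configuration parameter, so check mfa align --help before relying on it.

```python
import shlex


def build_align_command(corpus, dictionary, acoustic_model, output,
                        beam=10, retry_beam=40, clean=False):
    """Build an `mfa align` argument list, refusing beam settings Kaldi rejects."""
    # The log shows Kaldi aborting with beam 100 and retry-beam 40, so require
    # the retry beam to be strictly wider than the first-pass beam.
    if retry_beam <= beam:
        raise ValueError(
            f"retry_beam ({retry_beam}) must be greater than beam ({beam})"
        )
    cmd = [
        "mfa", "align", corpus, dictionary, acoustic_model, output,
        "--beam", str(beam),
        "--retry_beam", str(retry_beam),  # assumed flag name, see note above
    ]
    if clean:
        cmd.append("--clean")
    return cmd


# Paths and model names are the ones used earlier in this thread;
# the retry beam of 400 is an illustrative value.
cmd = build_align_command(
    "./temp_mfa", "english_us_arpa", "english_us_arpa", "./ljs_alignedcapstone",
    beam=100, retry_beam=400, clean=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as-is; printing it first makes it easy to confirm the two beam values before a long run.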
@mmcauliffe Hey! Any idea what might be happening here?
I have figured out that when I align only a section of the audio, and that section doesn't contain any disfluencies (e.g., "mmhmm"), it does not crash and runs properly. As soon as I use a section of the audio that has a disfluency, it crashes (even if my transcript doesn't contain the disfluency). I'm still not sure how to make it work for the full audio, and I don't want to take out all the disfluencies. I can try adding all the disfluencies in my data to the ARPA dictionary, but I'm not sure that would work if the phones aren't present in ARPA.
Note: The audio also contains "um" and "uh", and I haven't verified whether it crashes for those yet. It currently crashes for "mmhmm".
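On the question of whether the phones are present in ARPA, one way to check before editing the dictionary is to validate each candidate pronunciation against the ARPAbet phone set. This is a rough sketch under several assumptions: the phone list is the 39-phone set used by the CMU Pronouncing Dictionary, the pronunciations for "um", "uh", and "mmhmm" are illustrative guesses rather than entries taken from english_us_arpa, and the tab-separated "word, then phones" line format should be confirmed against the dictionary file you actually use.

```python
import re

# The 39 ARPAbet phones used by the CMU Pronouncing Dictionary (stress digits removed).
ARPABET = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def unknown_phones(pronunciation):
    """Return the phones in a space-separated pronunciation that are not ARPAbet."""
    phones = pronunciation.split()
    # Strip a trailing stress digit (0, 1, or 2) before looking the phone up.
    return [p for p in phones if re.sub(r"[0-2]$", "", p) not in ARPABET]


def dictionary_lines(entries):
    """Format entries as 'word<TAB>phones' lines, rejecting unknown phones."""
    lines = []
    for word, pron in entries.items():
        bad = unknown_phones(pron)
        if bad:
            raise ValueError(f"{word!r} uses phones outside ARPAbet: {bad}")
        lines.append(f"{word}\t{pron}")
    return lines


# Hypothetical pronunciations for the filled pauses mentioned above.
candidates = {"um": "AH1 M", "uh": "AH1", "mmhmm": "M HH M"}
for line in dictionary_lines(candidates):
    print(line)
```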
Right, so the alignment algorithm currently assumes that the transcripts have all of the words it's looking for, including disfluencies and filled pauses. I have some code that I'm playing around with for designating some words/multiword sequences like "you know" or "I mean" in English as interjections that might not be transcribed (because the Japanese corpus I'm working on often doesn't include them), but there are still some implementation issues I need to figure out.
I didn't realize initially that the file had multiple speakers, which may also cause issues with speaker adaptation later, so I would recommend breaking up the file and using the TextGrid input format here: https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/corpus_structure.html#textgrid-format so that you can assign intervals to each speaker. I have a very early alpha build of the program that I use to create and fix corpora here: https://anchor-annotator.readthedocs.io/en/latest/, and you can quickly split up utterances in it.
Additionally, if you have more data for each speaker than just this file, including it will likely give better alignments overall.
Hi, thank you! I stuck to doing text-based alignment, and was able to split the file per speaker turn, improve my transcriptions, and now have only ~20 alignments failing out of 1030. (By failing, I mean the TextGrid doesn't get generated).
However, for quite a few of the utterances that are short (1 word long), silence isn't detected properly near the end of the audio file. In some of them, a single laugh is reported as taking up a full 20 seconds (which was the length of the segmented file I passed in). When I play the audio manually, the laugh takes up all of a half second. I tried fiddling with the boost_silence parameter, even going up as high as 70, but did not see much improvement (maybe 0.5 seconds or so). Do you recommend I manually go through all of these, or try to use longer utterances?
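Before deciding whether to go through every file by hand, it may help to narrow the list down automatically. The sketch below is only a triage aid under stated assumptions: it takes word intervals as plain (start, end, label) tuples that you have already read out of the output TextGrids, and the 2-second threshold and the flag_long_words helper are hypothetical choices, not MFA settings.

```python
def flag_long_words(intervals, max_seconds=2.0):
    """Return (label, duration) for non-empty intervals longer than max_seconds."""
    return [
        (label, round(end - start, 3))
        for start, end, label in intervals
        if label and end - start > max_seconds
    ]


# Hypothetical aligned output resembling the 20-second laugh described above.
aligned = [(0.0, 20.0, "laugh")]
for label, duration in flag_long_words(aligned):
    print(f"check manually: {label!r} spans {duration} s")
```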
@hailthedawn
Hi there,
How are you able to get the notebook to work? It fails on the last step for me. See below:
The global MFA database server does not exist, initializing it first. pg_ctl stdout: pg_ctl stderr: initdb: error: cannot be run as root initdb: hint: Please log in (using, e.g., "su") as the (unprivileged) user that will own the server process.
Traceback (most recent call last):
File "/tmp/mfa/miniconda3/envs/aligner/bin/mfa", line 10, in
There was an error encountered starting the global MFA database server, please see /root/Documents/MFA/pg_init_log_global.txt for more details and/or look at the logged errors above. See output files at ./ljs_aligned
I got the same problem (There was an error encountered starting the global MFA database server). How can I fix it?
initdb: hint: Please log in (using, e.g., "su") as the (unprivileged) user that will own the server process.
Having the same issue! There has to be a way to run it without sudo, right?
@299792459b It looks like you're running on docker, in which case I would recommend using https://hub.docker.com/repository/docker/mmcauliffe/montreal-forced-aligner/general or looking at https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-mfa-in-your-own-containers. MFA should not be running as root.
Not sure if others hitting this are in the same situation, but I am going to close this, as the original issue should be solved or at least improved with the latest MFA model. The root cause is really that MFA relies heavily on accurate speaker labels. If you are encountering errors as a root user and the links above don't help, feel free to open a new issue.
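For scripts or notebooks that launch MFA inside a container, a small guard can catch the root case before initdb fails. This is a sketch of a pre-flight check, not something MFA provides: the running_as_root helper and the message text are hypothetical, and on platforms without os.geteuid (such as Windows) the check simply reports False.

```python
import os


def running_as_root(euid=None):
    """Return True if the effective user id is 0; False where geteuid is unavailable."""
    if euid is None:
        geteuid = getattr(os, "geteuid", None)
        if geteuid is None:  # e.g. Windows, which has no geteuid
            return False
        euid = geteuid()
    return euid == 0


if running_as_root():
    print("Running as root: initdb will refuse to start; switch to an unprivileged user first.")
```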
I tried to run the sample Colab notebook (https://gist.github.com/NTT123/12264d15afad861cb897f7a20a01762e) locally (as a .py file). It works fine with the provided ljspeech data. However, when I try it with my own data (a 16 kHz wav file and its .txt transcript), I get the following error:
(The audio file is around 4 minutes long, if that's relevant.)