MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG]error #726

Closed (shiyanpei0826 closed this issue 4 months ago)

shiyanpei0826 commented 7 months ago

Debugging checklist

[ ] Have you updated to the latest MFA version? yes
[ ] Have you tried rerunning the command with the --clean flag?

Describe the issue
A clear and concise description of what the bug is.
Error when using multiprocessing to train an alignment model on my own dataset.

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? english
    • How many files/speakers? 2457
    • Are you using lab files or TextGrid files for input? yes
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? yes
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one download through MFA? If so, which one?
    • If it's a model you've trained, what data was it trained on?

Log file Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Desktop (please complete the following information):

Additional context
Add any other context about the problem here.

  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2825, in scalar
    ret = self.one()
          ^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2798, in one
    return self._iter().one()  # type: ignore
           ^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2847, in _iter
    result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
                                                  ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2308, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2180, in _execute_internal
    conn = self._connection_for_bind(bind)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2047, in _connection_for_bind
    return trans._connection_for_bind(engine, execution_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 2, in _connection_for_bind
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/state_changes.py", line 139, in _go
    ret_value = fn(self, *arg, **kw)
                ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1143, in _connection_for_bind
    conn = bind.connect()
           ^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3268, in connect
    return self._connection_cls(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 145, in __init__
    self._dbapi_connection = engine.raw_connection()
                             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3292, in raw_connection
    return self.pool.connect()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 452, in connect
    return _ConnectionFairy._checkout(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 1269, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 716, in checkout
    rec = pool._do_get()
          ^^^^^^^^^^^^^^
  File "/opt/conda/envs/aligner/lib/python3.11/site-packages/sqlalchemy/pool/impl.py", line 158, in _do_get
    raise exc.TimeoutError(
sqlalchemy.exc.TimeoutError: QueuePool limit of size 10 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)
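
The error message describes a pool that allows 10 pooled connections plus 10 overflow connections and gives up after waiting 30 seconds for a free one. The sketch below is a toy model of that behavior using only the standard library. It is not SQLAlchemy's implementation, and `BoundedPool` and `count_failed_checkouts` are hypothetical names introduced here for illustration. It assumes each job holds exactly one connection and never returns it, which is the worst case.

```python
import threading


class BoundedPool:
    """Toy model of a capped connection pool (not SQLAlchemy's code)."""

    def __init__(self, pool_size=10, max_overflow=10, timeout=30.0):
        # Total connections that can be checked out at the same time.
        self.capacity = pool_size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.capacity)

    def checkout(self):
        # Wait for a free slot; give up once the timeout elapses.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                f"pool limit of {self.capacity} reached, "
                f"timed out after {self.timeout:.2f}s"
            )

    def checkin(self):
        # Return a connection so another caller can use the slot.
        self._slots.release()


def count_failed_checkouts(num_jobs, pool):
    """Let num_jobs callers each take one connection and keep it.

    Returns how many callers timed out (assumption: nobody checks in).
    """
    failures = 0
    for _ in range(num_jobs):
        try:
            pool.checkout()
        except TimeoutError:
            failures += 1
    return failures
```

With a short timeout so the demonstration finishes quickly, `count_failed_checkouts(32, BoundedPool(timeout=0.01))` leaves 12 callers without a connection, while 10 callers all succeed.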

shiyanpei0826 commented 7 months ago

Error when using multiprocessing to train an acoustic model

yzmyyff commented 5 months ago

Hi, is there any progress on this issue?

shiyanpei0826 commented 5 months ago

Hi

I just set --num_jobs to 10, and that works.

Thanks

MiniXC commented 4 months ago

Does anyone know why this happens? Is it not advisable to set num_jobs = num_cpu?

mmcauliffe commented 4 months ago

I don't believe this should be an issue anymore in MFA 3.0.0 unless --use_threading is set, as the default is back to using separate processes with their own engines, but feel free to reopen if you're still hitting it.
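
The difference between threads sharing one engine and processes that each have their own engine can be put as simple arithmetic. The sketch below is a simplification and does not describe MFA's actual internals: `can_exhaust` is a hypothetical helper, the pool limits are taken from the error message above, and it assumes every job holds `conns_per_job` connections at once.

```python
def pool_capacity(pool_size=10, max_overflow=10):
    """Maximum simultaneous connections for one pool."""
    return pool_size + max_overflow


def can_exhaust(num_jobs, shared_pool, pool_size=10, max_overflow=10,
                conns_per_job=1):
    """Return True if peak demand on a single pool exceeds its capacity.

    shared_pool=True  models all jobs drawing from one pool (threading).
    shared_pool=False models each job owning its own pool (separate
    processes with their own engines), so a pool only serves one job.
    """
    capacity = pool_capacity(pool_size, max_overflow)
    demand = num_jobs * conns_per_job if shared_pool else conns_per_job
    return demand > capacity
```

Under these assumptions, 32 jobs on a shared pool can exceed the limit of 20, while 10 jobs cannot, which is consistent with the `--num_jobs 10` workaround reported earlier in the thread.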

shreeshailgan commented 3 months ago

I am using montreal-forced-aligner==3.0.0. I am trying to run mfa align. I am also facing this issue.

sqlalchemy.exc.TimeoutError: QueuePool limit of size 10 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)

I am using --num_jobs 32. My server is a 48-core Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz. I was earlier working on an AMD EPYC 7542 32-Core Processor machine (also with --num_jobs 32) and I did not encounter this error on it.

mmcauliffe commented 3 months ago

Can you run conda update -c conda-forge montreal-forced-aligner --update-deps? This bug was fixed in 3.0.1: https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html#id2

shreeshailgan commented 3 months ago

@mmcauliffe Updating to 3.0.1 solved the issue. Thanks. However, I used conda install montreal-forced-aligner==3.0.1 instead, since running the command provided above changed the MFA version from 3.0.0 to 2.0.0b8.
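
Since an update command can leave the environment on an unexpected version, it may help to confirm which version is actually installed. The following is a minimal sketch that assumes the package is registered under the distribution name `montreal-forced-aligner`, as in the version pin quoted above. `release_tuple`, `has_fix`, and `installed_mfa_version` are hypothetical helpers, and the comparison ignores pre-release suffixes such as `b8`.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Leading numeric release segment, e.g. '2.0.0b8' becomes (2, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_fix(version, fixed_in="3.0.1"):
    """True if version is at or above the release reported to fix the bug."""
    return release_tuple(version) >= release_tuple(fixed_in)


def installed_mfa_version():
    """Installed MFA version string, or None if the package is not found.

    Assumption: the distribution name matches the pin used in this thread.
    """
    try:
        return metadata.version("montreal-forced-aligner")
    except metadata.PackageNotFoundError:
        return None
```

For the versions mentioned in this thread, `has_fix("3.0.1")` is true, while `has_fix("3.0.0")` and `has_fix("2.0.0b8")` are both false.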