MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] #647

Closed diegotg2000 closed 1 year ago

diegotg2000 commented 1 year ago

Debugging checklist

[x] Have you updated to latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue
When validating the data, I get a syntax error from an SQL query:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "~": syntax error
[SQL: UPDATE utterance SET in_subset=? WHERE utterance.id IN (SELECT anon_1.id FROM (SELECT utterance.id AS id FROM utterance WHERE (utterance.text ~ ?) AND utterance.ignored = 0 ORDER BY utterance.duration LIMIT ? OFFSET ?) AS anon_1 ORDER BY random() LIMIT ? OFFSET ?) RETURNING id]
[parameters: (1, '\s\S+\s', 20000, 0, 2000, 0)]
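For readers who want to see why SQLite rejects this statement, here is a minimal sketch using Python's built-in sqlite3 module. The binary `~` regex-match operator is PostgreSQL syntax; SQLite has no such operator, so the parser stops at `~`. The table below is a cut-down, hypothetical stand-in for MFA's utterance table, and the sample rows are invented. The REGEXP workaround shown is a general SQLite technique, not necessarily the fix that was applied in MFA.

```python
import re
import sqlite3

# Hypothetical, simplified stand-in for the utterance table (not MFA's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE utterance (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO utterance (text) VALUES (?)",
    [("hello",), ("how you dey today",), ("two words",)],
)

pattern = r"\s\S+\s"  # same pattern as in the reported parameters

# PostgreSQL-style regex match: SQLite cannot parse `text ~ ?`.
try:
    conn.execute("SELECT id FROM utterance WHERE text ~ ?", (pattern,))
    tilde_error = None
except sqlite3.OperationalError as exc:
    tilde_error = str(exc)

# SQLite does parse `X REGEXP Y`, but only evaluates it once a two-argument
# function named regexp is registered; it is called as regexp(Y, X).
def regexp(pat, value):
    return int(value is not None and re.search(pat, value) is not None)

conn.create_function("REGEXP", 2, regexp)
rows = conn.execute(
    "SELECT id FROM utterance WHERE text REGEXP ? ORDER BY id", (pattern,)
).fetchall()
matching_ids = [row[0] for row in rows]

print(tilde_error)
print(matching_ids)
```

Only the second row contains a token with whitespace on both sides, so it is the only one the pattern selects.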

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Nigerian pidgin
    • How many files/speakers? 75 speakers, 99 utterances per speaker
    • Are you using lab files or TextGrid files for input? lab files
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? No
    • If it's a custom dictionary, what is the phoneset? I use a dummy dictionary that maps the filename of each utterance to its phonetic transcription. The phonemes are encoded in SAMPA.
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one?
    • If it's a model you've trained, what data was it trained on?

Log file pg_log_global.txt

Desktop (please complete the following information):

Additional context My goal is to train and then align the corpus. The error comes from a query executed at line 1165 of the montreal_forced_aligner/corpus/base.py script.

mfaytak commented 1 year ago

I am getting the same error message on a fresh conda install on macOS (Ventura 13.4).

Debugging checklist

[x] Have you updated to latest MFA version?
[x] Have you tried rerunning the command with the --clean flag?

Issue
Running mfa validate [my corpus] [my dictionary] fails on a SQLAlchemy query with the following error:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "~": syntax error
[SQL: UPDATE utterance SET in_subset=? WHERE utterance.id IN (SELECT anon_1.id 
FROM (SELECT utterance.id AS id 
FROM utterance 
WHERE (utterance.text ~ ?) AND utterance.ignored = 0 ORDER BY utterance.duration
 LIMIT ? OFFSET ?) AS anon_1 ORDER BY random()
 LIMIT ? OFFSET ?) RETURNING id]
[parameters: (1, '\\s\\S+\\s', 20000, 0, 2000, 0)]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Full command line output and traceback leading to this: full-traceback.txt

The failure seems to occur when mfa validate tries to create the subset directory of 2000 utterances.
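Reading the failing SQL together with its parameters, the statement appears to pick the subset in three steps: keep non-ignored utterances whose text matches the pattern, take up to 20000 of them ordered by duration, then randomly mark up to 2000 of those as in the subset. The sketch below restates that logic in plain Python. It is only an interpretation of the logged query, not MFA's code; the function name select_subset, the dictionary keys, and the seed argument are hypothetical.

```python
import random
import re


def select_subset(utterances, pattern=r"\s\S+\s", pool_size=20000,
                  subset_size=2000, seed=None):
    """Return the ids that the logged query would flag as in_subset.

    `utterances` is a list of dicts with hypothetical keys
    "id", "text", "duration" and "ignored".
    """
    rng = random.Random(seed)
    regex = re.compile(pattern)

    # WHERE (utterance.text ~ ?) AND utterance.ignored = 0
    candidates = [
        u for u in utterances
        if not u["ignored"] and regex.search(u["text"])
    ]

    # ORDER BY utterance.duration LIMIT 20000 OFFSET 0
    candidates.sort(key=lambda u: u["duration"])
    pool = candidates[:pool_size]

    # ORDER BY random() LIMIT 2000 OFFSET 0
    rng.shuffle(pool)
    return {u["id"] for u in pool[:subset_size]}


# Invented sample data: even ids have multi-word text, odd ids a single word.
sample = [
    {"id": i, "text": "a b c" if i % 2 == 0 else "single",
     "duration": float(i), "ignored": False}
    for i in range(10)
]
chosen = select_subset(sample, pool_size=3, subset_size=2, seed=0)
print(sorted(chosen))
```

With pool_size=3 the pool is limited to the three shortest matching utterances, so the two chosen ids always come from that pool.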

Log file

pg_log_global.txt (contains a few runs including the --clean run)

mfaytak commented 1 year ago

Thanks! I am a bit clueless on how this bug fix will be distributed, and need to use the aligner right away - will a conda update get this onto my device?

mmcauliffe commented 1 year ago

It should trickle through the conda pipeline this evening, probably within 1-2 hours, and then you can run conda update montreal-forced-aligner to pick it up.

mmcauliffe commented 1 year ago

You can also rerun with mfa validate ... --use_postgres --auto_server, which runs the aligner on the PostgreSQL backend instead of SQLite. The PostgreSQL backend isn't affected by this bug.

hanspaa2017108 commented 2 weeks ago

You can also rerun with the mfa validate ... --use_postgres --auto_server and that will run the aligner using the PostgreSQL backend instead of sqlite that won't be affected by this bug.

Can this be used with the mfa align command as well? @mmcauliffe