AdolfVonKleist / Phonetisaurus

Phonetisaurus G2P
BSD 3-Clause "New" or "Revised" License

Print warning when the alignment lattice won't get built #22

Status: Closed (jtrmal closed this issue 7 years ago)

jtrmal commented 7 years ago

For syllabic languages (and perhaps other languages with more complex pronunciation rules), the default settings seq1_max=2 and seq2_max=2 are sometimes not sufficient, and in that case the alignment lattice won't even get constructed. However, phonetisaurus is completely silent about this. I think it would be good to issue a warning so that the user knows something is going on.
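A rough way to see why a fixed multigram limit can leave no lattice at all is the feasibility check below. It is a minimal sketch under stated assumptions, not Phonetisaurus code: can_align is a hypothetical helper, and it assumes deletions are disabled on both sides, which is simpler than what the real aligner allows. Under that assumption, a pair can only be aligned if the graphemes and the phonemes can be cut into the same number of chunks, with grapheme chunks of at most seq1_max symbols and phoneme chunks of at most seq2_max symbols.

```shell
#!/usr/bin/env bash
# can_align M N S1 S2
# Hypothetical helper (not part of Phonetisaurus). Returns 0 if M graphemes and
# N phonemes can both be split into the same number K of chunks, where grapheme
# chunks hold 1..S1 symbols and phoneme chunks hold 1..S2 symbols.
# ASSUMPTION: no deletions on either side (every chunk is non-empty).
can_align() {
  local m=$1 n=$2 s1=$3 s2=$4
  # Fewest chunks each side needs: ceil(length / max chunk size).
  local k1=$(( (m + s1 - 1) / s1 ))
  local k2=$(( (n + s2 - 1) / s2 ))
  local kmin=$(( k1 > k2 ? k1 : k2 ))
  # Most chunks each side allows: one symbol per chunk.
  local kmax=$(( m < n ? m : n ))
  # Feasible iff some K lies in [kmin, kmax].
  (( kmin <= kmax ))
}

# 4 graphemes / 4 phonemes: fine with the 2/2 defaults.
can_align 4 4 2 2 && echo "4:4 ok"
# Hypothetical 1-symbol word pronounced with 3 phonemes: no alignment at 2/2.
can_align 1 3 2 2 || echo "1:3 fails at seq2_max=2"
# Raising the phoneme-side limit makes it feasible.
can_align 1 3 2 3 && echo "1:3 ok at seq2_max=3"
```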

AdolfVonKleist commented 7 years ago

I've merged a new commit into master that includes this fix. The phonetisaurus-align tool now prints a warning message to stderr, such as Alignment failed: W A G A, for each word-pronunciation pair that cannot be aligned under the default or user-defined multigram size constraints. The phonetisaurus_train script also prints this warning, but only when called with the --verbose flag. I also added a boolean option, --grow, to the aligner and the script. It defaults to false; when set to true, the aligner iteratively increases the max alignment arguments until each pair can be aligned.
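The warn-or-grow behaviour described here can be sketched as a small loop. This is a hypothetical bash illustration, not the Phonetisaurus implementation: align_pair and can_align are made-up helpers, deletions are assumed to be disabled on both sides, and growing both limits by one per step is an assumption (the comment only says the max alignment arguments are increased iteratively). The warning text mirrors the quoted format; which sequence the real tool prints is not stated, so printing the word is also an assumption.

```shell
#!/usr/bin/env bash
# can_align M N S1 S2 (hypothetical): can M graphemes and N phonemes be cut into
# the same number of non-empty chunks of at most S1 / S2 symbols? No deletions.
can_align() {
  local m=$1 n=$2 s1=$3 s2=$4
  local k1=$(( (m + s1 - 1) / s1 )) k2=$(( (n + s2 - 1) / s2 ))
  (( (k1 > k2 ? k1 : k2) <= (m < n ? m : n) ))
}

# align_pair WORD PRON S1 S2 GROW (hypothetical)
# WORD and PRON are space-separated symbol strings. On success, prints the
# limits "S1 S2" that admit an alignment; otherwise warns on stderr.
align_pair() {
  local word=$1 pron=$2 s1=$3 s2=$4 grow=$5
  local -a g p
  read -r -a g <<< "$word"
  read -r -a p <<< "$pron"
  local m=${#g[@]} n=${#p[@]}
  # Empty sequences can never be aligned without deletions.
  (( m > 0 && n > 0 )) || { echo "Alignment failed: $word" >&2; return 1; }
  until can_align "$m" "$n" "$s1" "$s2"; do
    if [[ $grow != true ]]; then
      echo "Alignment failed: $word" >&2
      return 1
    fi
    # Grow both limits and retry. This terminates: once s1 >= m and s2 >= n,
    # the single-chunk alignment is always feasible.
    s1=$((s1 + 1)); s2=$((s2 + 1))
  done
  echo "$s1 $s2"
}

# Example (results follow from the logic above):
#   align_pair "A" "a b c" 2 2 false   -> warning on stderr, exit status 1
#   align_pair "A" "a b c" 2 2 true    -> prints "3 3"
```

Growing both limits together keeps the sketch short; a finer-grained version could grow only the side whose length exceeds its limit.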

A test directory has been added that contains a small regression test for the g014b2b set. There is no noticeable difference in 1-best results whether or not the --grow flag is used, but it may be useful in some cases with enough training data and/or when using n-best results.

I haven't updated the kaldi tag yet, so this will not be pulled in automatically. I will update it if this resolves your issue.

The easiest way to see the change is probably to just pull, compile, and run:

$ ./src/bin/phonetisaurus-align \
    --input=test/g014b2b/g014b2b.train \
    --ofile=train.corpus --seq1_del=false
GitRevision: kaldi-20-g7b019a-dirty
Loading input file: test/g014b2b/g014b2b.train
Alignment failed: A A A
Alignment failed: A O L

AdolfVonKleist commented 7 years ago

I'm assuming this is OK now, @jtrmal, and resolved in the referenced commit. The new build updates in the latest PR from @giuliopaci should also resolve the remaining compilation issues related to rpath.

jtrmal commented 7 years ago

Cool, thanks for the change. Apologies for not being able to respond sooner; we were close to an evaluation deadline. I'll probably play with it early next week. y.

On Thu, Aug 31, 2017 at 5:23 AM, Josef Novak notifications@github.com wrote:

Closed #22 https://github.com/AdolfVonKleist/Phonetisaurus/issues/22.
