AdolfVonKleist / Phonetisaurus

Phonetisaurus G2P
BSD 3-Clause "New" or "Revised" License

Alignment Failed error with Given Example #36

Open prashantserai opened 6 years ago

prashantserai commented 6 years ago

Hi!

After some efforts (tweaking Makefiles for Phonetisaurus and MITLM, amongst others), I managed to install Phonetisaurus, but it gives me the following error when trying to run the example from the README.

$ phonetisaurus-train --lexicon cmudict.formatted.dict --seq2_del
INFO:phonetisaurus-train:2018-05-30 17:28:00: Checking command configuration...
INFO:phonetisaurus-train:2018-05-30 17:28:00: Checking lexicon for reserved characters: '}', '|', '_'...
INFO:phonetisaurus-train:2018-05-30 17:28:00: Aligning lexicon...
ERROR:phonetisaurus-train:2018-05-30 17:28:00: Alignment failed. Exiting.

Any ideas what I could do?

AdolfVonKleist commented 6 years ago

What platform was this, and what version of OpenFst? Could you also post the first few lines of the input lexicon?

Also, the Python script you ran is just a wrapper around the C++ binaries. What happens if you run the aligner on its own?

phonetisaurus-align --input=cmudict.formatted.dict \
  --ofile=cmudict.formatted.corpus --seq1_del=false
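
Since the training script only wraps the binaries, any single step can be reproduced in isolation and its exit status and stderr inspected directly. The following is a minimal sketch of that idea, not code from Phonetisaurus: `run_step` is a hypothetical helper, and the demo uses a stand-in child process (the current Python interpreter) so that it runs without the aligner installed.

```python
import subprocess
import sys


def run_step(command):
    """Run one command and return (exit_code, stderr_text)."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text, not bytes
    )
    _, err = proc.communicate()
    return proc.returncode, err


# Stand-in for a failing binary: writes to stderr and exits non-zero.
failing_child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('FATAL: demo'); sys.exit(1)",
]
code, err = run_step(failing_child)
if code != 0:
    print("step failed with exit code %d: %s" % (code, err))
```

To try it against the real aligner, the command list would be the one quoted above, for example `["phonetisaurus-align", "--input=cmudict.formatted.dict", "--ofile=cmudict.formatted.corpus", "--seq1_del=false"]`.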

I just downloaded everything and recompiled it from scratch on my MacBook (OSX 10.12.6, OpenFst 1.6.2) and it seemed to go OK.

prashantserai commented 6 years ago

The OpenFst version is 1.6.3 and the OS is RHEL Server release 6.9 (Santiago).

The command you tried does work for me too. Took 12 iterations or so.

What fails is phonetisaurus-train (with seq2_del or seq1_del):

phonetisaurus-train --lexicon cmudict.formatted.dict --seq2_del

The first few lines of my cmudict.formatted.dict are:

'bout B AW1 T
'cause K AH0 Z
'course K AO1 R S
'cuse K Y UW1 Z
'em AH0 M
'frisco F R IH1 S K OW0
'gain G EH1 N
'kay K EY1
'm AH0 M
'n AH0 N
'round R AW1 N D
's EH1 S
'til T IH1 L
'tis T IH1 Z
'twas T W AH1 Z
a AH0
a EY1
a's EY1 Z
a. EY1
a.'s EY1 Z
a.d. EY2 D IY1
a.m. EY2 EH1 M
a.s EY1 Z
aaa T R IH2 P AH0 L EY1
aaberg AA1 B ER0 G

prashantserai commented 6 years ago

@AdolfVonKleist are you sure you meant to close this?

wael34218 commented 6 years ago

@prashantserai I had the same problem when installing OpenFST 1.6.8. After I uninstalled everything and reinstalled using OpenFST 1.6.2, it worked fine.

AdolfVonKleist commented 6 years ago

@prashantserai sorry, I did not get the notification with your setup details. @wael34218 were you also running RHEL 6.9? The only OpenFst and OS combinations currently running on TravisCI are those described in the config file:

I'll try to find some time this week to upgrade OpenFst to 1.6.8, but that will still only cover the existing OSX and Ubuntu 14.04 platform builds. @prashantserai if you can contribute a RHEL configuration add-on for the TravisCI YAML, that would certainly be welcome.

wael34218 commented 6 years ago

@AdolfVonKleist I am using Ubuntu 16.04

prashantserai commented 6 years ago

FYI I did try changing from OpenFST 1.6.3 to 1.6.2 too, but the problem from the original post persisted for me.

I did a verbose log:

[serai@zirconium example]$ phonetisaurus-train --lexicon cmudict.formatted.dict --seq2_del --verbose
INFO:phonetisaurus-train:2018-09-07 23:12:56: Checking command configuration...
INFO:phonetisaurus-train:2018-09-07 23:12:56: Checking lexicon for reserved characters: '}', '|', '_'...
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: arpa_path: train/model.o8.arpa
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: corpus_path: train/model.corpus
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: dir_prefix: train
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: grow: False
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: lexicon_file: cmudict.formatted.dict
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: logger: <logging.Logger instance at 0xefa908>
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: makeJointNgramCommand: <bound method G2PModelTrainer._mitlm of <__main__.G2PModelTrainer instance at 0xefaa70>>
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: model_path: train/model.fst
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: model_prefix: model
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: ngram_order: 8
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: seq1_del: False
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: seq1_max: 2
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: seq2_del: True
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: seq2_max: 2
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: verbose: True
DEBUG:phonetisaurus-train:2018-09-07 23:12:57: phonetisaurus-align --input=cmudict.formatted.dict --ofile=train/model.corpus --seq1_del=false --seq2_del=true --seq1_max=2 --seq2_max=2 --grow=false
INFO:phonetisaurus-train:2018-09-07 23:12:57: Aligning lexicon...
FATAL: SetFlags: Bad option: --grow=false
ERROR:phonetisaurus-train:2018-09-07 23:12:57: Alignment failed. Exiting.

prashantserai commented 6 years ago

I was finally able to make the recipe in README.md work with the following changes after installation (the installation itself needed a couple of separate hacks for config and build, listed at the end):

In phonetisaurus-train, comment out lines 191-194 as below to work around the runtime error:

command = [
    "phonetisaurus-align",
    "--input={0}".format (self.lexicon_file),
    "--ofile={0}".format (self.corpus_path),
    "--seq1_del={0}".format (str (self.seq1_del).lower ()),
    #"--seq2_del={0}".format (str (self.seq2_del).lower ()), #line 191
    #"--seq1_max={0}".format (str (self.seq1_max)), #line 192
    #"--seq2_max={0}".format (str (self.seq2_max)), #line 193
    #"--grow={0}".format (str (self.grow).lower ()) #line 194
]
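
A less invasive variant of the workaround above might keep the extra options but pass only those the installed aligner accepts. The sketch below is an illustration, not the script's actual code: `build_align_command` and its `supported` parameter are hypothetical, and only the flag names are taken from the verbose log. That log names `--grow` alone as the rejected option, so dropping the other three flags may not be necessary, but this has not been tested against any Phonetisaurus build.

```python
def build_align_command(lexicon_file, corpus_path, seq1_del=False,
                        optional=None, supported=None):
    """Build the phonetisaurus-align argument list.

    optional:  dict of extra flag name -> value, e.g. {"grow": False}.
    supported: hypothetical collection of flag names the installed binary
               accepts; None means "pass every optional flag".
    """
    def fmt(value):
        # Mirror the original script: booleans become lowercase true/false.
        if isinstance(value, bool):
            return str(value).lower()
        return str(value)

    command = [
        "phonetisaurus-align",
        "--input={0}".format(lexicon_file),
        "--ofile={0}".format(corpus_path),
        "--seq1_del={0}".format(fmt(seq1_del)),
    ]
    # Sorted so the resulting command line is deterministic.
    for name, value in sorted((optional or {}).items()):
        if supported is None or name in supported:
            command.append("--{0}={1}".format(name, fmt(value)))
    return command


cmd = build_align_command(
    "cmudict.formatted.dict",
    "train/model.corpus",
    optional={"seq2_del": True, "seq1_max": 2, "seq2_max": 2, "grow": False},
    supported=("seq2_del", "seq1_max", "seq2_max"),  # assumed: no --grow
)
print(" ".join(cmd))
```

Which flags belong in `supported` depends on the installed binary, so the tuple above is only an assumption for the demo.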

In phonetisaurus-apply, change line 320 to work around a syntax error.

Original code (whitespace may be off, e.g. spaces, \t, \n):

tester = G2PModelTester (
    args.model,
    **{key:val for key,val in args.__dict__.iteritems ()
       if not key in ["model","word_list"] }
)

Modified to:

    tempdict={}
    for key,val in args.__dict__.iteritems():
        if not key in ["model", "word_list"]:
            tempdict[key]=val
    tester = G2PModelTester (args.model,**tempdict)

I don't know why these issues show up only on my system, since this is Python code, which I thought should be largely platform independent. Anyway, hope this helps.

PS: The hacks used to build were:

  1. Added "-lrt" to the LIBS in the Makefile for Phonetisaurus
  2. Commented out a couple of lines in the configure.ac file for MITLM as per this suggestion

AdolfVonKleist commented 6 years ago

Interesting. Thanks for the update. What Python version/environment are you running, and can you confirm the OS/build? I'm surprised that the comprehension does not work. I would definitely like to sort out all the Python issues; as you say, that code should really be platform independent, but I currently do not know how to replicate these issues on my side.

prashantserai commented 6 years ago

Hi! The info is below. I guess it's an old Python version.

[serai@zirconium ~]$ python
Python 2.6.6 (r266:84292, Aug 9 2016, 06:11:56)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
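
The Python 2.6.6 interpreter shown above would explain the syntax error in phonetisaurus-apply: dict comprehensions (`{key: val for ...}`) were added in Python 2.7, so 2.6 rejects them at parse time. A minimal sketch of an equivalent form that should parse on 2.6, 2.7 and 3.x follows; `filter_kwargs` and the sample `args` values are hypothetical and only illustrate the pattern.

```python
def filter_kwargs(namespace_dict, excluded=("model", "word_list")):
    """Return a copy of namespace_dict without the excluded keys."""
    # dict() over a generator expression avoids the dict-comprehension
    # syntax, which Python 2.6 does not support.
    return dict(
        (key, val) for key, val in namespace_dict.items()
        if key not in excluded
    )


# Hypothetical stand-in for args.__dict__ from argparse.
args = {"model": "train/model.fst", "word_list": "test.wlist", "nbest": 1}
print(filter_kwargs(args))
```

With this shape, the call site could stay a single expression, e.g. `G2PModelTester(args.model, **filter_kwargs(args.__dict__))`, instead of building a temporary dict in a loop.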

solonj commented 5 years ago

To offer another data point, I also had the same error. For me, simply commenting out the 4 lines in the phonetisaurus-train "makeAlignerCommand" function, per @prashantserai above, was all I had to do to move forward. Right now I am on Amazon Linux 2 with OpenFST 1.7.2 and this Python:

Python 2.7.14 (default, Jul 26 2018, 19:59:38)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2