Open AdolfVonKleist opened 7 years ago
@AdolfVonKleist thanks a lot for such a wonderful library :) I have the following questions; please share your thoughts.
[Is the usage the same as the HMM defined in Ref 1, where the emission probabilities come from the alignment module and the transition probabilities come from an LM trained on the phonetic sequences passed in training as <Grapheme /t Phoneme> word pairs? It looks like you are also trying to improve the alignment module itself using KN smoothing.]
Note: I got an overview of this work from these references:
Thanks a lot !!
You should be able to use kenlm to perform the ARPA training directly. Just use the command line utilities instead of the python wrappers.
# Align the dictionary:
$ phonetisaurus-align --input=cmudict.formatted.dict \
    --ofile=cmudict.formatted.corpus --seq1_del=false
# Train an n-gram model (5s-10s):
$ estimate-ngram -o 8 -t cmudict.formatted.corpus \
-wl cmudict.o8.arpa
# Convert to OpenFst format (10s-20s):
$ phonetisaurus-arpa2wfst --lm=cmudict.o8.arpa --ofile=cmudict.o8.fst
Just replace the estimate-ngram call with an equivalent kenlm command. You'll need to output it in ARPA text format, though, so that you can still transform it into a WFST for inference.
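For instance, assuming kenlm is built and its lmplz binary is on the PATH, the estimate-ngram step could be swapped out roughly like this (a sketch, not verified against this repo):

```shell
# Hypothetical kenlm replacement for the estimate-ngram call above:
# lmplz reads the aligned joint-token corpus and writes ARPA text directly.
lmplz -o 8 --text cmudict.formatted.corpus --arpa cmudict.o8.arpa

# The ARPA output then converts to a WFST exactly as before:
phonetisaurus-arpa2wfst --lm=cmudict.o8.arpa --ofile=cmudict.o8.fst
```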
The mitlm call is trained on the output of the alignment - it just treats the aligned and segmented joint token sequences as a 'normal' text corpus.
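To illustrate, the aligned corpus looks something like the following (an invented sketch, in the default joint-token style where '}' separates the grapheme and phoneme sides and '|' joins multi-character clusters):

```shell
# Print a couple of lines in the style of the aligned corpus: each line is
# one dictionary entry rewritten as a sequence of grapheme}phoneme tokens,
# which the n-gram trainer then treats as an ordinary sentence.
cat <<'EOF'
t}T e}EH s}S t}T
a}AH b}B o|u}AW t}T
EOF
```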
@AdolfVonKleist Thanks for the quick response :)
Does the RNNLM work in the current master code? Is it used in the same way as mitlm, and is there any benefit to the RNNLM over mitlm?
And just to clarify, is this what you are saying in the last line? [Steps from the paper referenced above]
[image] https://user-images.githubusercontent.com/45142420/78534235-b5e4db00-7807-11ea-9463-60374d6df83e.png
Thanks :)
Hi,
Yes, it should work; however, the rnnlm code has not been updated since that earliest release and is effectively the same as the original Mikolov code from that time. The only novel contribution there is the joint-token implementation of the decoder.
I did not find it to yield any significant improvement over mitlm as a pure alternative, and both training and decoding were significantly slower. The only place it yielded a modest boost was when used in ensemble with mitlm, as described in the paper [but again there is a time penalty]. Whether or not that was/is sufficient reason to use the combined system in a real-world or production setting, as opposed to just the normal joint n-gram models, would probably depend on how heavily you prioritize speed versus absolute accuracy.
Best, Joe
Thanks a lot for the detailed perspective :)
The topic of LM training came up again recently.
The aligner produces weighted alignment lattices. There is some evidence that augmenting the Maximization step in the EM alignment process with the sort of expected-count KN smoothing described in this paper may/should improve the overall quality of the G2P aligner:
The same approach may be used to directly train the target joint n-gram model from the resulting alignment lattices. I previously tried the latter using the WB fractional-counts implementation in the OpenGrm NGram Library, but it seemed to have little impact. The Zhang paper notes a similar outcome, and that EC-KN appears to be much more effective, even compared to the fractional KN implementation employed in Sequitur. If I'm going to include some form of LM training after all, maybe this represents the most appropriate choice. There is also a reference implementation as a GIZA add-on:
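As a rough sketch of the fractional-counts route, assuming the aligner's weighted alignment lattices have been dumped to an OpenFst archive (the lattices.far name and the dump step are hypothetical), the OpenGrm NGram tools can accumulate expected n-gram counts directly from the weighted lattices and smooth them:

```shell
# Hypothetical pipeline: ngramcount accumulates expected (fractional) counts
# over all weighted paths in each lattice, not just a single 1-best string.
ngramcount --order=8 lattices.far lattices.cnts
# Witten-Bell smoothing handles fractional counts out of the box:
ngrammake --method=witten_bell lattices.cnts cmudict.o8.mod
```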