Closed kylebgorman closed 3 years ago
To clarify, `[` and `]` are special characters in Thrax and Pynini strings. If you want them to be interpreted literally as `[` and `]`, you have to put a backslash escape before them (both in rules and in strings in general). `pynini.escape` automatically does this for us.
FYI, don't test your g2p grammar on the Aeneid text: test it on the output of the normalization grammar, and I think that will make the problem go away. At a later date we can combine the different grammars into one.
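On the bracket escaping mentioned above, here is a minimal sketch of what the escaping amounts to. The helper `escape_brackets` is a hypothetical pure-Python stand-in written for illustration; it is not Pynini's implementation, and real code should simply call `pynini.escape`. It assumes the characters that need escaping are `[`, `]`, and the backslash itself.

```python
def escape_brackets(text: str) -> str:
    """Backslash-escape characters that are special in Pynini/Thrax strings.

    Hypothetical stand-in for illustration only; in real code, call
    pynini.escape instead of reimplementing it.
    """
    special = {"[", "]", "\\"}
    return "".join("\\" + ch if ch in special else ch for ch in text)


# A line containing literal square brackets, e.g. an editorial insertion.
line = "arma [virumque] cano"
escaped = escape_brackets(line)
print(escaped)
```

The escaped string is what would be safe to compile into an acceptor; the unescaped one would have `[virumque]` read as a reserved symbol rather than as literal text.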
So would I use the rewriter tool to produce the output of the normalization grammar, paste that into a separate txt file, then test the g2p grammar with the rewriter tool on that txt file?
I set up the rewriter so that it works well with UNIX-style pipes, so you don't even have to create those intermediate files. This might look something like (not tested):
cat Aeneid01.txt | ./rewriter.py --far normalize.far --rules NORMALIZE | ./rewriter.py --far pronounce.far --rules PRONOUNCE
This is just a temporary hack though: later we can either put all the grammar rules into a single FAR (just by importing the rules we want and then re-exporting them) to be used as part of a cascade (`./rewriter.py --far everything.far --rules NORMALIZE PRONOUNCE ...`), or combine them into a single rule with composition (`./rewriter.py --far everything.far --rules EVERYTHING ...`). But Thrax gives us the modularity to make these decisions later. I'm trying to resist the urge to over-design...
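To make the cascade-versus-composition distinction concrete, here is a rough sketch in which plain Python functions stand in for the FST rules. The two rule bodies are placeholders invented for this example (they are not the project's actual NORMALIZE or PRONOUNCE rules), and `cascade` and `compose` are hypothetical helper names. With real grammars the composition would be done on the transducers themselves rather than on functions.

```python
from functools import reduce
from typing import Callable, Sequence

Rule = Callable[[str], str]


def normalize(text: str) -> str:
    # Placeholder rule: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def pronounce(text: str) -> str:
    # Placeholder rule: rewrite "qu" as "kw". Not a real g2p mapping.
    return text.replace("qu", "kw")


def cascade(rules: Sequence[Rule], text: str) -> str:
    """Apply each rule in order, feeding each output into the next rule."""
    for rule in rules:
        text = rule(text)
    return text


def compose(rules: Sequence[Rule]) -> Rule:
    """Build one combined rule up front, analogous to a single EVERYTHING rule."""
    return reduce(lambda f, g: lambda s: g(f(s)), rules)


sample = "Arma  virumQUE cano"
cascaded = cascade([normalize, pronounce], sample)
everything = compose([normalize, pronounce])
composed = everything(sample)
print(cascaded)
print(composed)
```

Both routes give the same result here; the difference is only whether the rules are combined at run time (cascade) or ahead of time (composition), which is why the choice can be deferred.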
Thinking ahead a bit, we may need slightly different "flavors" of the various rules for different data sources. For instance, while Pharr uses j and v, maybe we want to make a webapp that can handle text where even glides are written with i and u. Or maybe we want to support text without macrons someday.