GoogleCodeExporter opened this issue 9 years ago (status: Open)
Here's a simpler, although ultimately equivalent, way of doing it.
First, we define a transducer that optionally uppercases the first letter of a word:
define UpCase
[a:A|b:B|c:C|d:D|e:E|f:F|g:G|h:H|i:I|j:J|k:K|l:L|m:M|n:N|o:O|p:P|q:Q|r:R|s:S|t:T
|u:U|v:V|w:W|x:X|y:Y|z:Z] ?* | ?*;
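To try UpCase on its own before composing it into the grammar, a session along these lines should work in the interactive foma shell once the define above has been entered. This is a sketch; the test words are arbitrary. Because concatenation binds tighter than union, the definition reads as [[a:A|...] ?*] | ?*: either an initial lowercase letter is uppercased and the rest copied, or the whole word is copied unchanged.

```foma
# Compile the defined UpCase transducer onto the stack
regex UpCase;
# Generation direction: should list both "cat" and "Cat"
down cat
# "Cat" does not start with a lowercase letter, so only the
# identity branch ?* applies: should return just "Cat"
down Cat
# Parsing direction: should give "cat" (first-letter branch)
# and "Cat" (identity branch)
up Cat
```

In the full grammar the extra "Cat" reading from the identity branch is filtered out by composition, as long as the lower side of Cleanup contains only lowercase words.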
Then we compose UpCase last, after Cleanup:
define Grammar Lexicon .o.
ConsonantDoubling .o.
EDeletion .o.
EInsertion .o.
YReplacement .o.
KInsertion .o.
Cleanup .o.
UpCase ;
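Assuming Lexicon and the rule transducers are defined as in the grammar above, and that Grammar without UpCase maps cat+N+Sg to cat, the composed Grammar can be checked in both directions roughly like this (a sketch; actual outputs depend on the rest of the grammar):

```foma
# Compile the composed grammar onto the stack
regex Grammar;
# Parsing: the capitalized form should now be analyzed as cat+N+Sg
up Cat
# Parsing: the lowercase form still works through the identity branch of UpCase
up cat
# Generation: both "cat" and "Cat" come out, because UpCase is part of the relation
down cat+N+Sg
```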
As for the second question: there is no way to build a single transducer that
maps cat+N+Sg only to cat while its inverse also maps Cat to cat+N+Sg.
Both directions use the same device, so to speak: a transducer encodes one set
of pairs, and applying it up or down just reads those pairs from opposite sides.
If Cat parses as cat+N+Sg, then generating from cat+N+Sg will also produce Cat.
The only way to get cat as the sole output for cat+N+Sg is to define two
separate transducers: one for generation (without the uppercasing composed in)
and another for parsing (with uppercasing). In fact, it is fairly common to
maintain two such transducers for various reasons.
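A minimal sketch of the two-transducer setup, reusing the rule names from the Grammar definition above. The names Generator and Analyzer are hypothetical, chosen here for illustration. Because Analyzer is built from Generator, the lexicon and rules are written only once:

```foma
# Generation: no uppercasing, so cat+N+Sg should yield only "cat"
define Generator Lexicon .o.
                 ConsonantDoubling .o.
                 EDeletion .o.
                 EInsertion .o.
                 YReplacement .o.
                 KInsertion .o.
                 Cleanup ;

# Parsing: the same grammar with UpCase composed in last,
# so both "cat" and "Cat" should be analyzed as cat+N+Sg
define Analyzer Generator .o. UpCase ;

# Compile whichever transducer is needed onto the stack
regex Analyzer;
```

The second transducer costs one extra line of source, so keeping separate generation and parsing transducers does not require duplicating the grammar.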
Original comment by mans.hul...@gmail.com
on 29 Jul 2012 at 3:38
Thanks for the answer and help.
The UpCase transducer is a big deal for me, because my foma code is now 8200
lines long, and whether it is 8200 or 16400 lines matters for:
* code maintenance
* adding new code
* compiling
As for the second question, thanks also for the idea of separating generation
and parsing. I am not far enough along yet to decide whether I will do that.
Original comment by eleonor...@gmx.net
on 30 Jul 2012 at 8:21
Original issue reported on code.google.com by
eleonor...@gmx.net
on 29 Jul 2012 at 12:28