Open aarppe opened 9 months ago
Note: Special morpheme boundary characters may also need to be removed from the Normative Generator FST.
After discussion with @aarppe, it was established that the expected behaviour for generator FSTs should be to include special morpheme boundary characters, and it is the job of the app to discard them when irrelevant. As shown in the instructions in this thread, it is ok for analyser FSTs to drop them.
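As a rough illustration of the app-side step described above, the sketch below strips boundary symbols from generator output with `tr`. The symbol set `<>/` and the sample string are hypothetical placeholders, not taken from the actual FSTs; substitute whatever characters the generator really emits.

```sh
# Hypothetical set of morpheme boundary symbols (an assumption for this sketch);
# replace with the characters the normative generator actually emits.
BOUNDARY_CHARS='<>/'

# Read generator output on stdin and delete every boundary symbol from it.
strip_boundaries() {
    tr -d "$BOUNDARY_CHARS"
}

# Placeholder input, not real generator output.
printf '%s\n' 'prefix<stem>suffix' | strip_boundaries
```

An app that does need the boundaries (for example, to display morpheme breaks) would simply skip this filter and use the generator output unchanged.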
The following are explicit instructions on creating a descriptive analyzer and normative generator (with morpheme boundaries) from updated LEXC source (undertaken in #108):
If one has compiled the aggregate LEXC file, lexicon.lexc (used to be lexicon.tmp.lexc), with the regular GiellaLT compilation scheme, one can use that file as the primary source. Otherwise, one can compile the aggregate file as follows:
cat src/fst/root.lexc src/fst/stems/noun_stems.lexc src/fst/morphology/stems/verb_stems.lexc src/fst/morphology/stems/particles.lexc src/fst/morphology/stems/pronouns.lexc src/fst/morphology/stems/numerals.lexc src/fst/morphology/affixes/noun_affixes.lexc src/fst/morphology/affixes/verb_affixes.lexc > lexicon.lexc
Normally, the necessary FSTs would be created according to the standard GiellaLT compilation configuration, with the option --enable-dicts.