Open hectoralos opened 4 years ago
I can think of two ways to do this, and both are hard:
Run chv-morph and chv-segment and match things programmatically. This will be difficult in one-to-many mappings of tags to morphemes, though (e.g., <p3><sg>). Perhaps it could be done with a list of morphemes that have multiple tags (and tags that have multiple morphemes?).
In lexc:
ӗ<px3sp>:%{ӗ%} PLURAL ;
But we'd need to have the phonology make the ӗ on the left into и, given your example, which would require an extra twol transducer intersected with the analysis side of the transducer, and it would have to do weird things like look through the <n> tag.
I note that you don't entirely apply phonology in the example, though (you have ача, not ач), so the questions become:
I copy-paste Artem Fedorinqyk's answer (Artem is the person who asked for this enhancement):
Maybe this would be easier?
^ачисен/ача<n>и<px3sp>сен<pl><gen>$
It leaves the one-to-many mapping unresolved, but at least at the formal level we can certainly get the root "ача" and the affixes "и" and "сен" and give each of them some meaning.
Here is another example. If possible, Artem would like something similar to this output:
^юратӑвӗ/юрату<n>ӗ<px3sp>$
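For what it's worth, the "match things programmatically" route can be sketched for this simpler target format. This is only a rough sketch under loud assumptions: the input strings (an analysis like ача<n><px3sp><pl><gen> and a segmentation like ач>и>сен with > as the boundary) are guesses at what the two modes produce, merge_analysis is a hypothetical helper, and the pairing rule (one tag per unit, leftover tags on the last affix) is a naive heuristic rather than a real solution to the one-to-many problem.

```python
import re

TAG_RE = re.compile(r"<([^<>]+)>")


def merge_analysis(surface, analysis, segmentation, boundary=">"):
    """Interleave tags from a morph analysis with affixes from a segmentation.

    Assumed inputs (not confirmed formats of chv-morph / chv-segment):
      analysis     e.g. "ача<n><px3sp><pl><gen>"
      segmentation e.g. "ач>и>сен"
    Naive heuristic: the lemma takes the first tag, each affix takes the
    next tag, and any leftover tags are attached to the last unit.
    """
    lemma = analysis.split("<", 1)[0]
    tags = TAG_RE.findall(analysis)
    # Drop the stem segment: the lemma from the analysis is used instead.
    affixes = segmentation.split(boundary)[1:]
    units = [lemma] + affixes

    parts = []
    for i, unit in enumerate(units):
        is_last = i == len(units) - 1
        unit_tags = tags[i:] if is_last else tags[i:i + 1]
        parts.append(unit + "".join(f"<{t}>" for t in unit_tags))
    return f"^{surface}/{''.join(parts)}$"


if __name__ == "__main__":
    print(merge_analysis("ачисен", "ача<n><px3sp><pl><gen>", "ач>и>сен"))
    # ^ачисен/ача<n>и<px3sp>сен<pl><gen>$
    print(merge_analysis("юратӑвӗ", "юрату<n><px3sp>", "юратӑв>ӗ"))
    # ^юратӑвӗ/юрату<n>ӗ<px3sp>$
```

The heuristic breaks as soon as a non-final morpheme carries more than one tag, which is where a hand-maintained list of multi-tag morphemes would have to come in.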
A new Chuvash grammar textbook is being prepared on the basis of a 3M+ word corpus and our morphological analysis. The author is asking for a composite output of the chv-morph and chv-segment modes in which he could more easily search for specific surface forms of morphemes.
For instance, we currently have these two analyses for ачисен:
He is asking for something like this:
^ачисен/ача<n>и<px3sp>се<pl>н<gen>$
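To illustrate the kind of search this would make possible, here is a minimal sketch that parses the requested composite format and counts the surface forms that carry a given tag. The function names parse_composite and surface_forms are hypothetical, and the parser assumes every morph is immediately followed by one or more <tag> groups, as in the example above.

```python
import re
from collections import Counter

# One unit = a run of non-tag characters followed by one or more <tag> groups.
UNIT_RE = re.compile(r"([^<>]+)((?:<[^<>]+>)+)")
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_composite(lexical_unit):
    """Split '^surface/morph<tag>morph<tag>...$' into (surface, [(morph, [tags])])."""
    surface, analysis = lexical_unit.strip("^$").split("/", 1)
    units = [
        (m.group(1), TAG_RE.findall(m.group(2)))
        for m in UNIT_RE.finditer(analysis)
    ]
    return surface, units


def surface_forms(lexical_units, tag):
    """Count the morphs that carry the given tag across lexical units."""
    counts = Counter()
    for lu in lexical_units:
        _, units = parse_composite(lu)
        for morph, tags in units:
            if tag in tags:
                counts[morph] += 1
    return counts


if __name__ == "__main__":
    corpus = [
        "^ачисен/ача<n>и<px3sp>се<pl>н<gen>$",
        "^юратӑвӗ/юрату<n>ӗ<px3sp>$",
    ]
    # Lists the surface forms found for <px3sp>: и and ӗ, once each.
    print(surface_forms(corpus, "px3sp"))
```

With output in this shape, finding all surface variants of a morpheme reduces to a tag lookup instead of cross-referencing two separate analyses.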
This request seems reasonable and could probably be useful for other people and languages.
Could this be done more or less easily?