Closed albbas closed 8 years ago
Date: 2011-09-15 12:28:19 +0200
From: Trond Trosterud <
To repeat: test these words. The err-marked ones should be rejected. In MS Word they are, but in HFST OOo they are all accepted. Behind the scenes, the CmpN/SgN, CmpN/SgG and CmpN/PlG tags have been added to nuorra but not to nargu. HFST does not stop the erroneous compounds.
nargu
nargoniibi
err: narggoniibi
err: nargguniibi
nuorraniibi
nuoraniibi
nuoraidniibi
err: nuoratniibi
Date: 2011-09-15 12:29:17 +0200
From: Trond Trosterud <
This is the new hfst speller, does it need a new component?
Date: 2011-09-15 13:32:05 +0200
From: Sjur Nørstebø Moshagen <
New component added, and priority etc changed.
This bug is somewhat related to the fact that xfst filters do not yet work when compiling normative HFST transducers, partly due to compounding tags not being used in this way till now - this is new territory.
Date: 2011-09-28 17:04:28 +0200
From: Sjur Nørstebø Moshagen <
This is a problematic bug, and won't be resolved in the short term. It boils down to the following:
Until now we have used these tags only to compute the correct form used for the PLX lexicon conversion. For that purpose the tags are perfect.
With HFST they should instead be used to control compounding directly. It is possible to write a regex filter that will do that, but such a filter will force all legal combinations to be spelled out — that is the only way for the transducer to remove the illegal combinations. Since compounding is a recursive/circular phenomenon, and in principle open-ended, this creates an infinite loop which will cause the transducer to explode in size. Even if we add an arbitrary limit to the circularity (say, stop after 5 iterations), the size will still explode way beyond what we can handle.
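To give a rough feel for the growth described above, here is a minimal Python sketch that counts how many distinct stem sequences there are when compounds are capped at a fixed number of parts. It counts combinations only and is not a measurement of transducer size; the function name and the lexicon size of 10,000 stems are assumptions made purely for illustration.

```python
def count_stem_sequences(n_stems: int, max_parts: int) -> int:
    """Count sequences of 1..max_parts stems drawn from n_stems stems.

    This is only a proxy for what a filter would have to spell out;
    real transducers share structure, so their size is not this number.
    """
    return sum(n_stems ** k for k in range(1, max_parts + 1))


if __name__ == "__main__":
    # Hypothetical lexicon size, chosen only for illustration.
    for parts in range(1, 6):
        print(parts, count_stem_sequences(10_000, parts))
```

Even with the cap of 5 parts mentioned above, the number of sequences grows by a factor of roughly the lexicon size with every extra part.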
So — the only reasonable way to handle this is by using flag diacritics. But these tags are not flags, and can't be turned into flags either (that would break the PLX conversion).
What is needed is a flag diacritic system parallel to the existing tag system, implementing the same semantics that way. It will NOT be pretty, and we need to devote some time to it to get it right. But it is the only practical way to solve this that I can see.
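As a sketch of what such a parallel system could look like, the Python below simulates the run-time check for the standard P (set), R (require), D (disallow) and C (clear) flag diacritics along a single path. The encoding is hypothetical and not what was or will be implemented: feature names such as `CmpSgG`, the `LICENSES` table and the helper `first_part_path` are invented for illustration, and the mapping of nuorra-, nuora-, nuoraid- and nuorat- to SgN, SgG, PlG and PlN is my assumed analysis of the test words.

```python
import re

# Matches @P.FEATURE.VALUE@, @R.FEATURE.VALUE@, @D.FEATURE.VALUE@,
# and the value-less forms such as @C.FEATURE@ or @R.FEATURE@.
FLAG = re.compile(r"^@([PRDC])\.([^.@]+)(?:\.([^.@]+))?@$")


def path_is_valid(symbols):
    """Return True if every flag diacritic along the path is satisfied."""
    state = {}  # feature name -> current value
    for sym in symbols:
        m = FLAG.match(sym)
        if m is None:
            continue  # ordinary symbol; flags ignore it
        op, feat, val = m.groups()
        if op == "P":  # positive set
            state[feat] = val
        elif op == "C":  # clear
            state.pop(feat, None)
        elif op == "R":  # require: feature set (to val, if given)
            if feat not in state or (val is not None and state[feat] != val):
                return False
        elif op == "D":  # disallow: feature must not be set (to val)
            if feat in state and (val is None or state[feat] == val):
                return False
    return True


# Hypothetical licence table mirroring the CmpN/... tags on nuorra.
LICENSES = {"nuorra": ["SgN", "SgG", "PlG"]}
ALL_FORMS = ["SgN", "SgG", "PlG", "PlN"]


def first_part_path(lemma, surface, form, last_part):
    """Build the symbol path for 'surface' + 'last_part' (illustrative)."""
    licences = [f"@P.Cmp{f}.ON@" for f in LICENSES.get(lemma, [])]
    require = [f"@R.Cmp{form}.ON@"]             # checked at the boundary
    clear = [f"@C.Cmp{f}@" for f in ALL_FORMS]  # reset for the next part
    return licences + [surface] + require + clear + [last_part]


if __name__ == "__main__":
    print(path_is_valid(first_part_path("nuorra", "nuora", "SgG", "niibi")))
    print(path_is_valid(first_part_path("nuorra", "nuorat", "PlN", "niibi")))
```

In this sketch the stem sets one licence flag per allowed first-part form, the compound boundary requires the licence matching the form actually used, and the licences are cleared before the next part, so the existing tags stay untouched for the PLX conversion.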
Date: 2011-09-28 18:03:29 +0200
From: Trond Trosterud <
For the future?
Date: 2014-10-21 08:36:25 +0200
From: Trond Trosterud <
New priority.
Date: 2016-03-30 12:55:00 +0200
From: Thomas Omma <
Check how well this works in the newest speller.
This issue was created automatically with bugzilla2github
Bugzilla Bug 1142
Date: 2011-09-15T12:28:19+02:00 From: Trond Trosterud <>
To: Thomas Omma <>
CC: sjur.n.moshagen, thomas.omma, tomi.k.pieski, trond.trosterud
Last updated: 2016-03-30T12:55:00+02:00