giellalt / bugzilla-dummy


New hfst speller does not adhere to sme compound tags (Bugzilla Bug 1142) #1685

Closed albbas closed 8 years ago

albbas commented 13 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 1142

Date: 2011-09-15T12:28:19+02:00
From: Trond Trosterud <>
To: Thomas Omma <>
CC: sjur.n.moshagen, thomas.omma, tomi.k.pieski, trond.trosterud

Last updated: 2016-03-30T12:55:00+02:00

albbas commented 13 years ago

Comment 5091

Date: 2011-09-15 12:28:19 +0200 From: Trond Trosterud <>

To repeat: test these words. The err-marked ones should be rejected. In MS Word they are, but in hfst ooo they are all accepted. Behind the scenes are the CmpN/SgN, CmpN/SgG and CmpN/PlG tags, which were added to nuorra but not to nargu. hfst does not stop the erroneous compounds.

nargu
nargoniibi
err: narggoniibi
err: nargguniibi
nuorraniibi
nuoraniibi
nuoraidniibi
err: nuoratniibi

albbas commented 13 years ago

Comment 5092

Date: 2011-09-15 12:29:17 +0200 From: Trond Trosterud <>

This is the new hfst speller, does it need a new component?

albbas commented 13 years ago

Comment 5093

Date: 2011-09-15 13:32:05 +0200 From: Sjur Nørstebø Moshagen <>

New component added, and priority etc changed.

This bug is somewhat related to the fact that xfst filters do not yet work when compiling normative HFST transducers, partly due to compounding tags not being used in this way till now - this is new territory.

albbas commented 13 years ago

Comment 5172

Date: 2011-09-28 17:04:28 +0200 From: Sjur Nørstebø Moshagen <>

This is a problematic bug, and won't be resolved in the short term. It boils down to the following:

Until now we have used these tags only to compute the correct form for the PLX lexicon conversion. For that purpose the tags are perfect.

With HFST they should instead be used to control compounding directly. It is possible to write a regex filter that will do that, but such a filter will force all legal combinations to be spelled out — that is the only way for the transducer to remove the illegal combinations. Since compounding is a recursive/circular phenomenon, and in principle open-ended, this creates an infinite loop which will cause the transducer to explode in size. Even if we add an arbitrary limit to the circularity (say, stop after 5 iterations), the size will still explode way beyond what we can handle.
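To give a rough feel for the growth, here is a minimal Python sketch that simply counts stem sequences when every combination up to a fixed depth is spelled out. The lexicon size of 10,000 stems is a made-up number, and a real transducer shares structure between paths, so this is only an upper-bound illustration of the trend, not a measurement of the actual sme transducer.

```python
def spelled_out_combinations(n_stems: int, max_parts: int) -> int:
    """Count all stem sequences with 1..max_parts parts.

    Each compound part can be any of n_stems stems, so there are
    n_stems ** k sequences with exactly k parts.
    """
    return sum(n_stems ** k for k in range(1, max_parts + 1))


if __name__ == "__main__":
    # Hypothetical lexicon size; the real number is not given in this report.
    for parts in range(1, 6):
        print(parts, spelled_out_combinations(10_000, parts))
```

Even with the depth capped at five parts, the count grows by a factor of the lexicon size for every extra part, which is why a hard limit does not rescue the filter approach.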

So — the only reasonable way to handle this is by using flag diacritics. But these tags are not flags, and can't be turned into flags either (that would break the PLX conversion).

What is needed is a flag diacritic system parallel to the existing tag system, implementing the same semantics that way. It will NOT be pretty, and we need to devote some time to it to get it right. But it is the only practical way to solve this that I can see.
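As a rough illustration of what such a parallel system could look like, the Python sketch below simulates the run-time check for three flag diacritic operators (`@P...@` set, `@R...@` require, `@C...@` clear) on a single path. The feature names (`AllowSgN` and so on), the form labels assigned to the nuorra forms, and the helper functions are all hypothetical and chosen only for this example; this is not the actual sme implementation.

```python
import re

# Matches @P.Feature.Value@, @R.Feature.Value@ and @C.Feature@.
FLAG = re.compile(r"@([PRC])\.([^.@]+)(?:\.([^.@]+))?@")

# From the bug report: these compound tags are on nuorra (and not on nargu).
ALLOWED = {"nuorra": {"SgN", "SgG", "PlG"}}

# Assumption for this sketch: which case/number form each first part is.
FORMS = {"nuorra": "SgN", "nuora": "SgG", "nuoraid": "PlG", "nuorat": "PlN"}


def path_is_valid(symbols):
    """Return True if every flag diacritic on the path is satisfied."""
    state = {}
    for sym in symbols:
        m = FLAG.fullmatch(sym)
        if m is None:
            continue  # ordinary symbol, no effect on the flag state
        op, feat, val = m.groups()
        if op == "P":
            state[feat] = val
        elif op == "C":
            state.pop(feat, None)
        elif op == "R" and state.get(feat) != val:
            return False
    return True


def compound_path(lemma, first_part, second_part):
    """Build the symbol path for a two-part compound (hypothetical layout)."""
    form = FORMS[first_part]
    licensed = sorted(ALLOWED.get(lemma, ()))
    symbols = [f"@P.Allow{f}.ON@" for f in licensed]  # set by the stem entry
    symbols += list(first_part)
    symbols.append(f"@R.Allow{form}.ON@")  # checked at the compound boundary
    symbols += [f"@C.Allow{f}@" for f in licensed]  # reset for the next part
    symbols += list(second_part)
    return symbols


if __name__ == "__main__":
    for first in ("nuorra", "nuora", "nuoraid", "nuorat"):
        ok = path_is_valid(compound_path("nuorra", first, "niibi"))
        print(first + "niibi", "ok" if ok else "err")
```

The point of the design is that the check happens while the path is traversed, so the network itself stays small: the illegal combinations are never compiled away, they are just blocked at lookup time. The original tags stay untouched, so the PLX conversion would not be affected.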

albbas commented 13 years ago

Comment 5176

Date: 2011-09-28 18:03:29 +0200 From: Trond Trosterud <>

For the future?

albbas commented 9 years ago

Comment 9661

Date: 2014-10-21 08:36:25 +0200 From: Trond Trosterud <>

New priority.

albbas commented 8 years ago

Comment 11266

Date: 2016-03-30 12:55:00 +0200 From: Thomas Omma <>

Wiho, how well this works in the newest speller!