giellalt / bugzilla-dummy


PlGen as first part and hyphenation: eeki-nommh accepted (Bugzilla Bug 915) #621

Closed albbas closed 7 years ago

albbas commented 13 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 915

Date: 2010-12-02T12:03:08+01:00 From: Thomas Omma <> To: Sjur Nørstebø Moshagen <> CC: maja.l.kappfjell, sjur.n.moshagen, thomas.omma, trond.trosterud

Last updated: 2017-03-03T12:05:32+01:00

albbas commented 13 years ago

Comment 3660

Date: 2010-12-02 12:03:08 +0100 From: Thomas Omma <>

Åarjelsaemien, version 1.0, 2010-11-30

Generated compounds with PlGen as first part are not accepted when hyphenated: nïejti-moere, maanaj-gaerteni

Unhyphenated, they are fine.

albbas commented 13 years ago

Comment 3661

Date: 2010-12-02 12:09:50 +0100 From: Thomas Omma <>

These two generated compounds with PlGen as first part are now accepted without a hyphen: nïejtimoere, maanajgaerteni

The new problem is that this compound is accepted WITH a hyphen: eeki-nommh

It should not be accepted at all: the words have no compound tagging that allows it.

It seems the hyphen makes the compound tagging invalid.

Åarjelsaemien, version 1.0, 2010-12-01

albbas commented 13 years ago

Comment 3672

Date: 2010-12-03 09:56:54 +0100 From: Sjur Nørstebø Moshagen <>

Changing priority etc.

albbas commented 13 years ago

Comment 3676

Date: 2010-12-03 09:58:02 +0100 From: Sjur Nørstebø Moshagen <>

Forgot to change status.

albbas commented 13 years ago

Comment 4766

Date: 2011-08-08 12:05:35 +0200 From: Thomas Omma <>

eeki-nommh is still accepted.

Åarjelsaemien, version 1.1, 2011-05-26

albbas commented 13 years ago

Comment 4974

Date: 2011-09-01 13:51:42 +0200 From: Thomas Omma <>

Åarjelsaemien, version 1.1, 20110830-45217

Status is the same as in comment 4.

albbas commented 13 years ago

Comment 5117

Date: 2011-09-19 12:12:49 +0200 From: Tomi Pieski <>

Why is eeki-nommh incorrect? fst recognizes it:

eeki-nommh eeki-nommh eeke+N+PlGenCmp+Hyph#nomme+N+Pl+Nom

albbas commented 13 years ago

Comment 5118

Date: 2011-09-19 12:20:18 +0200 From: Thomas Omma <>

Yes, sorry, it is an OK word. I will change it in the regr-file.

albbas commented 13 years ago

Comment 5121

Date: 2011-09-19 12:26:18 +0200 From: Thomas Omma <>

Double sorry.

eeki-nommh and eekinommh are incorrect, even though the fst recognizes them. This is because of the compounding tags (they have none):

eeke:eek NIEJTE ;
nomme:nomm NIEJTE ;

The speller accepts eeki-nommh but not eekinommh.

It must be due to the hyphen.

albbas commented 12 years ago

Comment 6444

Date: 2012-06-18 14:56:19 +0200 From: Maja Lisa Kappfjell <>

Interesting! Test!

albbas commented 12 years ago

Comment 6446

Date: 2012-06-18 14:58:43 +0200 From: Maja Lisa Kappfjell <>

Interesting! Test!

albbas commented 7 years ago

Comment 11737

Date: 2016-11-28 10:34:28 +0100 From: Sjur Nørstebø Moshagen <>

Revisiting old bugs:

This bug is not related to PLX per se, but to the interpretation of the compounding tags. If the hyphenated form really should NOT be accepted, the bug is still with us:

$ echo eeki-nommh | hfst-lookup -q src/analyser-gt-norm.hfstol
eeki-nommh eeke+N+Cmp-#nomme+Num+Pl+Nom 10,000000

$ echo eekinommh | hfst-lookup -q src/analyser-gt-norm.hfstol
eekinommh eekinommh+? inf

$ echo eeki-nommh | hfst-ospell -S tools/spellcheckers/fstbased/desktop/hfst/sma.zhfst
"eeki-nommh" is in the lexicon...
$ echo eekinommh | hfst-ospell -S tools/spellcheckers/fstbased/desktop/hfst/sma.zhfst
"eekinommh" is in the lexicon...

$ echo eeki-nommh | hfst-lookup -q tools/spellcheckers/fstbased/desktop/analyser-desktopspeller-gt-norm.hfst
eeki-nommh eeke+N+Cmp-#nomme+Num+Pl+Nom 24,495081
eeki-nommh eeke+N+Cmp/Hyph+Cmp#nomme+N+Pl+Nom 10024,495117

$ echo eekinommh | hfst-lookup -q tools/spellcheckers/fstbased/desktop/analyser-desktopspeller-gt-norm.hfst
eekinommh eeke+N+Cmp#nomme+N+Pl+Nom 24,495081

It is also quite disturbing that the normative analyser and the speller behave differently.

The lexc entries for the words are:

eeke+Sem/Dummytag:eek NIEJTE ;
nomme+Sem/Dummytag:nomm NIEJTE ;

meaning that eeke should only allow compounding in SgNom, and that nomme is not overriding it with a LeftCmp tag.

I will take over this bug for now.
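The rule described for the lexc entries (a first part allows compounding only in SgNom by default, unless the last part overrides this with a LeftCmp tag) can be modelled roughly as below. This is only a hypothetical sketch of the logic as stated in this comment: the function name, the parameter names and the form labels are made up for illustration and are not the actual GiellaLT tags or filter implementation.

```python
# Hypothetical model of the compounding restriction described above.
# Names and form labels are illustrative, not actual GiellaLT tags.
DEFAULT_FIRST_PART_FORMS = {"SgNom"}


def compound_allowed(first_form, first_allows=None, last_requires_left=None):
    """Return True if a first part in `first_form` may start a compound.

    first_allows: forms permitted by the first part's own compounding tags
        (None means the entry has no such tags, so the default applies).
    last_requires_left: forms demanded by a LeftCmp-style tag on the last
        part (None means the last part does not override anything).
    """
    if last_requires_left is not None:
        # An override on the last part wins over the first part's tags.
        return first_form in last_requires_left
    if first_allows is None:
        return first_form in DEFAULT_FIRST_PART_FORMS
    return first_form in first_allows


# Neither eeke nor nomme carries compounding tags in the entries above.
print(compound_allowed("SgNom"))   # default form
print(compound_allowed("PlGen"))   # PlGen first part, no tags anywhere
```

Under this reading, a PlGen first part such as eeki- would be rejected unless one of the two entries carried a tag saying otherwise.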

albbas commented 7 years ago

Comment 12062

Date: 2017-03-02 09:40:24 +0100 From: Sjur Nørstebø Moshagen <>

Slight progress - the speller and the normative analyser now behave the same:

$ echo eeki-nommh | hfst-lookup -q src/analyser-gt-norm.hfstol
eeki-nommh eeke+N+Cmp-#nomme+Num+Pl+Nom 10,000000

$ echo eekinommh | hfst-lookup -q src/analyser-gt-norm.hfstol
eekinommh eekinommh+? inf

$ echo eekinommh | hfst-lookup -q src/analyser-gt-desc.hfstol
eekinommh eeke+N+Cmp/PlGen+Cmp#nomme+N+Pl+Nom 10,000000

$ echo eeki-nommh | hfst-ospell -S tools/spellcheckers/fstbased/desktop/hfst/sma.zhfst
"eeki-nommh" is in the lexicon...

$ echo eekinommh | hfst-ospell -S tools/spellcheckers/fstbased/desktop/hfst/sma.zhfst
"eekinommh" is NOT in the lexicon:

$ echo eeki-nommh | hfst-lookup -q tools/spellcheckers/fstbased/desktop/analyser-desktopspeller-gt-norm.hfst
eeki-nommh eeke+N+Cmp-#nomme+Num+Pl+Nom 24,495081

$ echo eekinommh | hfst-lookup -q tools/spellcheckers/fstbased/desktop/analyser-desktopspeller-gt-norm.hfst
eekinommh eekinommh+? inf
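Whether two analysers agree on these words can be checked mechanically from their hfst-lookup output. The sketch below hard-codes the result lines quoted in this comment instead of calling the tools, and the helper names are made up for illustration. It assumes each result line has three whitespace-separated fields (surface form, analysis, weight), that unrecognized input is reported as `+?` with weight `inf`, and that the weight uses a comma as decimal separator, as in the transcripts here.

```python
def parse_lookup_line(line):
    """Split one hfst-lookup result line into (surface, analysis, weight)."""
    surface, analysis, weight = line.split()
    # The transcripts show a comma as decimal separator (locale-dependent).
    return surface, analysis, float(weight.replace(",", "."))


def is_recognized(line):
    """True unless the line is the 'word+? inf' placeholder for unknown input."""
    _, analysis, weight = parse_lookup_line(line)
    return not analysis.endswith("+?") and weight != float("inf")


# Result lines copied from the normative analyser and the speller analyser.
norm = {
    "eeki-nommh": "eeki-nommh eeke+N+Cmp-#nomme+Num+Pl+Nom 10,000000",
    "eekinommh": "eekinommh eekinommh+? inf",
}
speller = {
    "eeki-nommh": "eeki-nommh eeke+N+Cmp-#nomme+Num+Pl+Nom 24,495081",
    "eekinommh": "eekinommh eekinommh+? inf",
}

for word in norm:
    print(word, is_recognized(norm[word]), is_recognized(speller[word]))
```

For a real regression check, the hard-coded lines would be replaced by the captured output of the two commands.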

albbas commented 7 years ago

Comment 12098

Date: 2017-03-03 12:05:32 +0100 From: Sjur Nørstebø Moshagen <>

This specific compound is accepted with a hyphen because the last part is a numeral:

(In reply to Sjur Nørstebø Moshagen from comment #12)

Slight progress - the speller and the normative analyser now behave the same:

$ echo eeki-nommh | hfst-lookup -q src/analyser-gt-norm.hfstol
eeki-nommh eeke+N+Cmp-#nomme+Num+Pl+Nom 10,000000

Compounding with numerals requires a hyphen (even when the numeral is spelled out with letters; that could be revisited at some point). The numeral 'nomme' can be inflected, and the end result is the analysis above (and thus the observed speller behavior).

That is, compounding and compounding restriction using tags is working as it should, and this is not a real bug.

If we want to restrict compounds with -nomme then that is another discussion and outside this bug report.

Closed as fixed (since the originally reported words were fixed way back in 2010).
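To spot this kind of case in other analyses, the analysis string can be split into its compound parts and the last part checked for the numeral tag. This is a minimal sketch with hypothetical helper names. It assumes that `#` separates compound parts and `+` separates tags, as in the analyses quoted in this thread.

```python
def compound_parts(analysis):
    """Split an analysis string into (lemma, [tags]) pairs, one per compound part."""
    parts = []
    for chunk in analysis.split("#"):
        lemma, *tags = chunk.split("+")
        parts.append((lemma, tags))
    return parts


def last_part_is_numeral(analysis):
    """True if the last compound part carries the Num tag."""
    _, tags = compound_parts(analysis)[-1]
    return "Num" in tags


# Normative analysis of eeki-nommh (hyphenated, last part read as a numeral).
hyphenated = "eeke+N+Cmp-#nomme+Num+Pl+Nom"
# Descriptive analysis of eekinommh (last part read as a noun).
unhyphenated = "eeke+N+Cmp/PlGen+Cmp#nomme+N+Pl+Nom"

print(compound_parts(hyphenated))
print(last_part_is_numeral(hyphenated))
print(last_part_is_numeral(unhyphenated))
```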