giellalt / lang-kpv

Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Komi-Zyrian language
https://giellalt.uit.no
GNU Lesser General Public License v3.0

OOo Latin vs. ukom, why don't they recognize the same things? #4

Closed albbas closed 11 years ago

albbas commented 12 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 1267

Date: 2012-01-28T14:23:29+01:00 From: Jack Rueter <> To: Sjur Nørstebø Moshagen <> CC: ciprian.gerstenberger, rueter.jack, trond.trosterud

Last updated: 2012-12-12T10:21:09+01:00

albbas commented 12 years ago

Comment 5669

Date: 2012-01-28 14:23:29 +0100 From: Jack Rueter <>

A very interesting thing occurred in the Komi spell-checker results. First, the fact that we are now able to recognize word forms with an upper-case initial letter has greatly improved our ability to catch shortcomings in our description. Thank you!

Why does the Komi word "черинянь" '(bread with fish baked in it)' get accepted by ukom but not the speller?

The form "черинянь" is given in the database gtsvn/kt/kom/src/working_files/N_kom-lex.xml. After make lexfiles it shows up again in kom-lex-xmlsrc.txt:

черинянь Noun1 "" ;

After make, it can be analyzed by ukom:

src jackrueter$ ukom

черинянь
черинянь	черинянь+N+Sg+Nom

Yet make -f Makefile.hfst does not produce a spell checker that accepts it. Why?

albbas commented 12 years ago

Comment 5670

Date: 2012-01-28 14:48:13 +0100 From: Ciprian Gerstenberger <>

This is rather something for Sjur.

albbas commented 12 years ago

Comment 5677

Date: 2012-01-29 09:51:45 +0100 From: Jack Rueter <>

The same problem seems to occur with abbreviations, which come from two sources:

(1) gtsvn/kt/kom/src/working_files/ABBR_kom-lex.xml

с.в. (сідз водзӧ 'etc.')
лб. (листбок 'page')

Both are accepted by ukom but neither is accepted by the speller.

(2) gtsvn/kt/kom/src/abbr-kom-lex.txt

gtsvn/kt/kom/src/acro-kom-lex.txt

БОСЬТӦС (no such word exists)

In ukom this form is not accepted because it is more than 5 letters long. The speller, however, seems to accept acronym candidates of any length, provided they are all upper-case.

Hence the speller accepts non-existent words if they are upper-case, but it does not accept the lower-case abbreviations in the database.
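To make the mismatch concrete, here is a rough Python model of the two behaviours as described above. It is only a sketch: the real tools are finite-state transducers, and the function names, the abbreviation set, and the assumption that ukom accepts any all-upper-case string of up to 5 letters are hypothetical and used purely for illustration.

```python
import unicodedata

# Hypothetical model only; the real analyser and speller are compiled
# finite-state transducers, not Python functions.

# Lower-case abbreviations quoted in this report (from ABBR_kom-lex.xml).
KNOWN_ABBREVIATIONS = {"с.в.", "лб."}

# Length limit reported for acronyms in ukom (assumed to be inclusive).
MAX_ACRONYM_LENGTH = 5


def is_all_upper(word: str) -> bool:
    """True if the word consists only of upper-case letters."""
    # NFC-normalise so that a letter such as Ӧ counts as one character
    # even if it was typed as a base letter plus a combining diaeresis.
    word = unicodedata.normalize("NFC", word)
    return word.isalpha() and word.isupper()


def ukom_like_accepts(word: str) -> bool:
    """Model of ukom: listed abbreviations, plus short acronyms."""
    word = unicodedata.normalize("NFC", word)
    if word in KNOWN_ABBREVIATIONS:
        return True
    return is_all_upper(word) and len(word) <= MAX_ACRONYM_LENGTH


def speller_like_accepts(word: str) -> bool:
    """Model of the speller: any all-upper-case string, nothing else."""
    return is_all_upper(word)


if __name__ == "__main__":
    for candidate in ["с.в.", "лб.", "БОСЬТӦС"]:
        print(candidate, ukom_like_accepts(candidate), speller_like_accepts(candidate))
```

In this model the two functions disagree on all three test forms, which matches the observations in this comment.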

albbas commented 12 years ago

Comment 5762

Date: 2012-02-07 22:37:29 +0100 From: Sjur Nørstebø Moshagen <>

I will look into this when time permits. Things are changing all the time when it comes to HFST and related technologies. This bug might solve itself over time.

albbas commented 11 years ago

Comment 7532

Date: 2012-12-12 09:51:14 +0100 From: Trond Trosterud <>

Jaska, what is the status of this one?

albbas commented 11 years ago

Comment 7536

Date: 2012-12-12 10:21:09 +0100 From: Jack Rueter <>

(In reply to comment #4)

> Jaska, what is the status of this one?

This is no longer a problem.