giellalt / bugzilla-dummy

Cyrillic-based languages are not properly case-handled by Voikko (Bugzilla Bug 1256) #1675

Closed albbas closed 12 years ago

albbas commented 12 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 1256

Date: 2012-01-19T08:11:37+01:00
From: Trond Trosterud
To: Sjur Nørstebø Moshagen
CC: borre.gaup, tomi.k.pieski

Last updated: 2012-02-03T16:25:46+01:00

albbas commented 12 years ago

Comment 5617

Date: 2012-01-19 08:11:37 +0100
From: Trond Trosterud

Background: The Komi-as-Latin OOo speller works fine, but cannot handle uppercasing of lowercase words.

(I cannot find the reference, but) it was suggested to exchange the quasi-code "la" for the code of a Cyrillic-based language. I did that twice, for kk and be (Kazakh and Belarusian), in the Makefile.hfst:

    # kv is presently replaced with be (the language code for Belarussian)
    # to allow testing in OOo, which does not yet have support for Komi
    2LCODE=be

Then I compiled, which generated new files in .voikko/2:

    ~/.voikko/2$ cat mor-be/voikko-fi_FI.pro
    info: Voikko-Dictionary-Format: 2
    info: Language-Code: be
    info: Language-Variant: standard
    info: Description: Kokeellinen komi morfologia
    info: Morphology-Backend: null
    info: Speller-Backend: hfst
    info: Suggestion-Backend: hfst
    ~/.voikko/2$
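To check the generated metadata without reading it by eye, a small script can pull out the fields. This is only a sketch: it assumes every metadata line has the shape `info: Key: Value`, as in the listing above, and the function name `parse_voikko_info` is made up for illustration. It is not part of Voikko.

```python
def parse_voikko_info(text):
    """Collect 'info: Key: Value' lines into a dict (assumed file layout)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("info:"):
            continue  # skip anything that is not a metadata line
        # Split the remainder at the first colon into key and value.
        key, sep, value = line[len("info:"):].partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample = """info: Voikko-Dictionary-Format: 2
info: Language-Code: be
info: Speller-Backend: hfst
"""
fields = parse_voikko_info(sample)
# The language code OOo is expected to see for this dictionary.
language_code = fields["Language-Code"]
```

If `language_code` matches the code set in 2LCODE, the dictionary file itself is labelled as intended, and the cause has to be looked for elsewhere.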

Now, the problem is: neither Kazakh nor Belarusian comes up with the promising green V in OpenOffice after a restart.

So, have I overlooked something?

This really should be in place this week; I am going to Russia next week to install it.

albbas commented 12 years ago

Comment 5619

Date: 2012-01-19 23:27:27 +0100
From: Trond Trosterud

The testing goes on: upper/lower case handling works for Faroese in OOo. It does not work for Komi-as-Ukrainian, even though uk is on Voikko's list. It also does not work for Kildin Sámi (to repeat the test, take the nouns from our noun file and test the lemma form).

Thus: the problem affects all our Cyrillic spellers and none of the Latin ones. It does not depend on whether the language in question is an OOo language or not (cf. sjd, uk).

This seems to be an issue in the Voikko engine, which has no Cyrillic casing pairs added.

TODO: Find out which Cyrillic pairs are needed and add them.
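To illustrate what such casing pairs look like, here is a minimal Python sketch that lists lowercase/uppercase pairs in the Cyrillic and Cyrillic Supplement Unicode blocks (U+0400 to U+052F). It relies on Python's built-in Unicode case mapping and is not how Voikko stores or applies its case tables; the function names are made up for illustration.

```python
def cyrillic_case_pairs(start=0x0400, end=0x052F):
    """Return (lower, upper) pairs for letters in the Cyrillic blocks."""
    pairs = []
    for code_point in range(start, end + 1):
        ch = chr(code_point)
        upper = ch.upper()
        # Keep lowercase letters whose uppercase is one different character.
        if ch.islower() and len(upper) == 1 and upper != ch:
            pairs.append((ch, upper))
    return pairs


def capitalize_word(word, pairs):
    """Uppercase the first letter of word using the given pair table."""
    if not word:
        return word
    return pairs.get(word[0], word[0]) + word[1:]


PAIRS = dict(cyrillic_case_pairs())
# Komi needs letters beyond the basic Russian alphabet, for example
# U+04E7 (o with diaeresis) and U+0456 (Byelorussian-Ukrainian i).
komi_o_upper = PAIRS["\u04e7"]
capitalized = capitalize_word("\u043a\u043e\u043c\u0438", PAIRS)
```

A table built this way also covers the extended letters used by Komi and Kildin Sámi, which is where a hand-written list of pairs is most likely to have gaps.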

albbas commented 12 years ago

Comment 5709

Date: 2012-02-03 16:25:46 +0100
From: Sjur Nørstebø Moshagen

This is fixed now: