UAlbertaALTLab / itwewina

Replaced by https://github.com/UAlbertaALTLab/cree-intelligent-dictionary
GNU General Public License v3.0

Syllabic-to-SRO crk analyser/generator FSTs do not seem to work properly #38

Open aarppe opened 6 years ago

aarppe commented 6 years ago

When trying to analyze the correct form, ᐁ ᐚᐸᒫᐟ, with the Cans analyser, we get no analysis: the output is the input followed by '+?' with weight 'inf'.

hfst-lookup -q src/analyser-gt-desc.Cans.hfstol
ᐁ ᐚᐸᒫᐟ
ᐁ ᐚᐸᒫᐟ    ᐁ ᐚᐸᒫᐟ+?    inf

But the incorrect form, which spells the final syllable as m-a-t, does get an analysis:

ᐁ ᐚᐸᒼᐊᐟ
ᐁ ᐚᐸᒼᐊᐟ    PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO    0.000000
ᐁ ᐚᐸᒼᐊᐟ    PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO    0.000000

With the Cans-to-Cans analyser, we get the appropriate results:

hfst-lookup -q src/analyser-gt-desc.Cans-to-Cans.hfstol
ᐁ ᐚᐸᒫᐟ
ᐁ ᐚᐸᒫᐟ    PV/e+ᐚᐸᒣᐤ+V+TA+Cnj+Prs+2Sg+3SgO    0.000000
ᐁ ᐚᐸᒫᐟ    PV/e+ᐚᐸᒣᐤ+V+TA+Cnj+Prs+3Sg+4Sg/PlO    0.000000

... also with the incorrect spelling:

ᐁ ᐚᐸᒼᐊᐟ
ᐁ ᐚᐸᒼᐊᐟ    PV/e+ᐚᐸᒣᐤ+V+TA+Cnj+Prs+2Sg+3SgO    0.000000
ᐁ ᐚᐸᒼᐊᐟ    PV/e+ᐚᐸᒣᐤ+V+TA+Cnj+Prs+3Sg+4Sg/PlO    0.000000

The SRO-to-Cans normative generator also produces the incorrect form with the m-a-t final syllable:

hfst-lookup -q src/generator-gt-norm.Cans.hfstol
PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO
PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO    ᐁ ᐚᐸᒼᐋᐟ    0.000000

... but the Cans-to-Cans normative generator produces the correct form:

hfst-lookup -q src/generator-gt-norm.Cans-to-Cans.hfstol
PV/e+ᐚᐸᒣᐤ+V+TA+Cnj+Prs+3Sg+4Sg/PlO
PV/e+ᐚᐸᒣᐤ+V+TA+Cnj+Prs+3Sg+4Sg/PlO    ᐁ ᐚᐸᒫᐟ    0.000000

N.B. itwêwina seems to work properly.
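
The lookups above could be checked mechanically rather than by eye. The following is a minimal Python sketch, not part of the repository: it assumes that hfst-lookup writes one tab-separated `input<TAB>output<TAB>weight` line per reading, and that a failed lookup shows up as an output ending in `+?` with weight `inf`, as in the first transcript. The names `Reading`, `parse_lookup`, `unanalysed` and `SAMPLE` are hypothetical.

```python
import math
from typing import List, NamedTuple


class Reading(NamedTuple):
    surface: str   # the string that was looked up
    analysis: str  # the transducer's output for it
    weight: float  # inf when no path through the FST was found


def parse_lookup(output: str) -> List[Reading]:
    """Parse hfst-lookup output, assumed to be input<TAB>output<TAB>weight."""
    readings = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # blank separator lines and anything unexpected
        surface, analysis, weight = fields
        readings.append(Reading(surface, analysis, float(weight)))
    return readings


def unanalysed(readings: List[Reading]) -> List[str]:
    """Return the looked-up forms for which the FST gave no result."""
    failed = {
        r.surface
        for r in readings
        if r.analysis.endswith("+?") or math.isinf(r.weight)
    }
    return sorted(failed)


# Sample reconstructed from the Cans analyser lookups above
# (tab separation is an assumption about the raw output).
SAMPLE = (
    "ᐁ ᐚᐸᒫᐟ\tᐁ ᐚᐸᒫᐟ+?\tinf\n"
    "\n"
    "ᐁ ᐚᐸᒼᐊᐟ\tPV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO\t0.000000\n"
    "ᐁ ᐚᐸᒼᐊᐟ\tPV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO\t0.000000\n"
)

if __name__ == "__main__":
    print(unanalysed(parse_lookup(SAMPLE)))
```

Feeding it the saved output of each analyser for a list of known-good syllabic forms would flag any form that only the Cans analyser rejects.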

aarppe commented 6 years ago

First analysis: It looks like there is an error in the Makefile rules that specify how the Cans-to-Latn and Latn-to-Cans FSTs are compiled, since the manual composition below produces correct analysis and generation results.

hfst-compose -F -1 src/orthography/Cans-to-Latn.compose.hfst -2 src/analyser-gt-desc.hfst -o analyser-gt-desc.Cans-to-Latn.hfst

hfst-lookup -q analyser-gt-desc.Cans-to-Latn.hfst
ᐁ ᐚᐸᒫᐟ
ᐁ ᐚᐸᒫᐟ    PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO    0.000000
ᐁ ᐚᐸᒫᐟ    PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO    0.000000

hfst-compose -F -1 src/generator-gt-norm.hfst -2 src/orthography/Latn-to-Cans.compose.hfst -o generator-gt-norm.Latn-to-Cans.hfst

hfst-lookup -q generator-gt-norm.Latn-to-Cans.hfst
PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO
PV/e+wâpamêw+V+TA+Cnj+Prs+2Sg+3SgO    ᐁ ᐚᐸᒪᐟ    0.000000

PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO
PV/e+wâpamêw+V+TA+Cnj+Prs+3Sg+4Sg/PlO    ᐁ ᐚᐸᒫᐟ    0.000000
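
Until the Makefile is sorted out, the two manual compositions above could be scripted so they are repeatable. This is only a sketch under assumptions: the file paths and the `hfst-compose -F -1 … -2 … -o …` invocation are copied from the commands in this comment, while the helper names (`compose_command`, `build_all`, `ANALYSER`, `GENERATOR`) are hypothetical and not part of the GT build infrastructure.

```python
import shutil
import subprocess
from typing import List


def compose_command(first: str, second: str, out: str) -> List[str]:
    """Build the hfst-compose call used in the manual commands above."""
    return ["hfst-compose", "-F", "-1", first, "-2", second, "-o", out]


# Analyser: the syllabics-to-SRO converter comes first, the analyser second.
ANALYSER = compose_command(
    "src/orthography/Cans-to-Latn.compose.hfst",
    "src/analyser-gt-desc.hfst",
    "analyser-gt-desc.Cans-to-Latn.hfst",
)

# Generator: the generator comes first, the SRO-to-syllabics converter second.
GENERATOR = compose_command(
    "src/generator-gt-norm.hfst",
    "src/orthography/Latn-to-Cans.compose.hfst",
    "generator-gt-norm.Latn-to-Cans.hfst",
)


def build_all(commands: List[List[str]]) -> bool:
    """Run each command; return False without running if HFST is missing."""
    if shutil.which("hfst-compose") is None:
        return False
    for command in commands:
        subprocess.run(command, check=True)  # raises if a composition fails
    return True


if __name__ == "__main__":
    for command in (ANALYSER, GENERATOR):
        print(" ".join(command))
```

The argument order differs between the two on purpose: the orthographic converter sits on the input side of the analyser and on the output side of the generator, exactly as in the two hfst-compose commands above.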

aarppe commented 6 years ago

Sent a query to Sjur about whether this is an error in any of the Makefiles.

aarppe commented 6 years ago

This can be solved for the moment by specifying the required FSTs ourselves, but in the longer term their compilation should be defined within the GT infrastructure (which requires input from Sjur).