UAlbertaALTLab / morphodict

The Language Independent Intelligent Dictionary
https://morphodict.readthedocs.io/
Apache License 2.0

Fix glitches in "inflected" English phrase translation #682

Open aarppe opened 3 years ago

aarppe commented 3 years ago

The English phrase generation of some forms does not work for some of the English definitions, which needs to be fixed in the generator FSTs:

Verbs:

Nouns:

aarppe commented 3 years ago

In addition, there appear to be some extra-FST glitches:

aarppe commented 3 years ago

Once the above matters are resolved, the count drops from 512,210 non-generated forms to only 43,493 missing ones, cf.

cat inc/phrases/verbs.phrases | grep '+?' | grep -v 'him/herself' | grep -v '(s.o. ' | egrep -v '\<it\>' | grep -v 12 | grep -v 'he/she' | grep -v 4 | wc -l
   43493
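
The same filter chain can be sketched in Python, which may be easier to extend as more known-problem patterns are excluded. This is only an illustration of the pipeline above, not code from the repository: the function names are hypothetical, and `\b` is used as an approximation of grep's `\<` and `\>` word boundaries.

```python
import re

# Substrings the shell pipeline drops with `grep -v` (already-known problem cases).
EXCLUDED_SUBSTRINGS = ("him/herself", "(s.o. ", "12", "he/she", "4")

# Rough equivalent of egrep -v '\<it\>': "it" as a whole word only.
WHOLE_WORD_IT = re.compile(r"\bit\b")


def is_remaining_failure(line: str) -> bool:
    """True if the line is a failed generation ('+?') that survives every filter."""
    if "+?" not in line:
        return False
    if any(substring in line for substring in EXCLUDED_SUBSTRINGS):
        return False
    return WHOLE_WORD_IT.search(line) is None


def count_remaining_failures(lines) -> int:
    """Count the lines that the shell pipeline would pass on to `wc -l`."""
    return sum(1 for line in lines if is_remaining_failure(line))


def count_in_file(path: str) -> int:
    """Apply the count to a phrases file, assumed here to be UTF-8 encoded."""
    with open(path, encoding="utf-8") as handle:
        return count_remaining_failures(handle)
```

Calling `count_in_file("inc/phrases/verbs.phrases")` should then play the same role as the pipeline, provided the assumptions about encoding and word boundaries hold.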

aarppe commented 3 years ago

For noun phrases, many, if not most, of the failing inputs are not properly constructed with an initial feature, e.g.

cat inc/phrases/nouns.phrases | grep '+?' | head -10
 Piegan country, in the Piegan country  +?
 small piece of cloth, scrap    +?
 domestic animal    +?
 shorts; underwear  +?
 crab; lobster  +?
 birthday cake  +?
 my vagina, my vulva    +?
 cucumber; literally: our deceased grandmother  +?
 Shoal Lake Cree Nation, SK; Cree reserve   +?
 intestine

The rest appear to be cases with diacritic characters, which now ought to be fixed.
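
To check how many of the remaining failures really are diacritic cases, the failed lines could be split along those lines. The following is a minimal sketch, assuming the phrases files are plain text with one entry per line and `+?` marking a failed generation; `has_diacritics` and `split_failures` are hypothetical helper names, and "diacritic" is taken to mean any character that yields a combining mark under NFD normalization.

```python
import unicodedata


def has_diacritics(text: str) -> bool:
    """True if any character carries a combining mark after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(unicodedata.combining(char) for char in decomposed)


def split_failures(lines):
    """Split failed lines (containing '+?') into (with diacritics, without)."""
    with_diacritics, without_diacritics = [], []
    for line in lines:
        if "+?" not in line:
            continue  # not a failed generation
        bucket = with_diacritics if has_diacritics(line) else without_diacritics
        bucket.append(line.rstrip("\n"))
    return with_diacritics, without_diacritics
```

If the second list is still large after the diacritics fix, the cause is more likely the missing initial feature than the character handling.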