UAlbertaALTLab / morphodict

The Language Independent Intelligent Dictionary
https://morphodict.readthedocs.io/
Apache License 2.0

Update English phrase FOMA FSTs #1158

Open aarppe opened 1 year ago

aarppe commented 1 year ago

I've recompiled the English phrase analysis and generation FOMA FSTs, cf.

-rw-r--r--  1 arppe  staff  682821 20 Mar 18:32 src/transcriptions/transcriptor-cw-eng-noun-entry2inflected-phrase-w-flags.fomabin
-rw-r--r--  1 arppe  staff  599759 20 Mar 19:14 src/transcriptions/transcriptor-cw-eng-verb-entry2inflected-phrase-w-flags-and-templates.fomabin
-rw-r--r--  1 arppe  staff  613779 20 Mar 18:20 src/transcriptions/transcriptor-eng-phrase2crk-features.fomabin

... and am placing these in the designated subdirectory in our repo, in: ./morphodict/src/CreeDictionary/res/fst/

If pushing these to the repo doesn't work, these FOMA FSTs can be compiled with foma -l using the associated *.xfscript files in ./lang-crk/src/transcriptions/.
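For reference, the recompilation step could be scripted roughly as follows. This is only a sketch under stated assumptions: foma is on the PATH, each *.xfscript writes its own .fomabin output (not verified here), and the helper names `foma_commands` and `compile_all` are hypothetical, not part of morphodict or lang-crk.

```python
import subprocess
from pathlib import Path


def foma_commands(script_dir):
    """Return one `foma -l <script>` command per *.xfscript file in script_dir."""
    scripts = sorted(Path(script_dir).glob("*.xfscript"))
    return [["foma", "-l", str(script)] for script in scripts]


def compile_all(script_dir="./lang-crk/src/transcriptions"):
    """Run each command in turn; assumes the scripts save their own output."""
    for cmd in foma_commands(script_dir):
        # stdin is closed in case foma drops to its interactive prompt
        # once the script has been loaded.
        subprocess.run(cmd, check=True, stdin=subprocess.DEVNULL)
```

Calling `compile_all()` from the directory that contains lang-crk would then invoke foma once per script, stopping at the first failure.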

aarppe commented 1 year ago

I also uploaded these to our subrepo intended for large FSTs: https://github.com/UAlbertaALTLab/fst-exchange

aarppe commented 1 year ago

@nienna73 We might want to upload these, as they should fix certain glitches in the English phrase translations that occur with the original versions of the FOMABINs. Presumably this means following the instructions here: https://github.com/UAlbertaALTLab/morphodict/tree/main/src/CreeDictionary/phrase_translate.

fbanados commented 4 months ago

This connects to work needed for #1166

fbanados commented 3 months ago

@aarppe clarification needed: The following are the FOMA FSTs actually in use in the code (with links to the lines where they appear):

If the intention is to run the phrase-w-flags-and-templates FSTs instead, let me know.

aarppe commented 3 months ago

@fbanados The generator for English verb phrases used a new approach (templates) and was renamed accordingly, so the code reference should also be updated. The approaches for English noun phrase generation and general English phrase analysis didn't change, nor did the names, but there may have been modifications to the FOMABIN files. I thought I had updated and uploaded the three files listed above. Anyhow, the code mismatch might well explain why the English phrase FSTs are not fully working as intended.
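One way to spot such a mismatch is to check whether the FST directory actually contains the recompiled files. The following is a minimal sketch, not part of morphodict: the file names are taken from the listing at the top of this issue, while `EXPECTED_FSTS` and `missing_fsts` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File names as listed in the opening comment of this issue.
EXPECTED_FSTS = [
    "transcriptor-cw-eng-noun-entry2inflected-phrase-w-flags.fomabin",
    "transcriptor-cw-eng-verb-entry2inflected-phrase-w-flags-and-templates.fomabin",
    "transcriptor-eng-phrase2crk-features.fomabin",
]


def missing_fsts(fst_dir, expected=EXPECTED_FSTS):
    """Return the expected FST file names that are not present in fst_dir."""
    fst_dir = Path(fst_dir)
    return [name for name in expected if not (fst_dir / name).is_file()]
```

Running `missing_fsts("./morphodict/src/CreeDictionary/res/fst")` would list any of the three files absent from the designated subdirectory. It does not check whether the code refers to the renamed verb generator; that reference would still need to be updated by hand.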