UAlbertaALTLab / morphodict

Plains Cree Intelligent Dictionary
https://itwewina.altlab.app/
Apache License 2.0

Revised FST and DB sources for Gunáhà #1018

Closed by aarppe 2 years ago

aarppe commented 2 years ago

First draft of the next round of TVPD-based content for Gunáhà, fixing glitches in the previous version, starting by harmonizing the choice of lemma between the LEXC and DB sources.

codecov-commenter commented 2 years ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.08%. Comparing base (8f42df9) to head (da0375d). Report is 682 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1018      +/-   ##
==========================================
+ Coverage   79.05%   79.08%   +0.03%
==========================================
  Files         151      151
  Lines        5294     5294
  Branches      684      684
==========================================
+ Hits         4185     4187       +2
  Misses        984      984
+ Partials      125      123       -2
```

View full report in Codecov by Sentry.

aarppe commented 2 years ago

Nope, it's accidental. I've revised the code that creates the JSON source so that such duplicates no longer occur; the underlying challenge is the non-systematic coding of argument classes in the XML source.
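The deduplication code itself isn't shown in the thread; as a minimal sketch of the idea, duplicate entries could be filtered on a (lemma, argument-class) key while generating the JSON source. The field names `"lemma"` and `"paradigm"` and the placeholder data are illustrative assumptions, not the actual morphodict importjson schema.

```python
# Sketch: drop duplicate (lemma, argument-class) entries when building the
# JSON source. Field names and sample data are hypothetical placeholders.
def dedupe_entries(entries):
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["lemma"], entry.get("paradigm"))
        if key in seen:
            continue  # skip an accidental duplicate from the XML source
        seen.add(key)
        unique.append(entry)
    return unique

entries = [
    {"lemma": "lemma-a", "paradigm": "VT"},
    {"lemma": "lemma-a", "paradigm": "VT"},  # accidental duplicate
    {"lemma": "lemma-b", "paradigm": "VI"},
]
print(len(dedupe_entries(entries)))  # 2
```

Keying on both lemma and argument class (rather than lemma alone) keeps legitimate homographs that differ in paradigm.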

aarppe commented 2 years ago

The modified files should now be as good as I can create them programmatically, though some confirmations of my interpretation of the XML source from Chris are still pending, so these should be releasable in the morphologically intelligent Gunáhà.

aarppe commented 2 years ago

@nienna73 @dwhieb I realize we'll have to resolve, in some fashion, the fact that neither the TVPD-based JSON source nor the word-list-based FSTs contain the example words in the test DB file (which were extracted from OS). The primary reason is that some of those verbs do not occur in TVPD, but it means we do not in fact cover all paradigm types (at least the +O and +E cases).
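The gap described above amounts to a set difference between the test-DB example words and the TVPD-derived word list. A trivial sketch of how one might surface the uncovered forms, with purely illustrative word lists:

```python
# Sketch: list test-DB example words absent from the TVPD-derived word list
# (and hence unsupported by the word-list-based FST). Data is illustrative.
def uncovered(test_words, fst_wordlist):
    return sorted(set(test_words) - set(fst_wordlist))

test_db_words = ["word-a", "word-b", "word-c"]
tvpd_words = ["word-a", "word-c", "word-d"]
print(uncovered(test_db_words, tvpd_words))  # ['word-b']
```

Running such a check against the real word lists would identify exactly which paradigm types (e.g. the +O and +E cases) lack coverage.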

When testing the DBs and FSTs locally, I've simply excluded the old DB content with the --purge option; otherwise those now-obsolete entries would turn up in Gunáhà, which makes little sense if they're not actually supported by the FST (supporting them would be difficult with the word-list-based approach).
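For context, if the local testing follows morphodict's usual per-language Django management layout, the invocation might look like the following. Only the `--purge` flag is attested in this thread; the script name, subcommand, and file name are assumptions.

```shell
# Hypothetical invocation; only --purge is mentioned in the comment above.
./srseng-manage importjsondict --purge srseng_dictionary.importjson
```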

aarppe commented 2 years ago

@nienna73 This is the Gunáhà PR with the updated DB and FST files.