UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

Incorporate corpus/lemma and dictionary/morpheme frequencies #51

Open aarppe opened 3 years ago

aarppe commented 3 years ago

To replace the current file: ~/giella/art/dicts/crk/Wolvengrey/W_aggr_corp_morph_log_freq.txt, with the process described here:

https://github.com/UAlbertaALTLab/cree-intelligent-dictionary/issues/163

... we'd want to implement the incorporation of comparable information with our aggregate dictionary database.

Based on the materials we have for Cree, I'd presume this means one or more corpus-based frequencies (not only Ahenakew-Wolfart but also Bloomfield), as well as a dictionary/morpheme-based ranking, which might be corpus-weighted as well. These would seem to be features to add to the aggregate dictionary entries.

aarppe commented 2 years ago

These now exist as a result of https://github.com/UAlbertaALTLab/morphodict/issues/1041, and keeping that information separate might be the better option. Nevertheless, we'd want to ensure that the frequencies can be matched with the dictionary content - I believe this can be achieved with a combination of the lemma and the specific word-class.
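
As a rough illustration of that matching, here is a minimal Python sketch that joins separately stored frequency rows onto dictionary entries using the pair (lemma, word class) as the key. The field names (`lemma`, `word_class`, `source`, `count`), the function name `attach_frequencies`, and all counts are assumptions made up for this example; they are not the actual crk-db or morphodict schema or data.

```python
# Hypothetical sketch: field names, the "xyz" lemma, and all counts are
# invented for illustration and do not reflect the real database.

def attach_frequencies(entries, freq_rows):
    """Return copies of entries with a 'frequencies' dict keyed by corpus."""
    # Index the frequency rows by (lemma, word class), one count per corpus.
    index = {}
    for row in freq_rows:
        key = (row["lemma"], row["word_class"])
        index.setdefault(key, {})[row["source"]] = row["count"]

    merged = []
    for entry in entries:
        key = (entry["lemma"], entry["word_class"])
        # Entries without any frequency data get an empty dict.
        merged.append({**entry, "frequencies": dict(index.get(key, {}))})
    return merged


entries = [
    {"lemma": "atim", "word_class": "NA"},
    {"lemma": "nipâw", "word_class": "VAI"},
    # "xyz" is a made-up homograph: same lemma, two word classes.
    {"lemma": "xyz", "word_class": "NI"},
    {"lemma": "xyz", "word_class": "VTI"},
]

freq_rows = [
    {"lemma": "atim", "word_class": "NA", "source": "Ahenakew-Wolfart", "count": 12},
    {"lemma": "atim", "word_class": "NA", "source": "Bloomfield", "count": 7},
    {"lemma": "xyz", "word_class": "VTI", "source": "Bloomfield", "count": 3},
]

merged = attach_frequencies(entries, freq_rows)
for entry in merged:
    print(entry["lemma"], entry["word_class"], entry["frequencies"])
```

Keying on the lemma alone would attach the `xyz` count to both homographs; adding the word class keeps them apart, and the frequency data can stay in its own file or table.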

@dwhieb In this respect, I believe this issue could be closed.