UAlbertaALTLab / morphodict

Plains Cree Intelligent Dictionary
https://itwewina.altlab.app/
Apache License 2.0

Create a list of shared vocabulary in glossaries #1038

Closed aarppe closed 2 years ago

aarppe commented 2 years ago

One way to improve itwêwina search results would be to take into account whether a potential search target occurs in any or all of the glossaries of introductory Cree textbooks (Okimâsis and Ratt) or courses (NS152).

A list of this shared vocabulary is now available in the ALTLab repo, in: crk/generated/crk_glossaries_aggregate_vocab.tsv

This file was created with the script crk-aggregate-core-vocab.sh, as follows (when run from the root directory of the ALTLab repo):

crk/bin/crk-aggregate-core-vocab.sh /Users/arppe/altlab2/crk/ | sort -k1,1nr -k2,2 > crk/generated/crk_glossaries_aggregate_vocab.tsv
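For readers without access to the ALTLab repo, here is a rough Python sketch of the idea behind the command above: count how many glossaries each entry occurs in, then order the rows the way `sort -k1,1nr -k2,2` does (first column numerically descending, second column ascending). This is not the actual crk-aggregate-core-vocab.sh logic. The function names, the count-then-entry column layout, and the placeholder entries in the usage note are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_glossaries(glossaries):
    """Count, for each entry, how many glossaries contain it.

    `glossaries` maps a glossary name to an iterable of entries.
    (Hypothetical input shape; the real script reads files in the ALTLab repo.)
    """
    counts = Counter()
    for entries in glossaries.values():
        # set() ensures a glossary counts at most once per entry,
        # even if the entry is listed several times in that glossary.
        counts.update(set(entries))
    return counts


def sorted_rows(counts):
    """Order (entry, count) pairs: count descending, then entry ascending.

    This mirrors `sort -k1,1nr -k2,2` on a count<TAB>entry file, except that
    Python compares strings by code point, whereas sort(1) uses the locale's
    collation order.
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_tsv(counts):
    """Render the sorted rows as count<TAB>entry lines (assumed column order)."""
    return "\n".join(f"{n}\t{entry}" for entry, n in sorted_rows(counts))
```

Calling `to_tsv(aggregate_glossaries({"Okimâsis": [...], "Ratt": [...], "NS152": [...]}))` with placeholder entry lists would put entries shared by all three glossaries at the top of the output.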

The next step to consider is incorporating this file alongside the corpus-based lemma counts (which we have on the basis of the A-W and B corpora) and the dictionary-based mean morpheme frequencies (which need to be updated as well).
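One possible shape for that incorporation, sketched in Python: gather the three signals into a single per-lemma record, defaulting to zero where a lemma is missing from a source. The `LemmaFeatures` class, its field names, and `combine_features` are hypothetical and are not part of morphodict. How the features would then be weighted in search ranking is left open here.

```python
from dataclasses import dataclass


@dataclass
class LemmaFeatures:
    """Per-lemma signals that could inform search ranking (hypothetical names)."""
    glossary_count: int = 0          # number of glossaries containing the lemma
    corpus_count: int = 0            # corpus-based lemma count
    mean_morpheme_freq: float = 0.0  # dictionary-based mean morpheme frequency


def combine_features(glossary_counts, corpus_counts, morpheme_freqs):
    """Merge three lemma-keyed mappings into one dict of LemmaFeatures.

    A lemma absent from a given mapping gets that field's default of zero.
    """
    lemmas = set(glossary_counts) | set(corpus_counts) | set(morpheme_freqs)
    return {
        lemma: LemmaFeatures(
            glossary_count=glossary_counts.get(lemma, 0),
            corpus_count=corpus_counts.get(lemma, 0),
            mean_morpheme_freq=morpheme_freqs.get(lemma, 0.0),
        )
        for lemma in lemmas
    }
```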

aarppe commented 2 years ago

A first version of this has been created as of Saturday, Dec. 18, 2021.