The ALTLab repo now has a revised version of entry-specific aggregated and individual morpheme log-frequencies (along with morpheme counts), which is available in:
crk/generated/CW_aggregate_morpheme_log_freqs.tsv
This was created with the script:
crk/bin/extract-morpheme-frequencies.sh
with the following command:
crk/bin/extract-morpheme-frequencies.sh ../PlainsLexUni/CreeDict-x > crk/generated/CW_aggregate_morpheme_log_freqs.tsv
Note that entries that occur only in MD (or any other dictionary) will not get ranked. For those entries we need to come up with some default strategy, perhaps one based on character length that uses mean weights derived from the CW entry weights, or something else.
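One way the character-length fallback could work is sketched below. This is only an illustration under assumptions: the function names, the dict-of-weights input, and the nearest-length tie-breaking are all hypothetical, and the weights are made up rather than read from the TSV.

```python
from collections import defaultdict
from statistics import mean


def mean_weight_by_length(cw_weights):
    """Map each entry length (in characters) to the mean weight of CW entries of that length."""
    buckets = defaultdict(list)
    for lemma, weight in cw_weights.items():
        buckets[len(lemma)].append(weight)
    return {length: mean(ws) for length, ws in buckets.items()}


def default_weight(lemma, length_means):
    """Fallback weight for an entry without a CW weight.

    Uses the mean for the nearest known length; ties go to the shorter length
    (an arbitrary choice made for this sketch).
    """
    if not length_means:
        raise ValueError("no CW weights available")
    nearest = min(length_means, key=lambda n: (abs(n - len(lemma)), n))
    return length_means[nearest]


# Placeholder entries and made-up weights, for illustration only.
cw = {"abcd": -2.0, "abcdef": -3.0, "ghijkl": -5.0}
means = mean_weight_by_length(cw)
fallback = default_weight("mnopqrst", means)  # length 8, nearest known length is 6
```

Whether the mean, the median, or some smoothed value per length is the better default is an open question; the sketch only shows the shape of the lookup.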
With this, we should now have all the components that the linguists can bring to the table for updating and revising the relevance ranking of the search results. Note that the corpus-based form/lemma frequencies are to be found in the ALTLab repo here:
crk/generated/ahenakew_wolfart_bloomfield.fst+cg.freq-sorted.txt
We may want to consider whether the survey results ought to be used for specifying core vocabulary. And we will need to implement POS-matching in particular between the results of English search phrase analysis and the dictionary entries.
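As a rough sketch of what POS-matching might look like in the ranking step, assuming the English phrase analysis yields a set of candidate POS tags and each dictionary entry carries a single POS tag (both the data shapes and the function name are hypothetical):

```python
def rank_with_pos_match(results, query_pos):
    """Put entries whose POS matches the query analysis first.

    sorted() is stable, so the existing relative order (e.g. by frequency-based
    weight) is preserved within the matching and non-matching groups.
    """
    return sorted(results, key=lambda entry: entry["pos"] not in query_pos)


# Placeholder entries, for illustration only.
results = [
    {"lemma": "entry-a", "pos": "V"},
    {"lemma": "entry-b", "pos": "N"},
    {"lemma": "entry-c", "pos": "V"},
]
ranked = rank_with_pos_match(results, {"N"})
```

Treating the POS match as a hard first sort key is only one option; it could equally be folded into the weight as one factor among several.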