For future reference: to standardize how we create the reverse frequency list of crk word-forms with their analyses and lemmas, which is used to rank the relevance of search results, the ALTLab repo now contains the script crk/bin/generate-a-w-b-wordform-lemma-anl-frequency-list.sh. It can be run as follows:
crk/bin/generate-a-w-b-wordform-lemma-anl-frequency-list.sh crk/corpora ~/giellalt/lang-crk | less
The output can be saved to the following file in the ALTLab repo: crk/generated/ahenakew_wolfart_bloomfield.fst+cg.freq-sorted.txt
The script can be rerun whenever substantial changes have been made to the crk FST, or when we want to add subcorpora beyond the currently included Ahenakew-Wolfart and Bloomfield texts.
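The regenerate-and-save step could be wrapped roughly as follows. This is only a sketch: regenerate_freq_list is a hypothetical helper that is not in the ALTLab repo, and it assumes the script writes the list to standard output (as the pipe to less above suggests). Writing to a temporary file first is a choice made here so that a failed run leaves the previous list in place.

```shell
# Hypothetical helper (not part of the ALTLab repo).
# Assumes the generator script prints the frequency list on standard output.
regenerate_freq_list() {
    script=$1    # path to the generator script
    corpora=$2   # corpus directory, passed as the script's first argument
    lang_dir=$3  # lang-crk checkout, passed as the script's second argument
    out=$4       # file that receives the frequency list

    tmp="$out.tmp"
    # Write to a temporary file first so a failed run does not clobber the old list.
    if "$script" "$corpora" "$lang_dir" > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        echo "frequency list generation failed; $out left unchanged" >&2
        return 1
    fi
}

# Intended call, assuming the current directory is the root of the ALTLab repo:
#   regenerate_freq_list \
#       crk/bin/generate-a-w-b-wordform-lemma-anl-frequency-list.sh \
#       crk/corpora ~/giellalt/lang-crk \
#       crk/generated/ahenakew_wolfart_bloomfield.fst+cg.freq-sorted.txt
```

The updated file can then be reviewed with git diff before it is committed.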