Open aarppe opened 1 year ago
There's now a shell script that does all of the above steps in one go: altlab/crk/bin/update-crk-dictionary-sources-2-lexc.sh
@M1Al3x The process outlined above to update the LEXC source, and thus the FSTs, needs to be done first, when incorporating updated dictionary content into itwêwina, before these same dictionary sources are processed into *.importjson for uploading into itwêwina. Thus, the steps are:
@M1Al3x This issue describes steps 1-2 above. The front page Readme.Md has the description for steps 4-5 above.
Following are the individual steps needed to update the LEXC source that will be used for the itwêwina (and other) FSTs (for which the compilation is outlined in #109).
Update Cree Words (CW) source file CreeDict-x in Carleton repo:
svn up
Remove Windows-style CR characters from the CW source, and copy the result over to the ALTLab repo:
cat PlainsLexUni/CreeDict-x | tr -d '\r' > altlab/crk/dicts/Wolvengrey_altlab.toolbox
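A quick way to confirm that the CR stripping did what it should is to count the lines that still contain a carriage return before and after. The following is only a sketch on a made-up two-line sample in a temporary directory; the file names and the sample record are hypothetical and are not taken from CreeDict-x.

```shell
# Hypothetical sample with Windows-style CRLF line endings (not real CW data).
tmpdir=$(mktemp -d)
printf '\\lx sample\r\n\\ps X\r\n' > "$tmpdir/sample-crlf.toolbox"

# Strip every carriage return, the same way as in the step above.
tr -d '\r' < "$tmpdir/sample-crlf.toolbox" > "$tmpdir/sample-lf.toolbox"

# Count the lines that contain a CR. grep -c prints 0 and exits non-zero
# when nothing matches, hence the "|| true".
cr=$(printf '\r')
before=$(grep -c "$cr" "$tmpdir/sample-crlf.toolbox" || true)
after=$(grep -c "$cr" "$tmpdir/sample-lf.toolbox" || true)
echo "lines with CR before: $before, after: $after"
```

Running the same grep -c check against altlab/crk/dicts/Wolvengrey_altlab.toolbox should report 0 if the conversion worked.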
Convert this Toolbox file into TSV format:
cat altlab/crk/dicts/Wolvengrey_altlab.toolbox | altlab/crk/bin/toolbox2tsv.sh > altlab/crk/generated/Wolvengrey_altlab.tsv
Compare against Maskwacîs Dictionary content, and add unique entries (and associated stem and inflectional class information) after the CW entries:
altlab/crk/bin/add-md-entries-2-after-cw-tsv.sh altlab/crk/generated/Wolvengrey_altlab.tsv altlab/crk/dicts/Maskwacis_altlab.tsv > altlab/crk/generated/altlab.tsv
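To illustrate the general idea of "add unique entries after the CW entries", here is a minimal sketch with awk. It is not the logic of add-md-entries-2-after-cw-tsv.sh, which also handles stem and inflectional class information. The sketch assumes, purely for illustration, that the first tab-separated column is the key used for comparison; the file names and rows are made up.

```shell
# Made-up sample data; the column layout is an assumption for illustration.
work=$(mktemp -d)
printf 'word-a\tA\nword-b\tB\n' > "$work/primary.tsv"
printf 'word-b\tB\nword-c\tC\n' > "$work/secondary.tsv"

# While reading the first file (NR == FNR), remember each key and print the row.
# While reading the second file, print only rows whose key was not seen before.
awk -F'\t' 'NR == FNR { seen[$1] = 1; print; next } !($1 in seen)' \
    "$work/primary.tsv" "$work/secondary.tsv" > "$work/merged.tsv"

cat "$work/merged.tsv"
```

The merged file keeps all primary rows in their original order and appends only the secondary rows with a new key (here, the word-c row).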
Generate LEXC source for individual parts-of-speech from this ALTLab aggregated TSV file:
cat altlab/crk/generated/altlab.tsv | altlab/crk/bin/altlab2lexc.sh 'N' > altlab/crk/generated/noun_stems.lexc
cat altlab/crk/generated/altlab.tsv | altlab/crk/bin/altlab2lexc.sh 'V' > altlab/crk/generated/verb_stems.lexc
Add copyright headers to the LEXC sources, and copy them over to giellalt/lang-crk/src/fst/morphology/stems/:
cat giellalt/lang-crk/src/fst/morphology/stems/noun_header.lexc altlab/crk/generated/noun_stems.lexc > giellalt/lang-crk/src/fst/morphology/stems/noun_stems.lexc
cat giellalt/lang-crk/src/fst/morphology/stems/verb_header.lexc altlab/crk/generated/verb_stems.lexc > giellalt/lang-crk/src/fst/morphology/stems/verb_stems.lexc
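One thing to be aware of with the cat ... > target form above: the shell truncates the target before cat runs, so if the header or the generated file is missing, the existing stems file is left empty or partial. A possible safeguard is sketched below. The helper name prepend_header and the demo files are hypothetical, and this is not part of the existing scripts.

```shell
# prepend_header HEADER BODY TARGET  (hypothetical helper)
# Writes HEADER followed by BODY to a temporary file next to TARGET and only
# replaces TARGET if the concatenation succeeded.
prepend_header() {
    header=$1
    body=$2
    target=$3
    tmp="$target.new"
    if cat "$header" "$body" > "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo on placeholder files in a temporary directory.
demo=$(mktemp -d)
printf '! placeholder header\n' > "$demo/noun_header.lexc"
printf 'LEXICON Placeholder\n' > "$demo/generated_noun_stems.lexc"
prepend_header "$demo/noun_header.lexc" "$demo/generated_noun_stems.lexc" \
    "$demo/noun_stems.lexc"
head -n 1 "$demo/noun_stems.lexc"
```

With the real paths, the call would take the header, the generated file under altlab/crk/generated/, and the target under giellalt/lang-crk/src/fst/morphology/stems/ as its three arguments.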