Open eddieantonio opened 5 years ago
It looks like the reason for this is that the English gloss from CW was not extracted in 741 cases (even though a classification was done, presumably based on consulting the CW source), in many cases likely due to a mismatch with a non-regularized <ý>: at least 638 of the Cree glosses have a <y>. This can be fixed semi-automatically, I think.
Brief scripting suggests that the missing CW gloss can be recovered in 571 cases. These automatic insertions will likely need to be verified manually.
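For illustration, the semi-automatic matching could look roughly like the sketch below. It is not the script actually used here: the row layout (`lemma`, `cw_gloss`) and the `cw_glosses` mapping are hypothetical stand-ins for the comparison file and the CW source. The idea is only to compare forms after mapping <ý> to <y>, and to propose a gloss only when the match is unambiguous, so that the rest stays for manual checking.

```python
import unicodedata


def regularize(form: str) -> str:
    """Map <ý> to <y> so that both spellings compare equal."""
    # NFC first, so that y + combining acute is treated like precomposed ý.
    form = unicodedata.normalize("NFC", form)
    return form.replace("ý", "y").replace("Ý", "Y")


def propose_missing_glosses(rows, cw_glosses):
    """Suggest CW glosses for rows whose gloss is empty.

    rows: list of dicts with the (hypothetical) keys "lemma" and "cw_gloss".
    cw_glosses: dict mapping a CW lemma to its English gloss.
    Returns (lemma, gloss) pairs; these are proposals for manual review.
    """
    index = {}
    for lemma, gloss in cw_glosses.items():
        index.setdefault(regularize(lemma), []).append(gloss)

    proposals = []
    for row in rows:
        if row["cw_gloss"]:
            continue  # gloss already present, nothing to do
        candidates = index.get(regularize(row["lemma"]), [])
        if len(candidates) == 1:  # skip ambiguous or missing matches
            proposals.append((row["lemma"], candidates[0]))
    return proposals
```

Anything this leaves out (no candidate, or more than one) would fall into the set that needs human judgement.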
I've revised the comparison source file by adding the missing cases when they could be unequivocally found in the CW source (there were only a few missing MD cases, and these were due to incorrect formatting, such as a missing tab).
But there are still over 100 missing CW glosses that need to be extracted by hand.
What do you mean "extracted by hand"?
Human verification/analysis. These are cases where I wasn't able to extract the lemma from the CW source automatically based on the information in the MD vs. CW comparison file. They may be cases where the comparison was made between a lemma and an inflected form (conjugation). So this may require human judgement/verification as to what the correct CW gloss/lemma match is, and whether the CW gloss is actually missing or not.
And then a bunch are dependent nouns where the FST analysis in the comparison file was corrupted when the file was edited in a spreadsheet. Under the previous convention, these dependent nouns had a stem as their lemma, marked with an initial hyphen. The spreadsheet tried to interpret that as a reference, and when that didn't work it automatically replaced the value with #NAME?. So one cannot use the FST analysis field to extract the CW gloss automatically. Based on the file history, this happened before last summer (so any time between then and when the comparison project files were created in August 2016).
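A minimal sketch for listing the affected rows, assuming the comparison file is tab-separated with a header row; the column name passed in (e.g. `fst`) is hypothetical and would need to match the real file:

```python
import csv
import io


def find_corrupted_rows(tsv_text, column):
    """Return the 1-based line numbers of rows where `column` is '#NAME?'."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    corrupted = []
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        if (row.get(column) or "").strip() == "#NAME?":
            corrupted.append(line_no)
    return corrupted
```

This only locates the damaged cells; the original hyphen-initial stems would still have to be restored from an earlier revision or by hand.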
Adding to this: "ocêkatâhk"/"Big Dipper" is not present in engcrk.xml, but it is present in crkeng.xml.
Some entries have an empty <tg>, which gets rendered as an empty list item in itwêwina, e.g., âhkamêyihtamowin.
Situations like these can be caught in #94; additionally, itwêwina could refuse to display empty translation groups.
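As a rough idea of what such a check could look like, here is a sketch that flags entries whose <tg> has no non-empty translation. It assumes, without having verified it against crkeng.xml, that entries are <e> elements with the lemma in <lg>/<l> and translations in <t> elements inside <tg>; adjust the paths if the real layout differs.

```python
import xml.etree.ElementTree as ET


def entries_with_empty_tg(xml_text):
    """Return the lemmas of entries that contain an empty <tg>.

    Assumed (not verified) layout:
    <e><lg><l>lemma</l></lg><mg><tg><t>gloss</t></tg></mg></e>
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for entry in root.iter("e"):
        lemma = entry.findtext("lg/l", default="?")
        for tg in entry.iter("tg"):
            # A <tg> counts as empty if no <t> inside it has visible text.
            has_text = any((t.text or "").strip() for t in tg.iter("t"))
            if not has_text:
                flagged.append(lemma)
                break  # report each entry once
    return flagged
```

The same predicate could be reused on the display side to skip empty translation groups instead of rendering them.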