Closed aarppe closed 4 years ago
@Madoshakalaka Do you have access to the ALTLab repo mentioned above?
@eddieantonio In order for us to make the most of the recent improvement showing inflectional categories etc., we need to update our Cree-to-English dictionary source to conform with the latest AEW coding, as is now fixed in the crkeng.xml file.
:+1: okay. Matt usually does the DB updates! But I'll see if I can do it.
@arppe: a few things to note in this version of crkeng.xml:
<t> has empty content in the entry
<e>
  <lg>
    <l pos="N">ohpinikêwin</l>
    <lc>NI-1</lc>
    <stem>ohpinikêwin-</stem>
  </lg>
  <mg>
    <tg xml:lang="eng">
      <t pos="N" sources="MD" />
    </tg>
  </mg>
  <mg>
    <tg xml:lang="eng">
      <t pos="N" sources="CW">weightlifting; act of lifting things</t>
    </tg>
  </mg>
</e>
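For anyone who wants to check the file for other empty <t> elements before importing, here is a minimal sketch using only the Python standard library. It is not the importer's actual check: the function name `find_empty_translations` and the `<r>` wrapper element in the sample are assumptions made for illustration, and only the `<e>`, `<lg>/<l>` and `<t>` structure is taken from the entry above.

```python
import xml.etree.ElementTree as ET


def find_empty_translations(xml_text):
    """Return (lemma, sources) for every <t> element that has no text."""
    root = ET.fromstring(xml_text)
    empty = []
    for entry in root.iter("e"):
        # The lemma lives in <lg>/<l>, as in the entry quoted above.
        lemma = entry.findtext("lg/l", default="")
        for t in entry.iter("t"):
            if not (t.text or "").strip():
                empty.append((lemma, t.get("sources", "")))
    return empty


# The <r> root element is a hypothetical wrapper for this sample only.
SAMPLE = """<r>
  <e>
    <lg><l pos="N">ohpinikêwin</l><lc>NI-1</lc><stem>ohpinikêwin-</stem></lg>
    <mg><tg xml:lang="eng"><t pos="N" sources="MD" /></tg></mg>
    <mg><tg xml:lang="eng"><t pos="N" sources="CW">weightlifting; act of lifting things</t></tg></mg>
  </e>
</r>"""

for lemma, sources in find_empty_translations(SAMPLE):
    print(f"empty <t> in entry {lemma} (sources={sources})")
```

To run it on the real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to the same function, provided the file has no encoding declaration that conflicts with passing a string.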
There are 1078 (lemma, pos, ic) that the fst can not give any analyses.
There are 173 (lemma, pos, ic) that do not have proper lemma analysis by fst
There are 13 (lemma, pos, ic) that have ambiguous lemma analyses
These words will be label 'as-is', meaning their lemmas are undetermined.
Thanks @Madoshakalaka for implementing these diagnostic messages!
Done!
@eddieantonio Great! Was ohpinikêwin the only entry for which the <t> field is missing?
Also, are the results of the diagnostics available somewhere? I.e. to check why some forms are not analyzed, or incorrectly analyzed?
@eddieantonio Great! Was ohpinikêwin the only entry for which the <t> field is missing?
It's the only one that the diagnostics reported, yes.
Also, are the results of the diagnostics available somewhere? I.e. to check why some forms are not analyzed, or incorrectly analyzed?
Nope :/ We could make that a thing we log, but currently the database is generated on our local machines, then pushed to Sapir.
@aarppe Every time we rebuild the database, a detailed log of these diagnostics is recorded. I could send you one if you'd like. Part of the log looks like this:
2020-06-16 16:22:52,889 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mêstan-pîwayân with pos N ic NI-1 can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,889 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mêstâciwatêw with pos V ic VII-v can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,889 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mêsti- with pos Ipc ic IPV can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,890 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry micakisîs with pos N ic NI-1 have analyses by fst strict analyzer. Yet all analyses conflict with the pos/ic in xml file
2020-06-16 16:22:52,890 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry micimôtâw with pos V ic VTI-2 can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,891 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry miciyawêsiw with pos V ic VAI-v can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,891 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mihko- with pos Ipc ic IPV can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,891 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mihkopêmak with pos N ic NA-3 can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,892 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mihkowi- with pos Ipc ic IPN can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,892 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mihkwaskîwakâhk with pos N ic INM have analyses by fst strict analyzer. Yet all analyses conflict with the pos/ic in xml file
2020-06-16 16:22:52,892 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mihyawê- with pos Ipc ic IPV can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,892 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mihyawê- with pos Ipc ic IPN can not be analyzed by fst strict analyzer
2020-06-16 16:22:52,892 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mikisiwacîhk with pos N ic INM have analyses by fst strict analyzer. Yet all analyses conflict with the pos/ic in xml file
2020-06-16 16:22:52,893 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mikisiwi- with pos Ipc ic IPN can not be analyzed by fst strict analyzer
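A log like this can be summarised to see which inflectional categories account for most of the failures. The sketch below is only an illustration based on the line format in the excerpt above: the names `LINE`, `classify` and `tally` are hypothetical, and the two message patterns are the only ones visible in the excerpt, so anything else is counted as "other".

```python
import re
from collections import Counter

# Matches the message part of a log line as shown in the excerpt above.
LINE = re.compile(
    r"xml entry (?P<lemma>.+?) with pos (?P<pos>\S+) ic (?P<ic>\S+) (?P<msg>.+)"
)


def classify(msg):
    """Map the free-text message to a short category label."""
    if "can not be analyzed" in msg:
        return "no_analysis"
    if "conflict with the pos/ic" in msg:
        return "pos_ic_conflict"
    return "other"


def tally(lines):
    """Count diagnostics per category and per (category, ic) pair."""
    by_kind = Counter()
    by_kind_and_ic = Counter()
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # skip lines that are not entry diagnostics
        kind = classify(match["msg"])
        by_kind[kind] += 1
        by_kind_and_ic[(kind, match["ic"])] += 1
    return by_kind, by_kind_and_ic


# Three lines copied from the excerpt above.
SAMPLE_LOG = [
    "2020-06-16 16:22:52,889 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mêsti- with pos Ipc ic IPV can not be analyzed by fst strict analyzer",
    "2020-06-16 16:22:52,890 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry micakisîs with pos N ic NI-1 have analyses by fst strict analyzer. Yet all analyses conflict with the pos/ic in xml file",
    "2020-06-16 16:22:52,891 — DatabaseManager.xml_entry_lemma_finder — DEBUG — xml entry mihko- with pos Ipc ic IPV can not be analyzed by fst strict analyzer",
]

by_kind, by_kind_and_ic = tally(SAMPLE_LOG)
for (kind, ic), count in by_kind_and_ic.most_common():
    print(kind, ic, count)
```

Grouping by (category, ic) rather than by lemma is a design choice for this sketch: in the excerpt, several failures share the IPV and IPN categories, which is the kind of pattern a per-category count would surface.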
Do you have access to the ALTLab repo mentioned above?
I tried the other day with @eddieantonio, but there seem to be problems with SSH authentication. I'll figure things out and try to access that repo again.
Added the missing English translation from MD for ohpinikêwin to crkeng.xml, so that should be good in subsequent iterations, until we have a more proper dictionary database. Some <stem> fields still contain unnecessary information (namely inflectional category codes), which would need to be removed.
Following up on the fix in #465, we would next need to import that into itwêwina for the new/corrected inflectional categories to take effect.
The new version can be found in the ALTLab Git repo:
altlab/crk/dicts/crkeng.xml