UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

AECD: Add script to convert Alberta Elders' Cree Dictionary into JSON format #101

Open aarppe opened 2 years ago

aarppe commented 2 years ago

Take the content of the ALTLab version of AECD and convert it into the general JSON format, for eventual inclusion in the aggregated importjson format after comparison with CW and MD.

aarppe commented 2 years ago

We should split AECD definitions that use the semicolon as a delimiter into distinct senses.

There appear to be 1611 such cases in the crk2eng version of AECD, and an examination of random entries does suggest that the semicolon is used systematically to distinguish senses.
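A minimal sketch of the semicolon split, in case it helps the discussion. The function names (`split_senses`, `to_entry`) and the JSON field names (`head`, `senses`, `definition`) are hypothetical and not taken from the actual crk-db schema, and the sample strings are invented placeholders rather than real AECD entries.

```python
import json


def split_senses(definition: str) -> list[str]:
    """Split a definition string into senses at semicolons.

    Assumption: every semicolon separates two senses. Semicolons inside
    parentheses or quotations are NOT handled specially here.
    """
    parts = (part.strip() for part in definition.split(";"))
    # Drop empty fragments left by trailing or doubled semicolons.
    return [part for part in parts if part]


def to_entry(head: str, definition: str) -> dict:
    """Build a hypothetical JSON-ready entry with one object per sense."""
    return {
        "head": head,
        "senses": [{"definition": sense} for sense in split_senses(definition)],
    }


if __name__ == "__main__":
    # Invented placeholder input, not a real dictionary entry.
    entry = to_entry("example-head", "first sense; second sense")
    print(json.dumps(entry, ensure_ascii=False, indent=2))
```

Entries without a semicolon come back as a single sense, so the same function could be run over the whole file. The 1611 cases would still be worth spot-checking for semicolons that are not sense delimiters.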

aarppe commented 2 years ago

In addition, AECD has some source-specific encoding that we wouldn't want to include in the search: Alt. or Var. for alternative or variant Cree word forms, and (Plains) and (Northern) for dialect. For the 731 Alt. cases and 180 Var. cases, we'd probably want to encode the Cree word forms appropriately (something for Daniel), perhaps marking them with <crk>...</crk>.
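One possible way to handle these markers, sketched with regular expressions. It assumes that `Alt.` and `Var.` are each followed by exactly one whitespace-delimited Cree word form, and that dialect labels appear literally as `(Plains)` or `(Northern)`. The real AECD layout may differ, so both patterns would need checking against the data. The function name `parse_aecd_markers` and the returned keys are hypothetical, and the sample string in the comment is invented.

```python
import re

# Assumption: dialect labels appear literally as "(Plains)" or "(Northern)".
DIALECT_RE = re.compile(r"\s*\((Plains|Northern)\)")
# Assumption: "Alt." / "Var." is followed by a single word form that ends
# at whitespace or at one of , ; . ( )
VARIANT_RE = re.compile(r"\b(Alt|Var)\.\s+([^\s,;.()]+)")


def parse_aecd_markers(definition: str) -> dict:
    """Separate AECD-specific markers from the searchable definition text."""
    dialects = DIALECT_RE.findall(definition)
    without_dialects = DIALECT_RE.sub("", definition)

    # (kind, form) pairs, e.g. ("Alt", "<word form>").
    variants = VARIANT_RE.findall(without_dialects)

    # Version that keeps the marker but wraps the Cree form in <crk> tags.
    marked_up = VARIANT_RE.sub(r"\1. <crk>\2</crk>", without_dialects).strip()

    # Version for search: markers and Cree forms removed entirely.
    search_text = VARIANT_RE.sub("", without_dialects)
    search_text = re.sub(r"\s{2,}", " ", search_text).strip()

    return {
        "search_text": search_text,
        "marked_up": marked_up,
        "dialects": dialects,
        "variants": variants,
    }


# Invented example input: "A dog (Plains). Alt. atimw"
```

Keeping both `search_text` and `marked_up` lets the search index ignore the AECD-specific encoding while the display layer can still show the variant forms.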

aarppe commented 2 years ago

See also #106 for completion.