UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
GNU General Public License v3.0

Example cases of multi-source hierarchical aggregation #99

Open aarppe opened 2 years ago

aarppe commented 2 years ago

Here's a list of selected CW, AE, and MD dictionary entries, with interpretations of how they should be split into senses (based on the semicolons for CW) and how these senses should be aggregated using the hierarchical procedure with CW > AE > MD. What turned out from even this limited scrutiny is that the AE entries are surprisingly aberrant (wrong part of speech, apparent duplicates), which might require more manual fixing than I'd want.

apiw ᐊᐱᐤ VAI s/he sits, s/he sits down, s/he is present; s/he is available; s/he is there, s/he is situated; s/he is at home, s/he stays at home (CW)
apiw ᐊᐱᐤ VTA s/he is sitting (AE) 
apiw ᐊᐱᐤ V PHRASE He sits. Also means he is at home. (MD)
  1. s/he sits, s/he sits down, s/he is present (CW)
  2. s/he is available; s/he is there, s/he is situated (CW)
  3. s/he is at home, s/he stays at home (CW)
  4. s/he is sitting (AE)
  5. He sits. Also means he is at home. (MD)
apoy ᐊᐳᕀ NA paddle; shovel, spade (CW)
apoy ᐊᐳᕀ NA paddle (AE)
apoy ᐊᐳᕀ N A shovel. (MD)
  1. paddle (CW, AE)
  2. shovel, spade (CW)
  3. A shovel (MD)
asikan ᐊᓯᑲᐣ NA sock, stocking (CW) 
asikan pl. asikanak ᐊᓯᑲᐣ NA A sock; stocking. (AE)
asikan pl. asikanak ᐊᓯᑲᐣ N Sock. (MD)
  1. sock, stocking (CW)
  2. A sock (AE, MD)
  3. stocking (AE)
askihk ᐊᐢᑭᕁ NA pail; kettle (CW)
askihk pl. askihkwak ᐊᐢᑭᕁ NA A pail; a bucket. (AE)
askihk ᐊᐢᑭᕁ N A pail. (MD)
  1. pail (CW, AE, MD)
  2. kettle (CW, AE)
askîhk ᐊᐢᑮᕁ INM on the land; reserve (CW) 
askihk ᐊᐢᑭᕁ LOC On, in or of the earth or land. (MD)
  1. on the land (CW)
  2. reserve (CW)
  3. On, in or of the earth or land. (MD)
astotin ᐊᐢᑐᑎᐣ NI hat, cap, headgear (CW)
astotin pl. astotina ᐊᐢᑐᑎᐣ NI A hat. (AE) 
astotin ᐊᐢᑐᑎᐣ NI hat (EC)
  1. hat, cap, headgear (CW)
  2. A hat. (AE, MD)
atoskêw ᐊᑐᐢᑫᐤ VAI s/he works (CW) 
atoskew ᐊᑐᐢᑫᐤ V PHRASE He works. (MD) 
  1. s/he works (CW, MD)
wâpahtam ᐋᐧᐸᐦᑕᒼ VTI s/he sees s.t., s/he witnesses s.t. (CW) 
wapahtam ᐊᐧᐸᐦᑕᒼ VAI observe (EC) 
wapahtam ᐊᐧᐸᐦᑕᒼ VP He sees it. (MD) 
  1. s/he sees s.t., s/he witnesses s.t. (CW)
  2. observe (EC)
  3. He sees it. (MD)
mowêw ᒧᐁᐧᐤ VTA s/he eats s.o. (e.g. bread) (CW)
mowew pl. mowewak ᒧᐁᐧᐤ VTA S/he eats them. (AE)
mowew ᒧᐁᐧᐤ VP He eats him. (MD)
  1. s/he eats s.o. (e.g. bread) (CW, MD, AE) [This presumes the regularization of 'them' as 'him'/'s.o.']
itwêw ᐃᑌᐧᐤ VAI s/he says so, s/he says thus, s/he calls (it) so; it has such a meaning (CW) 
itwew ᐃᑌᐧᐤ VAI s/he has stated (AE)
itwew ᐃᑌᐧᐤ V He says. (MD)
  1. s/he says so, s/he says thus, s/he calls (it) so (CW)
  2. it has such a meaning (CW)
  3. s/he has stated (AE)
  4. He says (MD)
itêw ᐃᑌᐤ VTA s/he says thus to s.o., s/he says thus about s.o.; s/he calls s.o. thus (CW)
itew ᐃᑌᐤ V That's what he says to him. (MD)
  1. s/he says thus to s.o., s/he says thus about s.o. (CW)
  2. s/he calls s.o. thus (CW)
  3. That's what he says to him. (MD)
itam ᐃᑕᒼ VTI s/he says thus to or about s.t.; s/he calls s.t. thus (CW)
itam ᐃᑕᒼ V He speaks of it as so. (MD) 
  1. s/he says thus to or about s.t. (CW)
  2. s/he calls s.t. thus (CW)
  3. He speaks of it as so. (MD)
wâpiw ᐋᐧᐱᐤ VAI s/he sees, s/he has vision; (CW)
wâpiw pl. wâpiwak ᐋᐧᐱᐤ VAI S/he sees. (AE) 
  1. s/he sees, s/he has vision (CW)
  2. S/he sees. (AE)
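The split-and-aggregate procedure these examples illustrate can be sketched roughly as follows. This is a minimal sketch under assumptions the issue does not state: every source (not only CW) is split on semicolons, and two senses count as the same when they match after lower-casing, dropping a leading article and trailing periods, and reading a leading "he" as "s/he". The function names (`split_senses`, `match_key`, `aggregate`) are hypothetical and not taken from the crk-db code. EC entries are ignored because the issue does not say where EC sits in the hierarchy.

```python
import re
from dataclasses import dataclass, field

# Source priority for the hierarchical aggregation: CW > AE > MD.
PRIORITY = ["CW", "AE", "MD"]

LEADING_ARTICLE = re.compile(r"^(a|an|the)\s+", re.IGNORECASE)


def split_senses(definition: str) -> list[str]:
    """Split a definition into senses on semicolons, dropping empty pieces."""
    return [s.strip() for s in definition.split(";") if s.strip()]


def match_key(sense: str) -> str:
    """Hypothetical loose comparison key: lower-case, drop trailing periods
    and a leading article, and read a leading 'he' as 's/he'."""
    key = sense.strip().rstrip(".").lower()
    key = LEADING_ARTICLE.sub("", key)
    if key.startswith("he "):
        key = "s/he " + key[3:]
    return key


@dataclass
class Sense:
    text: str  # wording from the highest-priority source that has this sense
    sources: list[str] = field(default_factory=list)


def aggregate(entries: dict[str, str]) -> list[Sense]:
    """Merge definitions from several sources, visiting sources in priority order.

    A sense whose key matches an existing sense only adds its source label;
    any other sense is appended as a new sense."""
    senses: list[Sense] = []
    by_key: dict[str, Sense] = {}
    for source in PRIORITY:
        if source not in entries:
            continue
        for text in split_senses(entries[source]):
            key = match_key(text)
            if key in by_key:
                if source not in by_key[key].sources:
                    by_key[key].sources.append(source)
            else:
                sense = Sense(text, [source])
                by_key[key] = sense
                senses.append(sense)
    return senses


if __name__ == "__main__":
    apoy = {"CW": "paddle; shovel, spade", "AE": "paddle", "MD": "A shovel."}
    for i, s in enumerate(aggregate(apoy), 1):
        print(f"{i}. {s.text} ({', '.join(s.sources)})")
```

For apoy this yields paddle (CW, AE), shovel, spade (CW) and A shovel. (MD), in line with the grouping above. It will not reproduce every case, though: it splits apiw's CW definition at every semicolon (so "s/he is available" becomes its own sense), and it does not attempt the 'them' to 'him'/'s.o.' regularization assumed for mowêw.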
aarppe commented 2 years ago

@dwhieb Here's a batch of example cases for testing the multi-source aggregation.

fbanados commented 1 day ago

Reopening, as these are good examples of cases where the multi-source aggregation may not currently be doing the right thing.

aarppe commented 1 day ago

We might want to implement some form of standardization of the ALTLab versions of the definitions from MD and AECD: for instance, removing the initial article (i.e. 'a', 'an', 'the') and lower-casing initial pronouns (e.g. S/he -> s/he in AECD), perhaps even generalizing the masculine He in MD to s/he as in CW. We might do an initial pass of this programmatically, always keeping the original definition for reference, but then having a standardized version for public consumption in itwêwina. This would merit an issue of its own (#121).
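A first programmatic pass along these lines might look like the sketch below. It is an illustration, not the ALTLab implementation: the `Definition` record and the `standardize` / `standardize_sense` names are hypothetical, and applying the rules to each semicolon-separated sense (rather than only to the start of the whole definition) is an assumption. Only the sense-initial "He" is rewritten; later pronouns such as "he" or "him" are left untouched.

```python
import re
from dataclasses import dataclass

# Leading article to drop: 'a', 'an', 'the' (any case).
LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)
# Sense-initial masculine pronoun, as in MD ("He sits."); \b avoids "hen", "help".
LEADING_HE = re.compile(r"^he\b", re.IGNORECASE)


@dataclass(frozen=True)
class Definition:
    original: str      # as given in the source dictionary, kept for reference
    standardized: str  # version intended for public display


def standardize_sense(sense: str) -> str:
    """Apply the proposed rules to a single sense."""
    s = LEADING_ARTICLE.sub("", sense.strip())
    if s.startswith("S/he"):
        s = "s/he" + s[4:]          # AECD: S/he -> s/he
    return LEADING_HE.sub("s/he", s)  # MD: He -> s/he (as in CW)


def standardize(definition: str) -> Definition:
    """Standardize every semicolon-separated sense, keeping the original."""
    senses = [standardize_sense(s) for s in definition.split(";") if s.strip()]
    return Definition(original=definition, standardized="; ".join(senses))


if __name__ == "__main__":
    for raw in ["A pail; a bucket.", "S/he eats them.", "He sits. Also means he is at home."]:
        d = standardize(raw)
        print(f"{d.original!r} -> {d.standardized!r}")
```

Keeping both fields in one record means the standardized text can be regenerated or corrected by hand later without losing the source wording.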