UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

improve matching algorithm #19

Open dwhieb opened 3 years ago

dwhieb commented 3 years ago

The matching algorithm already looks at the MD > CW mappings to match entries. Improve this algorithm by using other factors as well:

Examples

Cree: Words

ayâw  VII-v  it is, it is there                                          /ay-/ + /-â/  ayâ-
ayâw  VTI-2  s/he has s.t., s/he owns s.t.                               /ay-/ + /-â/  ayâ-
ayâw  VAI-v  s/he is, s/he is there; s/he lives there, s/he stays there  /ay-/ + /-â/  ayâ-

Maskwacîs Dictionary

ayaw  He owns: he has.                                                 finance; have_wealth; own_possess
ayaw  He is here/there.  (Animate)  e.g. Ekota ki ayaw. He was there.  here_there; location
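
One extra factor the example above suggests is the verb class: the three CW entries for ayâw differ only in their class (VII-v, VTI-2, VAI-v). The following is a minimal sketch in R, assuming a class has already been guessed for the MD entry. The helper `narrow_candidates()` and the column names `lemma` and `pos` are hypothetical and are not part of crk-db.

```r
# CW candidates for the headword in the example above.
# "\u00e2" is the escape for "â"; the column names are assumptions.
CW <- data.frame(
  lemma = rep("ay\u00e2w", 3),
  pos   = c("VII-v", "VTI-2", "VAI-v"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the CW candidates whose class label starts
# with the class guessed for the MD entry. If there is no guess, or nothing
# matches, return every candidate, because the guess is only a heuristic.
narrow_candidates <- function(candidates, guessed_class) {
  if (is.na(guessed_class)) {
    return(candidates)
  }
  hits <- candidates[startsWith(candidates$pos, guessed_class), , drop = FALSE]
  if (nrow(hits) == 0) candidates else hits
}

narrow_candidates(CW, "VAI")$pos        # "VAI-v"
nrow(narrow_candidates(CW, "VTA"))      # 3 (no match, so nothing is discarded)
```

Falling back to the full candidate list keeps a wrong guess from removing the correct entry; the class only narrows the choice when it actually matches something.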

dwhieb commented 3 years ago

R code for guessing at the inflectional class of an MD entry based on the definition:

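# VTA: subject She/He/They with an animate object (him/her/them), and no "Inanimate" note in the definition

MD$LC[intersect(grep("^\\<(([S]he)|([H]e)|([T]hey))\\>[^\\.]+\\<((him)|(her)|(them))\\>",MD$MeaningInEnglish), grep("Inanimate",MD$MeaningInEnglish, invert=TRUE))] <- "VTA"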

# VTI: same pattern as VTA, but the definition carries an "Inanimate" note

MD$LC[intersect(grep("^\\<(([S]he)|([H]e)|([T]hey))\\>[^\\.]+\\<((him)|(her)|(them))\\>",MD$MeaningInEnglish), grep("Inanimate",MD$MeaningInEnglish))] <- "VTI"

# VTI: the object is "it" or "them", for entries that have no class yet

MD$LC[intersect(grep("^\\<(([S]he)|([H]e)|([T]hey))\\>[^\\.]+\\<((it)|(them))\\>",MD$MeaningInEnglish, value=FALSE), which(is.na(MD$LC)))] <- "VTI"

# Fixing individual cases:
MD$LC[which(MD$MeaningInEnglish=="He cuts the legs off (a table) something inanimate.")] <- "VTI"
MD$LC[which(MD$MeaningInEnglish=="He holds the two together. Inanimate.")] <- "VTI"

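# VAI:

# Reflexives (himself/herself/themselves), for entries that have no class yet
MD$LC[intersect(grep("^\\<(([S]he)|([H]e)|([T]hey))\\>[^\\.]+\\<((himself)|(herself)|(themselves))\\>",MD$MeaningInEnglish, value=FALSE), which(is.na(MD$LC)))] <- "VAI"

# Correction: a definition with "itself" is VAI, overriding any class assigned above
MD$LC[grep("^\\<(([S]he)|([H]e)|([T]hey))\\>[^\\.]+\\<((itself))\\>",MD$MeaningInEnglish, value=FALSE)] <- "VAI"

# Default: any remaining definition with a She/He/They subject is VAI
MD$LC[intersect(grep("^\\<(([S]he)|([H]e)|([T]hey))\\>",MD$MeaningInEnglish, value=FALSE), which(is.na(MD$LC)))] <- "VAI"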

# VII: subject It/it/They, no class assigned yet, and a part of speech (MD$POS) starting with "v"
MD$LC[intersect(intersect(grep("^\\<(([Ii]t)|([T]hey))\\>",MD$MeaningInEnglish, value=FALSE), which(is.na(MD$LC))), grep("^v",MD$POS))] <- "VII"
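
To see how the rules above interact, here is a rough, self-contained sketch that applies them in the same order to a few definitions. The function `guess_class()` is a hypothetical wrapper, not code from the repository. It uses `grepl()` with logical masks instead of `intersect()`, writes `[^.]` instead of `[^\\.]`, and leaves out the two hand-fixed cases. The first two definitions are the MD entries for ayaw quoted earlier in this issue (the second without its example sentence); the other four are invented for illustration.

```r
# Hypothetical wrapper around the rules above; a sketch, not the project's code.
guess_class <- function(meaning, pos) {
  subj <- "^\\<(She|He|They)\\>"          # animate subject at the start
  lc <- rep(NA_character_, length(meaning))

  # Animate object: VTA, unless the definition is flagged "Inanimate" (then VTI).
  animate_obj   <- grepl(paste0(subj, "[^.]+\\<(him|her|them)\\>"), meaning)
  inanimate_tag <- grepl("Inanimate", meaning)
  lc[animate_obj & !inanimate_tag] <- "VTA"
  lc[animate_obj & inanimate_tag]  <- "VTI"

  # Object "it" or "them": VTI, only where nothing has been assigned yet.
  inanimate_obj <- grepl(paste0(subj, "[^.]+\\<(it|them)\\>"), meaning)
  lc[inanimate_obj & is.na(lc)] <- "VTI"

  # Reflexives: VAI. The "itself" rule overrides earlier assignments.
  reflexive <- grepl(paste0(subj, "[^.]+\\<(himself|herself|themselves)\\>"), meaning)
  lc[reflexive & is.na(lc)] <- "VAI"
  lc[grepl(paste0(subj, "[^.]+\\<itself\\>"), meaning)] <- "VAI"

  # Any remaining definition with an animate subject defaults to VAI.
  lc[grepl(subj, meaning) & is.na(lc)] <- "VAI"

  # Subject It/it/They on a verb entry that is still unclassified: VII.
  lc[grepl("^\\<(It|it|They)\\>", meaning) & is.na(lc) & grepl("^v", pos)] <- "VII"
  lc
}

meanings <- c(
  "He owns: he has.",               # MD entry for ayaw
  "He is here/there.  (Animate)",   # MD entry for ayaw
  "He sees him.",                   # invented
  "He cuts it.",                    # invented
  "He hurts himself.",              # invented
  "It is red."                      # invented
)
guessed <- guess_class(meanings, pos = rep("v", length(meanings)))
print(data.frame(meaning = meanings, guessed = guessed))
# guessed is: "VAI" "VAI" "VTA" "VTI" "VAI" "VII"
```

Note that "He owns: he has." comes out as VAI because the definition names no object, although the CW entry it presumably corresponds to is VTI-2. The guessed class is therefore better treated as one signal among several than as a hard filter.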