UAlbertaALTLab / crk-db

Managing the Plains Cree dictionary database
https://itwewina.altlab.app/
GNU General Public License v3.0

AECD atim not merged with CW and MD #128

Open aarppe opened 1 month ago

aarppe commented 1 month ago

Originally posted by @fbanados in https://github.com/UAlbertaALTLab/crk-db/issues/117#issuecomment-2249058056

This presents another issue, namely that AECD atim is not merged with atim in CW and MD, cf.

Screenshot 2024-07-24 at 5 26 36 PM

The article a in AECD should be ignored as a content word. But does this have something to do with the inflectional classes not matching exactly, i.e. NA vs. NA-3?
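If the class mismatch does turn out to be part of the cause, one possible way to compare classes is to ignore the numeric subclass suffix before matching. The following is only a sketch under that assumption: `base_class` and `classes_compatible` are hypothetical helpers, not functions from the crk-db importer, and the rule "strip a trailing `-<digits>` suffix" is a guess at what "not entirely matching" would need.

```python
import re


def base_class(label: str) -> str:
    """Strip a trailing subclass suffix, e.g. 'NA-3' -> 'NA'.

    Hypothetical helper: assumes subclasses are written as '-<digits>'
    optionally followed by letters (e.g. '-4w').
    """
    return re.sub(r"-\d+\w*$", "", label.strip())


def classes_compatible(a: str, b: str) -> bool:
    """True when two class labels agree once subclass suffixes are ignored."""
    return base_class(a) == base_class(b)
```

With this rule, `classes_compatible("NA", "NA-3")` holds, while `classes_compatible("NA", "NI-1")` does not, so animate and inanimate nouns would still be kept apart.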

fbanados commented 1 month ago

It is partly that; the same effect shows up in other entries, as in this comment on #125.

fbanados commented 1 month ago

Issue fixed for some entries:

Screenshot 2024-07-25 at 4 35 26 PM

Screenshot 2024-07-25 at 4 34 45 PM

Screenshot 2024-07-25 at 4 38 25 PM

However, some others still remain unmerged because of decisions made previously.

For example, multiple senses from CW generate different entries, as for âhkosiw. This is the current behaviour (https://itwewina.altlab.app/search?q=âhkosiw). Because there are already two entries for âhkosiw from CW, we cannot programmatically merge the MD and AECD entries into either of them, which therefore leaves a lot of separate entries:

Screenshot 2024-07-25 at 4 42 07 PM
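The behaviour described above can be sketched roughly as follows: merge an MD or AECD entry into CW only when exactly one CW entry has the same head, and otherwise leave it alone. This is a minimal illustration, not the actual importer code. The `Entry` class, its fields, and `merge_unambiguous` are assumptions made for the example, and the sense strings in the usage note are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical, simplified entry shape for illustration only.
    head: str
    source: str
    senses: list[str] = field(default_factory=list)


def merge_unambiguous(cw_entries: list[Entry], other: Entry) -> bool:
    """Merge `other` into CW only if exactly one CW entry shares its head.

    Returns True when a merge happened, False when there was no candidate
    or the choice was ambiguous (two or more CW entries with that head).
    """
    candidates = [e for e in cw_entries if e.head == other.head]
    if len(candidates) != 1:
        return False
    candidates[0].senses.extend(other.senses)
    return True
```

With a single CW entry for atim, an AECD atim entry is merged into it; with two CW entries for âhkosiw, the function returns `False` and nothing is changed, which is why the extra entries stay visible.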

We could choose to merge these, but that is a decision for the linguists. My programmer gut says we could merge all entries that have the same FST analysis, but that means we would end up with entries (like âhkosiw) with many senses. This will require more work and more linguist decisions, so I'm stopping the merging where it is and purposefully keeping all those entries for âhkosiw unless there is an urgent need to do otherwise.
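For reference, the "merge everything with the same FST analysis" idea could look something like the sketch below. It is not implemented, and the dictionary keys (`head`, `analysis`, `source`, `senses`) as well as the analysis strings used in the usage note are placeholders chosen for the example rather than the real data model.

```python
from collections import defaultdict


def merge_by_analysis(entries: list[dict]) -> list[dict]:
    """Collapse entries that share an FST analysis into one entry.

    Each input entry is a dict with the (assumed) keys
    'head', 'analysis', 'source', and 'senses'.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for entry in entries:
        grouped[entry["analysis"]].append(entry)

    merged = []
    for analysis, group in grouped.items():
        merged.append({
            "head": group[0]["head"],
            "analysis": analysis,
            "sources": sorted({e["source"] for e in group}),
            # Every sense from every source lands under the single entry.
            "senses": [sense for e in group for sense in e["senses"]],
        })
    return merged
```

Feeding it two CW entries plus one MD and one AECD entry that all share one analysis yields a single entry carrying all four sense lists, which is exactly the "many senses" outcome that needs linguist review.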

aarppe commented 1 month ago

This seems to concern entries whose heads are structurally the same but whose meanings are so different that they are no longer senses under one entry but separate entries. Both kinds of meaning are apparent in the above, and these are distinctions that we'd probably want to keep.

This seems to be a case where the linguist has to manually select which of the two (or more) entries in CW an entry in MD or AECD matches (if any). So we'd need a (new?) field for making that selection, and we'd want to do that in a way that is as future-proof as possible (for when AEW may modify something).
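One way such a selection field might work, sketched here purely as an assumption: the linguist records, per source and head, the identifier of the CW entry to merge into, and the importer treats a link whose target no longer exists as unresolved instead of guessing. The identifier format (`"âhkosiw@1"`), the `manual_links` mapping, and `resolve_target` are all hypothetical.

```python
from typing import Optional


def resolve_target(
    source: str,
    head: str,
    cw_ids: list[str],
    manual_links: dict[tuple[str, str], str],
) -> Optional[str]:
    """Pick the CW entry id that a MD/AECD entry should merge into.

    - A manual link wins, but only if its target still exists in CW;
      a stale link (e.g. after CW was revised) resolves to None.
    - Without a manual link, merge only when there is exactly one candidate.
    """
    chosen = manual_links.get((source, head))
    if chosen is not None:
        return chosen if chosen in cw_ids else None
    return cw_ids[0] if len(cw_ids) == 1 else None
```

For example, with `manual_links = {("MD", "âhkosiw"): "âhkosiw@1"}` and CW ids `["âhkosiw@1", "âhkosiw@2"]`, the MD entry resolves to `"âhkosiw@1"`, while an AECD entry without a link stays unmerged. Returning `None` for stale links is a design choice that keeps a later change in CW from silently redirecting a merge.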