Open aarppe opened 1 month ago
It is partly that, since the same effect shows up in other entries, as in this comment on #125.
Issue fixed for some entries:
However, some others still remain unmerged because of decisions made previously.
For example, multiple senses from CW generate different entries, as for âhkosiw. This is the current behaviour (https://itwewina.altlab.app/search?q=âhkosiw). Because there are already two entries for âhkosiw from CW, we cannot programmatically merge the entries from MD and AECD for it, which therefore generates a lot of entries:
We could choose to merge these, but that is a linguist decision. My programmer gut thinks we could merge all entries that have the same FST analysis, but that means we would end up with entries (like âhkosiw) with many senses. This will require more work and linguist decisions, so I'm stopping the merging where it is and purposefully keeping all those entries for âhkosiw unless there is an urgent need to do otherwise.
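To make the trade-off concrete, here is a minimal Python sketch of the two options: keeping groups with more than one CW entry unmerged (the current behaviour described above), or merging everything that shares an FST analysis. The `Entry` class, its field names, the analysis string, and the placeholder senses are all assumptions made for illustration; this is not the actual crk-db import code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    head: str
    analysis: str  # hypothetical FST analysis string
    source: str    # e.g. "CW", "MD", "AECD"
    senses: List[str] = field(default_factory=list)


def merge_by_analysis(entries, merge_ambiguous=False):
    """Merge entries that share the same head and FST analysis.

    If a group contains more than one CW entry, it is left unmerged
    unless merge_ambiguous is True.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.head, entry.analysis)].append(entry)

    result = []
    for (head, analysis), group in groups.items():
        cw_count = sum(1 for e in group if e.source == "CW")
        if cw_count > 1 and not merge_ambiguous:
            # Ambiguous target: keep every entry separate.
            result.extend(group)
            continue
        sources = "+".join(sorted({e.source for e in group}))
        senses = [s for e in group for s in e.senses]
        result.append(Entry(head, analysis, sources, senses))
    return result


# Placeholder data only; the senses are not real dictionary content.
entries = [
    Entry("âhkosiw", "âhkosiw+V+AI+Ind+3Sg", "CW", ["CW sense A"]),
    Entry("âhkosiw", "âhkosiw+V+AI+Ind+3Sg", "CW", ["CW sense B"]),
    Entry("âhkosiw", "âhkosiw+V+AI+Ind+3Sg", "MD", ["MD sense"]),
    Entry("âhkosiw", "âhkosiw+V+AI+Ind+3Sg", "AECD", ["AECD sense"]),
]
unmerged = merge_by_analysis(entries)
merged = merge_by_analysis(entries, merge_ambiguous=True)
```

With `merge_ambiguous=False` all four placeholder entries survive; with `merge_ambiguous=True` they collapse into a single entry carrying four senses, which is exactly the "many senses" outcome that would need linguist review.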
This would seem to concern entries whose heads are structurally the same but whose meanings are so different that they are no longer senses under one entry but separate entries. Both types of meaning are apparent in the above, and these are distinctions that we'd probably want to keep.
This would seem to be a case where the linguist has to manually select which of the two (or more) entries in CW an entry in MD or AECD matches (if any). So, we'd need a (new?) field for making that selection, and we'd want to do that in a way that is as future-proof as possible (for when AEW may modify something).
Originally posted by @fbanados in https://github.com/UAlbertaALTLab/crk-db/issues/117#issuecomment-2249058056
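One possible shape for the manual selection field discussed above is a lookup table maintained by the linguist. This is a sketch under assumptions: the key format, the identifiers, and the function name `resolve_target` are hypothetical and do not reflect an existing crk-db schema.

```python
from typing import Dict, List, Optional, Tuple

# (source, source entry id) -> chosen CW entry id, or None for "matches no CW entry".
ManualMatches = Dict[Tuple[str, str], Optional[str]]


def resolve_target(
    source: str,
    source_id: str,
    candidates: List[str],
    manual: ManualMatches,
) -> Optional[str]:
    """Return the CW entry id to merge into, or None to keep the entry separate."""
    key = (source, source_id)
    if key in manual:
        target = manual[key]
        if target is None:
            return None  # linguist decided: no match
        if target not in candidates:
            # The CW side changed since the selection was made; flag for re-review.
            raise ValueError(f"stale manual match for {key}: {target!r}")
        return target
    if len(candidates) == 1:
        return candidates[0]  # unambiguous, safe to merge automatically
    return None  # ambiguous and undecided: keep separate


# Hypothetical identifiers, for illustration only.
manual = {("MD", "md-0001"): "cw-0002", ("AECD", "aecd-0001"): None}
```

Raising on a stale selection, rather than silently falling back, is one way to address the future-proofing concern: if the CW entries are later modified, the affected selections surface for review instead of merging into the wrong entry.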
This presents another issue, namely that AECD atim is not merged with atim in CW and MD, cf.
The article a in AECD should be ignored as a content word. But does this have something to do with the inflectional class not entirely matching, i.e. NA vs. NA-3?
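If the mismatch does turn out to come from either of these, both checks could be relaxed along the following lines. This is only a sketch: the function names are hypothetical, the example definition string is a placeholder rather than the actual AECD text, and whether the real merge code compares classes or definitions this way is an assumption.

```python
import re


def base_class(inflectional_class: str) -> str:
    """Reduce a specific class such as "NA-3" to its general class "NA"."""
    return inflectional_class.split("-", 1)[0]


def strip_leading_article(definition: str) -> str:
    """Drop a leading English article so that it does not count as a content word."""
    return re.sub(r"^(?:a|an|the)\s+", "", definition.strip(), flags=re.IGNORECASE)


def classes_compatible(left: str, right: str) -> bool:
    """Treat two inflectional classes as compatible if their general class agrees."""
    return base_class(left) == base_class(right)
```

Under this comparison `classes_compatible("NA", "NA-3")` holds, while `"NA"` and `"NI-1"` would still be kept apart.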