Open aarppe opened 2 years ago
@dwhieb Here's a batch of example cases for testing the multi-source aggregation.
Reopening, as these are good examples of where the current multi-source aggregation may not be doing the right thing.
We might want to implement some form of standardization of the ALTLab versions of the definitions from MD and AECD. For instance, removing the initial article, i.e. 'a, an, the', and lower-casing initial pronouns (e.g. 'S/he' -> 's/he' in AECD), perhaps even generalizing the masculine 'He' in MD to 's/he' as in CW. We might do an initial pass of this programmatically, always keeping the original definition for reference but providing a standardized version for public consumption in itwêwina. This would merit an issue of its own (#121).
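A minimal sketch of what such a programmatic first pass could look like, in Python. The function name `standardize`, the `generalize_he` flag, and the list of pronouns treated as "initial pronouns" are all assumptions made for illustration, not existing ALTLab code; the original definition is kept alongside the standardized one, as suggested above.

```python
import re

# Assumption: only these three articles are stripped, and only at the start.
ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)
# Assumption: this pronoun list is illustrative, not exhaustive.
PRONOUN = re.compile(r"^(?:s/he|he|she|it|they)\b", re.IGNORECASE)
INITIAL_HE = re.compile(r"^he\b", re.IGNORECASE)


def standardize(definition: str, generalize_he: bool = False) -> dict:
    """Return the original definition together with a standardized version."""
    text = definition.strip()
    # Remove a single initial article ('a', 'an', 'the').
    text = ARTICLE.sub("", text, count=1)
    # Optionally generalize an initial masculine 'He' to 's/he' (e.g. for MD).
    if generalize_he:
        text = INITIAL_HE.sub("s/he", text, count=1)
    # Lower-case an initial pronoun, e.g. 'S/he' -> 's/he'.
    text = PRONOUN.sub(lambda m: m.group(0).lower(), text, count=1)
    return {"original": definition, "standardized": text}


# Made-up English glosses, not taken from any of the dictionaries.
print(standardize("S/he sees him"))
print(standardize("He sees him", generalize_he=True))
print(standardize("The dog"))
```

Keeping both fields in one record means the public-facing text can be regenerated whenever the rules change, without touching the source definitions.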
Here's a list of selected CW, AE, and MD dictionary entries, with interpretations of how they should be split into senses (based on the semicolon for CW) and how these senses should be aggregated using the hierarchical procedure with CW > AE > MD. Even this limited scrutiny showed that the AE entries are surprisingly aberrant (wrong part of speech, apparent duplicates), which might require more manual fixing than I'd like.
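To make the intended behaviour easier to discuss, here is a rough Python sketch of semicolon-based sense splitting and priority-ordered aggregation. It is not the current implementation: the names `split_cw_senses` and `aggregate` are hypothetical, and treating two senses as "the same" when their text matches case-insensitively is a deliberately naive placeholder for whatever matching the real procedure uses.

```python
# Source priority as described above: CW > AE > MD.
PRIORITY = ["CW", "AE", "MD"]


def split_cw_senses(definition: str) -> list:
    """Split a CW definition into senses on the semicolon."""
    return [part.strip() for part in definition.split(";") if part.strip()]


def aggregate(senses_by_source: dict) -> list:
    """Merge senses in priority order, keeping the highest-priority wording.

    Assumption: senses match if their text is equal ignoring case; a lower-
    priority duplicate only adds its source label to the existing sense.
    """
    merged = []
    index = {}  # normalized sense text -> position in `merged`
    for source in PRIORITY:
        for sense in senses_by_source.get(source, []):
            key = sense.strip().casefold()
            if key in index:
                sources = merged[index[key]]["sources"]
                if source not in sources:
                    sources.append(source)
            else:
                index[key] = len(merged)
                merged.append({"definition": sense, "sources": [source]})
    return merged


# Made-up glosses for illustration only, not real dictionary entries.
entry = {
    "CW": split_cw_senses("dog; hound"),
    "AE": ["Dog"],
    "MD": ["canine"],
}
for sense in aggregate(entry):
    print(sense)
```

With exact matching like this, near-duplicates across sources would survive as separate senses, which is where a standardized form of the definitions would help.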