FredericBlum closed this issue 3 months ago
This is not specific to only one cognate set, but rather a problem for all of the large ones. Another workaround would probably be to make those cases language-specific, so that they all get filtered out in any analysis. But then I would lose important information in other cases.
You use the partial colexifications editor, right? I am close to dropping it, since by now I think partial colexifications should be handled together with morphemes. What I also suspect is that you have numerous duplicates from the same language here, right? So you have the same root again and again within one language. My general goal for the future of these workflows is to find ways in which we do only one representative alignment (correspondence patterns are only built on one alignment anyway) and list the rest of the words in the same language as part of a word family. This would reduce the size of the cognate sets in your case.
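As a rough illustration of the representative-alignment idea, here is a plain Python sketch with invented data (this is not how EDICTOR works internally): only the first form per language enters the alignment, and the remaining forms of that language are kept as its word family.

```python
from collections import OrderedDict

def representatives(cognate_set):
    """Keep one representative form per language; collect the
    remaining forms of that language as its word family."""
    reps, families = OrderedDict(), {}
    for language, form in cognate_set:
        if language not in reps:
            reps[language] = form           # representative: first form seen
        else:
            families.setdefault(language, []).append(form)
    return reps, families

# Toy data: duplicates from the same language inflate the cognate set.
cogset = [
    ("German", "geben"), ("German", "gab"), ("German", "gegeben"),
    ("English", "give"),
]
reps, fams = representatives(cogset)
print(reps)   # only one form per language enters the alignment
print(fams)   # the rest stay listed as the language's word family
```

With this scheme a cognate set with many language-internal duplicates shrinks to at most one row per language for alignment purposes, while no forms are actually thrown away.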
Given that we are considering meeting in Passau anyway for a Semesterabschluss / Hackathon, maybe we could put the topic "Scaling problems in EDICTOR and possible solutions" on the agenda there?
Yes, that was from within the "Edit partial cognate sets" tab. It is my go-to tab for going through the alignments.
It is indeed the case that there are numerous duplicates from the same language. Mainly short verbal stems that are not separated from the root. Resolving those to one single representative case would solve this completely.
I guess we should schedule a meeting on EDICTOR, best practices, and future desiderata. I think the way to proceed here is to add one more cognate-ID column, which you could call "verticalids", where you indicate language-internal cognates; you must then make sure that language-internal forms are always identical (or inline-aligned) to account for proper word families. You then use cogids for horizontal comparison (across languages). The potential risk is that you MAY miss interesting cross-semantic cognates, but then you would have the option to retain some exemplary forms, and you would use this mainly to get rid of the suffix-thingies that make the alignments difficult, while using COGIDS for roots, that is, lexemes with meanings. I would consider doing this for the Tibetic data I am working with, which is also notoriously difficult to code in this regard...
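A toy sketch of that two-column coding (all IDs, forms, and column names here are invented): verticalids cluster language-internal cognates, cogids handle the cross-linguistic comparison, and only one form per (cogid, language) pair would need to be aligned.

```python
# Toy illustration (hypothetical IDs): COGID links forms across
# languages, VERTICALID links forms within a single language.
rows = [
    # (ID, language,  form,      COGID, VERTICALID)
    (1,   "German",   "geben",   17,    100),
    (2,   "German",   "gegeben", 17,    100),  # same root, same language
    (3,   "English",  "give",    17,    200),
]

# Pick one form per (COGID, language) for the actual alignment,
# here simply the first row of each language-internal cluster.
aligned = {}
for row_id, lang, form, cogid, vid in rows:
    aligned.setdefault((cogid, lang), (row_id, form))
print(aligned)
```

The language-internal rows that share a verticalid would stay in the data as a word family, but the alignment (and the correspondence patterns built on it) would only ever see one of them.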
EDICTOR should offer the possibility to delete duplicates or to mark them as uneditable, which would make it possible to preserve the information while at the same time ignoring it for a given alignment (and correspondence pattern).
I found the reason now: it was due to the long URLs in GET requests. With POST, this does not happen. EDICTOR 3 will also circumvent this by using POST requests in most cases and by running on the local host.
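For illustration, a small Python sketch (the endpoint and parameter names are hypothetical) of why this happens: with GET, the serialized alignment lands in the URL itself, and web servers cap URL length (commonly at around 8 KB, with older limits even lower), whereas a POST request carries the same payload in the request body and keeps the URL short.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical payload: a large cognate set serialized into one parameter.
alignment = "|".join("t a k V" for _ in range(2000))
params = {"wordlist": "demo", "alignment": alignment}

# GET: the payload becomes part of the URL, which servers cap.
get_url = "http://localhost:9999/edit?" + urlencode(params)
print(len(get_url))  # easily exceeds typical URL-length limits

# POST: the same payload travels in the request body instead,
# so the URL stays short no matter how large the cognate set is.
post = Request("http://localhost:9999/edit",
               data=urlencode(params).encode("utf-8"))
print(post.get_method(), len(post.full_url))
```

Note that the `Request` object is only constructed here, not sent; the point is simply that `full_url` stays constant for POST while the GET URL grows with the cognate set.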
For very large cognate sets (more than roughly 30 entries; I am unsure about the exact number), I cannot store the alignments and receive an error message instead. Right now I am editing those cases manually, but that becomes quite tedious considering the size of the cognate sets. An example screenshot of the error message is attached to this post.