Closed chrzyki closed 5 years ago
We have some risk of concept mapping in cldf, given that we do not check for the same things in cldf which we currently check for concepticon, e.g., if an ID is valid, i.e., not already replaced by a synonym or an alias. We can always argue that the ideal way would anyway be to link links to concepticon explicitly, by adding them to concepticon, and having the same test as for concepticon would be tedious, but we may not want to forget this, that we could easily introduce errors in this stage. I know it's not exactly what your PR does, but it's something that came to my mind when I saw it.
Good points! Better integrating this with the checks we've already available in pyconcepticon
might be very worthwhile.
I think we could merge this, but should definitely file an issue with pylexibank
to avoid this. If this happens for "non-pylexibank" datasets like lexirumah, I'd propose fixing it by expanding the NULL
specification for the Concepticon_ID
column to "NULL": ["", "0"]
- which should let the load command do the right thing.
Thanks! I've opened the issue - I also like the ["", "0"]
idea and I think having both things in mind is a good thing.
So, I don't even know whether this PR is noteworthy and should be merged at all, given the extreme marginality of an edge-case like this. Still, I'd be curious to learn where we could catch something like that? I suppose this should have been caught/not done immediately during the stage of the CLDF generation?