clics / pyclics

python package implementing the CLICS processing workflow
Apache License 2.0
3 stars 0 forks source link

Checks for concepts with Concepticon_ID = 0 #6

Closed chrzyki closed 5 years ago

chrzyki commented 5 years ago

So, I don't even know whether this PR is noteworthy and should be merged at all, given the extreme marginality of an edge-case like this. Still, I'd be curious to learn where we could catch something like that? I suppose this should have been caught/not done immediately during the stage of the CLDF generation?

LinguList commented 5 years ago

We have some risk of concept mapping in cldf, given that we do not check for the same things in cldf which we currently check for concepticon, e.g., if an ID is valid, i.e., not already replaced by a synonym or an alias. We can always argue that the ideal way would anyway be to link links to concepticon explicitly, by adding them to concepticon, and having the same test as for concepticon would be tedious, but we may not want to forget this, that we could easily introduce errors in this stage. I know it's not exactly what your PR does, but it's something that came to my mind when I saw it.

chrzyki commented 5 years ago

Good points! Better integrating this with the checks we've already available in pyconcepticon might be very worthwhile.

xrotwang commented 5 years ago

I think we could merge this, but should definitely file an issue with pylexibank to avoid this. If this happens for "non-pylexibank" datasets like lexirumah, I'd propose fixing it by expanding the NULL specification for the Concepticon_ID column to "NULL": ["", "0"] - which should let the load command do the right thing.

chrzyki commented 5 years ago

Thanks! I've opened the issue - I also like the ["", "0"] idea and I think having both things in mind is a good thing.