goodmami / wn

A modern, interlingual wordnet interface for Python
https://wn.readthedocs.io/
MIT License

Validation of LMF IDs when adding lexicons #101

Open goodmami opened 3 years ago

goodmami commented 3 years ago

SQL errors can be hard to understand, especially for a user of Wn. To avoid these, some validation of the LMF files should be performed, such as ensuring that all IDs referenced are provided by the document (including as external elements in extension lexicons).
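A minimal sketch of the kind of ID check described above, using only the standard library rather than Wn's own loader. The function name `find_dangling_references` is hypothetical, and the sketch assumes that references live in `synset` and `target` attributes and that every definable element carries an `id` attribute. It ignores space-separated reference lists such as `members`, and it does not treat external elements in extension lexicons specially beyond counting their IDs as provided.

```python
import xml.etree.ElementTree as ET

# Attributes assumed to hold a single reference to another element's ID.
REFERENCE_ATTRIBUTES = ("synset", "target")


def find_dangling_references(xml_text):
    """Return (tag, attribute, id) triples for references to undefined IDs."""
    root = ET.fromstring(xml_text)
    # Every ID provided anywhere in the document counts as defined.
    known_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    problems = []
    for el in root.iter():
        for attr in REFERENCE_ATTRIBUTES:
            ref = el.get(attr)
            if ref is not None and ref not in known_ids:
                problems.append((el.tag, attr, ref))
    return problems


SAMPLE = """
<LexicalResource>
  <Lexicon id="ex" label="Example" language="en">
    <LexicalEntry id="ex-w1">
      <Lemma writtenForm="example" partOfSpeech="n"/>
      <Sense id="ex-s1" synset="ex-ss1"/>
    </LexicalEntry>
    <Synset id="ex-ss1">
      <SynsetRelation relType="hypernym" target="ex-ss2"/>
    </Synset>
  </Lexicon>
</LexicalResource>
"""

for tag, attr, ref in find_dangling_references(SAMPLE):
    print(f"{tag}: {attr}={ref!r} is not defined in the document")
```

A check like this is a single pass over the document plus set lookups, so it should be cheap enough to run during add.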

There is already some validation, e.g., of allowed part-of-speech values, relation types, etc., but so far nothing regarding entity linking. Things like cycle detection are probably too expensive to do during add, though.

goodmami commented 3 years ago

Validation is now handled with wn.validate, so I'm pulling this off the v0.9.0 milestone to avoid holding it up. I think we need to re-evaluate what to do here. One option: catch SQL errors when adding a lexicon and suggest that the user validate it first. For this to be effective, I think we need an easy way to load an LMF lexicon from a file stored in the cache so it can be passed to wn.validate.validate().
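A rough sketch of the "catch SQL errors and suggest validation" option, written against the standard `sqlite3` module only. `LexiconAddError` and `add_with_hint` are hypothetical names, not part of Wn, and `add_fn` stands in for whatever function actually inserts the lexicon into the database.

```python
import sqlite3


class LexiconAddError(Exception):
    """Hypothetical error raised when a lexicon cannot be added."""


def add_with_hint(add_fn, source):
    """Call add_fn(source), turning raw SQL errors into a friendlier message."""
    try:
        return add_fn(source)
    except sqlite3.Error as exc:
        # Keep the original error chained so the low-level detail is not lost.
        raise LexiconAddError(
            f"could not add {source!r} ({exc}); "
            "the file may contain invalid or missing IDs, "
            "so try validating it first"
        ) from exc
```

Chaining with `from exc` keeps the underlying SQL error available in the traceback for anyone who needs it, while the message the user sees points at validation as the next step.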