goodmami / wn

A modern, interlingual wordnet interface for Python
https://wn.readthedocs.io/
MIT License

Missing Spanish definitions #159

Closed noe closed 2 years ago

noe commented 2 years ago

OMW does not provide the definitions and examples present in the Spanish data from MCR 3.0. This is acknowledged by them on their website:

We are focused on adding lemmas, we do not have all extra information from other projects such as:

  • Definitions and examples from wordnets such as Spanish ...

It would be great for wn to be able to access the available Spanish definitions, either from the current omw-es:1.4 wordnet or via a new lexicon created directly from the original MCR data.

In any case, adding a warning in the omw-es:1.4 entry of the wordnets table in the README file indicating this problem would at least let people know about the situation.

Update: #151 includes some XML fragments with Spanish definitions. Specifically, the definition of "angiodisplasia" is shown there, and it matches the one in the original MCR data. This confuses me, because OMW clearly states that it offers no Spanish definitions, and my own tests confirm this (I actually ran into this situation while trying to get definitions). Those tests included using wn to retrieve Spanish definitions and even querying the internal wn SQLite database.

Are there Spanish definitions somewhere? wn.Wordnet('omw-es:1.4').synsets('angiodisplasia') gives an empty list. Querying for other words gives me synsets (e.g. wn.Wordnet('omw-es:1.4').synsets('perro') returns [Synset('omw-es-02084071-n'), Synset('omw-es-10539715-n')]), but they have no definitions (i.e. wn.Wordnet('omw-es:1.4').synsets('perro')[0].definition() is empty).

Can someone clarify? @fcbond, are you integrating the original MCR Spanish wordnet into wn?
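For anyone who wants to reproduce the check described above, here is a minimal sketch of how the definition coverage could be counted. The helper names definition_coverage and report_for_lexicon are hypothetical (they are not part of wn); the only wn calls used are wn.Wordnet(...), .synsets(...) and Synset.definition(), as in the comment above. It assumes the lexicon has already been fetched, e.g. with wn.download('omw-es:1.4').

```python
from typing import Iterable, List, Tuple


def definition_coverage(synsets: Iterable) -> Tuple[int, int]:
    """Return (synsets_with_a_definition, total_synsets).

    Works on any objects exposing a .definition() method, such as
    the synsets returned by wn.Wordnet(...).synsets(...).
    """
    total = 0
    defined = 0
    for synset in synsets:
        total += 1
        # A missing definition shows up as a falsy value (None or "").
        if synset.definition():
            defined += 1
    return defined, total


def report_for_lexicon(lexicon: str, words: List[str]) -> None:
    """Print definition coverage for each word in the given lexicon.

    Hypothetical convenience wrapper; assumes the lexicon is already
    installed locally (e.g. via wn.download(lexicon)).
    """
    import wn  # third-party package: pip install wn

    wordnet = wn.Wordnet(lexicon)
    for word in words:
        defined, total = definition_coverage(wordnet.synsets(word))
        print(f"{word}: {defined} of {total} synsets have a definition")
```

Calling report_for_lexicon('omw-es:1.4', ['perro', 'angiodisplasia']) would then print one line per word. A word with no synsets at all is reported as 0 of 0, which distinguishes "lemma missing" from "lemma present but without definitions".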

goodmami commented 2 years ago

Thanks for the report. I agree it would be great to have the definitions in the Spanish wordnet. I think the discussion you found in #151 is about an upcoming version of the Spanish wordnet (@fcbond called it an "MCR wordnet candidate"), and not the current version packaged by OMW (omw-es:1.4). @fcbond, can you confirm?

Your request would be better placed at https://github.com/omwn/omw-data, which is the repository for the OMW data. Wn is only the software library for working with the wordnet data and does not prepare or package any wordnet data itself.

noe commented 2 years ago

Thanks, I have raised the issue at the OMW data repository: https://github.com/omwn/omw-data/issues/25

goodmami commented 2 years ago

Thanks! I'll close this one, then.