goodmami / wn

A modern, interlingual wordnet interface for Python
https://wn.readthedocs.io/
MIT License

Tracing back 'inferred' synsets to their reference lexicons #167

Open francis-dion opened 2 years ago

francis-dion commented 2 years ago

When looking at relations for the omw-fr-00619230-n synset, I saw four INFERRED synsets for the hyponym relation and two for the has_domain_topic one.

The oewn and omw-en lexicons each return two hyponyms and one has_domain_topic synset for the translation of omw-fr-00619230-n.

print(wn.synset('omw-fr-00619230-n').relations())
# {'omw-fr-00618734-n': [Synset('omw-fr-00618734-n')],
#  'hyponym': [Synset('*INFERRED*'), Synset('*INFERRED*'), Synset('*INFERRED*'), Synset('*INFERRED*')],
#  'has_domain_topic': [Synset('*INFERRED*'), Synset('*INFERRED*')]}

print(wn.synset('omw-fr-00619230-n').translate('oewn')[0].relations())
# {'hypernym': [Synset('oewn-00619974-n')],
#  'hyponym': [Synset('oewn-00620659-n'), Synset('oewn-00620818-n')],
#  'has_domain_topic': [Synset('oewn-06506364-n')]}

My understanding is that the relations() function detects the relations in the two English lexicons I have loaded. It seems that, at the moment, these are the only two lexicons I am working with which are providing "extra relations". Since I'm hoping to work with (or eventually create/support) other lexicons with their own relations, I envision a challenge in retrieving the source lexicon for any given inferred synset.

My question is thus: is there currently (or is there planned) a means of retrieving the source lexicon of an inferred synset?

Thanks!

goodmami commented 2 years ago

First of all, the functions of the wn module are provided only as a convenience. For any real work I strongly suggest creating a wn.Wordnet object so you can be clear where the data are coming from:

>>> import wn
>>> fr = wn.Wordnet('omw-fr', expand='omw-en')
>>> fr.synset('omw-fr-00619230-n').relations()
{'omw-fr-00618734-n': [Synset('omw-fr-00618734-n')], 'hyponym': [Synset('*INFERRED*'), Synset('*INFERRED*')], 'has_domain_topic': [Synset('*INFERRED*')]}

If you just use the wn module functions for your queries, they query all installed lexicons, which may yield surprising or repetitive results.

It's not a bad idea to somehow retain the lexicon whence a synset was inferred, though.
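In the meantime, one possible workaround is to build a separate wn.Wordnet object per expand lexicon and compare what each one infers. The following is only a rough sketch: count_inferred and its wordnet_factory parameter are hypothetical names, not part of wn, and it assumes that inferred synsets can be recognized by the '*INFERRED*' id shown in the output above.

```python
def count_inferred(synset_id, lexicon, expand_lexicons, wordnet_factory=None):
    """Count inferred relation targets per expand lexicon (hypothetical helper)."""
    if wordnet_factory is None:
        # Imported lazily so the helper can also be tried with stand-in objects.
        import wn
        wordnet_factory = wn.Wordnet

    counts = {}
    for expand in expand_lexicons:
        # One Wordnet per expand lexicon, so each result has a single known source.
        wordnet = wordnet_factory(lexicon, expand=expand)
        relations = wordnet.synset(synset_id).relations()
        counts[expand] = {
            # Assumption: inferred synsets carry the id '*INFERRED*'.
            name: sum(1 for target in targets if target.id == '*INFERRED*')
            for name, targets in relations.items()
        }
    return counts
```

Calling count_inferred('omw-fr-00619230-n', 'omw-fr', ['omw-en', 'oewn']) would then give one dictionary of counts per expand lexicon, which shows at least which lexicon contributed how many inferred synsets for each relation type.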

francis-dion commented 2 years ago

Thanks for the Wordnet class tip. I had seen it in the docs but it somehow didn't register in my mind. Sorry for yet another newbie question, but are the relation types constrained or somehow "standardized" by OMW, the GWA or others? If not, I'm tempted to explore "masking" relationships, whereby a lexicon can block some specific relations provided by an expand lexicon.

goodmami commented 2 years ago

> are the relation types constrained or somehow "standardized" by OMW, the GWA or others?

See https://github.com/globalwordnet/schemas/

> I'm tempted to explore "masking" relationships, whereby a lexicon can block some specific relations provided by an expand lexicon.

This might be implemented as an element rather than a relation type, because the relation type + target can help it select the thing to be masked (relations don't have unique IDs). E.g.:

<ExternalSynset id="...">
  <SynsetRelationMask relType="hyponym" target="..." />
</ExternalSynset>

A problem with this is that if you want to mask a certain relation type between synsets A and B and then create a new relation of the same type between A and B, we'd have to be careful to apply those extensions in the proper order; otherwise the mask might block the new relation, too.
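To illustrate the ordering issue, here is a toy model in plain Python. It does not use wn or the WN-LMF schema; relations are just (source, relType, target) tuples, and the function names are made up for this example.

```python
def apply_masks_then_additions(relations, masks, additions):
    """Remove masked relations first, then add the extension's new relations."""
    kept = {rel for rel in relations if rel not in masks}
    return kept | set(additions)


def apply_additions_then_masks(relations, masks, additions):
    """Add the new relations first, then mask: the mask also hits the additions."""
    combined = set(relations) | set(additions)
    return {rel for rel in combined if rel not in masks}


# Relations provided by the expand lexicon.
base = {("A", "hyponym", "B"), ("A", "hyponym", "C")}
# The extension masks A -> B and then re-creates a relation of the same type.
masks = {("A", "hyponym", "B")}
additions = {("A", "hyponym", "B")}

masked_first = apply_masks_then_additions(base, masks, additions)
added_first = apply_additions_then_masks(base, masks, additions)

print(("A", "hyponym", "B") in masked_first)  # True: the new relation survives
print(("A", "hyponym", "B") in added_first)   # False: the mask blocked it
```

In this model the new A-to-B relation only survives when masks are applied before additions, which is the ordering constraint any real implementation would need to guarantee.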

If you have a proposal for how to do this, create an issue at https://github.com/globalwordnet/schemas/ so it can be tracked and discussed.