TALP-UPC / FreeLing

FreeLing project source code

wn30.src impact on WSD #105

Closed by alexandretessarollo 4 years ago

alexandretessarollo commented 4 years ago

We (@arademaker and I) are currently working on a WordNet expansion, and we are trying to measure its impact by running FreeLing with the data files from before and after the WN changes. With that in mind, I'd like to know: what impact, if any, does the wn30.src file have on word sense disambiguation?

lluisp commented 4 years ago

The algorithm used for WSD is UKB, which essentially runs PageRank over the WN graph to find which senses of the words in the text are most closely related to each other.
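To make the idea concrete, here is a minimal sketch of personalized PageRank over a sense graph, written from scratch. It is not UKB's implementation: UKB has its own variants and parameters, and the sense labels (`bank#finance`, `money#1`, ...) and toy edges are hypothetical. The random walk restarts only at the senses of the context words, so senses connected to the context accumulate more rank.

```python
from collections import defaultdict


def personalized_pagerank(edges, personalization, damping=0.85, iterations=30):
    """Power iteration for personalized PageRank on an undirected graph.

    edges: iterable of (u, v) pairs, treated as undirected relations.
    personalization: dict node -> non-negative weight (the restart vector).
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = set(neighbours) | set(personalization)
    total = sum(personalization.values())
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass always goes back to the context senses.
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = neighbours[n]
            if out:
                # Spread this node's rank evenly over its related senses.
                share = damping * rank[n] / len(out)
                for m in out:
                    new_rank[m] += share
            else:
                # Dangling node: return its mass through the restart vector.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank


# Hypothetical toy graph: two senses of "bank", only one related to "money".
toy_edges = [
    ("bank#finance", "money#1"),
    ("money#1", "deposit#1"),
    ("bank#river", "river#1"),
]
# The context word "money" seeds the walk.
scores = personalized_pagerank(toy_edges, {"money#1": 1.0})
best = max(["bank#finance", "bank#river"], key=scores.get)
print(best)
```

In this toy graph `bank#river` is unreachable from the context, so it ends with zero rank and `bank#finance` is selected.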

If you check the user manual (https://freeling-user-manual.readthedocs.io/en/v4.2/modules/wsd/) you'll see that the configuration file for UKB (share/freeling/pt/ukb.dat) specifies where the relations forming the desired graph are found (xwn.dat in the default configuration).

wn30.dat contains per-synset information (hypernyms, semantic file, SUMO, top ontology, ...), but not the full WN graph.

The graph is described in xwn.dat, which contains the relations from eXtended WordNet (https://en.wikipedia.org/wiki/EXtended_WordNet), so it already includes the hypernym relations listed in wn30.dat.

If you want UKB to use a different set of relations, you'll need to change the RelationFile section in the ukb.dat configuration file and provide a file in the same format as xwn.dat (e.g., you could use only the hypernym relations from wn30.dat, or any other combination).
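As a rough sketch of the "hypernyms only" option, the snippet below turns a synset-to-hypernyms map into a de-duplicated edge list and writes it out. It assumes a whitespace-separated "source target" pair of synset ids per line; that format is an assumption and should be checked against the actual xwn.dat before the output is listed in the RelationFile section of ukb.dat. Parsing wn30.dat itself is left out, and the synset ids are hypothetical.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def hypernym_edges(hypernyms: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn a synset -> [hypernym synsets] map into a sorted, de-duplicated edge list."""
    edges = set()
    for synset, parents in hypernyms.items():
        for parent in parents:
            edges.add((synset, parent))
    return sorted(edges)


def write_relation_file(edges: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one 'source target' pair per line (ASSUMED format; compare with xwn.dat)."""
    for source, target in edges:
        out.write(f"{source} {target}\n")


# Hypothetical ids; "00000003-n" stands for a synset added by a WordNet expansion.
hyper = {
    "00000001-n": ["00000002-n"],
    "00000003-n": ["00000001-n", "00000001-n"],  # duplicate is dropped
}
buf = io.StringIO()
write_relation_file(hypernym_edges(hyper), buf)
print(buf.getvalue(), end="")
```

A new synset only takes part in the random walk once it appears in such a relation file, which is why senses added without relations leave UKB's output unchanged.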

Adding senses without adding any relations will have no effect on UKB results.

alexandretessarollo commented 4 years ago

Thanks for the prompt answer. The manual for the Semantic Database Module (https://freeling-user-manual.readthedocs.io/en/v4.2/modules/semdb/#wordnet-file) states that this module serves the word sense disambiguator and that wn30.src is part of this module, hence my original doubt.

However, I still have a question: if UKB relies only on xwn.dat (or wn.dat) and on the senses module, which in turn relies on the sense dictionary (senses30.src) and the PoS tagging rules from the Semantic Database Module, then which module uses the information in wn30.src? Or is it just reference material, for instance for when the user chooses to use only hypernym relations?

lluisp commented 4 years ago

Maybe the documentation refers to a previous version of the disambiguator... I'd say that the semantic database is now used by the "senses" module, which retrieves all possible senses for a word. The WSD module then chooses the right one.

The semantic database module is a kind of simplified WN interface. It allows a module to retrieve the synsets for a given word and to navigate WN (e.g., to find whether a synset is a human, an animal, or an object, or whether a synset is a descendant of another one...).
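The two operations described above (word-to-synset lookup and navigation up the hierarchy) can be illustrated with a toy class. This is a hypothetical stand-in, not FreeLing's API: the class name, the methods `synsets_for` and `is_descendant`, and the simplified hierarchy are all made up for illustration.

```python
from typing import Dict, List, Set


class ToySemDB:
    """Hypothetical, minimal stand-in for a WordNet-like semantic database."""

    def __init__(self, senses: Dict[str, List[str]], hypernyms: Dict[str, List[str]]):
        self.senses = senses        # word -> candidate synsets
        self.hypernyms = hypernyms  # synset -> direct hypernym synsets

    def synsets_for(self, word: str) -> List[str]:
        """All candidate synsets for a word (what a 'senses' step would attach)."""
        return self.senses.get(word, [])

    def ancestors(self, synset: str) -> Set[str]:
        """Every synset reachable by following hypernym links upwards."""
        seen: Set[str] = set()
        stack = list(self.hypernyms.get(synset, []))
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(self.hypernyms.get(s, []))
        return seen

    def is_descendant(self, synset: str, ancestor: str) -> bool:
        return ancestor in self.ancestors(synset)


# Toy, simplified hierarchy (not the real WordNet structure).
db = ToySemDB(
    senses={"dog": ["dog#n#1"], "cat": ["cat#n#1"]},
    hypernyms={
        "dog#n#1": ["canine#n#1"],
        "canine#n#1": ["animal#n#1"],
        "cat#n#1": ["feline#n#1"],
        "feline#n#1": ["animal#n#1"],
        "animal#n#1": ["organism#n#1"],
    },
)
print(db.is_descendant("dog#n#1", "animal#n#1"))  # "is this synset an animal?"
```

Note that nothing here ranks senses: choosing among the candidates returned by `synsets_for` is the WSD module's job.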