getalp / disambiguate

Disambiguate is a tool for training and using state-of-the-art neural WSD models
https://arxiv.org/abs/1905.05677
MIT License

Use modified version of WordNet #7

Open mrmechko opened 4 years ago

mrmechko commented 4 years ago

If I wanted to use my own modified version of WordNet, or perhaps a different hierarchy with this system, where would I start?

I notice that the Java code uses JWI, but I'm trying to figure out whether the core system actually needs the full WordNet hierarchy or just the tags.

loic-vial commented 4 years ago

Hi !

So, it's true that we currently rely heavily on the WordNet hierarchy. If you want to use another sense inventory, here are some tips:

I know it would be great to have a clear interface for plugging in any sense inventory, and it's not too difficult, but I don't have time to make the changes right now. It's planned for 2020, though (after I finish my PhD, actually ^^). If you want to work on it, I'd be glad to take pull requests :) I think the best way to do this would be to replace all the "WordNetStuff" with a generic "SenseInventoryStuff", so the code stays largely the same, and we would then provide different implementations of the SenseInventory.
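As a rough illustration of that refactoring, here is a minimal sketch, assuming a hypothetical `SenseInventory` interface with two lookups (`getSenses`, `getParents`). None of these names come from the Disambiguate codebase. A WordNet-backed implementation would wrap JWI behind the same interface; the toy map-based one below stands in for a custom or modified hierarchy. Code such as `ancestors` only talks to the interface, so it works unchanged with either implementation.

```java
import java.util.*;

// Hypothetical generic interface standing in for the "WordNetStuff" classes.
interface SenseInventory {
    // All sense identifiers for a lemma/POS pair, in insertion order.
    List<String> getSenses(String lemma, String pos);

    // Direct parents of a sense in the inventory's hierarchy (may be empty).
    List<String> getParents(String senseId);
}

// Hypothetical in-memory implementation, e.g. for a custom hierarchy.
class MapSenseInventory implements SenseInventory {
    private final Map<String, List<String>> senses = new HashMap<>();
    private final Map<String, List<String>> parents = new HashMap<>();

    void addSense(String lemma, String pos, String senseId, String... parentIds) {
        senses.computeIfAbsent(lemma + "%" + pos, k -> new ArrayList<>()).add(senseId);
        parents.put(senseId, List.of(parentIds));
    }

    @Override
    public List<String> getSenses(String lemma, String pos) {
        return Collections.unmodifiableList(senses.getOrDefault(lemma + "%" + pos, List.of()));
    }

    @Override
    public List<String> getParents(String senseId) {
        return parents.getOrDefault(senseId, List.of());
    }
}

final class InventoryUtils {
    // Collects every ancestor of a sense (breadth-first) using only the
    // interface, so it works with any SenseInventory implementation.
    static Set<String> ancestors(SenseInventory inv, String senseId) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(inv.getParents(senseId));
        while (!queue.isEmpty()) {
            String s = queue.pop();
            if (seen.add(s)) {
                queue.addAll(inv.getParents(s));
            }
        }
        return seen;
    }
}
```

Swapping in a different hierarchy would then mean writing one new `SenseInventory` class rather than touching the training and inference code.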

mrmechko commented 4 years ago

Hi, fellow PhD student here, hoping to finish in 2020 too.

I decided to sidestep the issue for now by using sense compression. The TRIPS ontology has mappings from WordNet, so I'm just replacing the hypernym compression algorithm with TRIPS compression. That does violate the invariant described in the paper (that no compression should result in losing a unique word sense), but it seems to be working pretty well.
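A minimal sketch of compression through an external mapping, assuming a hypothetical `Map` from WordNet sense IDs to coarser types (standing in for the WordNet-to-TRIPS mappings; the class name, method names, and the type labels in the usage note are made up for illustration). Senses with no mapping keep their original ID. `collisions` reads the invariant as "no two senses of the same lemma may share a compressed tag" and reports the tags that break it, which is one way to measure how much the mapping-based approach gives up compared with hypernym compression.

```java
import java.util.*;

// Hypothetical sketch: compress one lemma's senses via an external
// sense -> coarse-type mapping (e.g. WordNet -> TRIPS-style types).
final class MappingCompression {
    // Maps each sense to its coarse type, falling back to the sense ID itself.
    static Map<String, String> compress(List<String> senses, Map<String, String> mapping) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String s : senses) {
            out.put(s, mapping.getOrDefault(s, s));
        }
        return out;
    }

    // Returns the compressed tags shared by two or more senses of the lemma,
    // i.e. the places where distinct senses can no longer be told apart.
    static Set<String> collisions(Map<String, String> compressed) {
        Set<String> seen = new HashSet<>();
        Set<String> dup = new TreeSet<>();
        for (String tag : compressed.values()) {
            if (!seen.add(tag)) {
                dup.add(tag);
            }
        }
        return dup;
    }
}
```

For example, with an invented mapping that sends `bank.n.02` and `bank.n.09` to the same `ORGANIZATION` type, `collisions` returns `[ORGANIZATION]`, flagging that those two senses become indistinguishable after compression.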