goodmami / wn

A modern, interlingual wordnet interface for Python
https://wn.readthedocs.io/
MIT License
199 stars 19 forks

Using Grammar to measure similarity #121

Closed vsraptor closed 3 years ago

vsraptor commented 3 years ago

I have an idea for structural similarity that is tangentially related to WN.

The similarity calculations that use WN depend on its hierarchy.

What if you parse a text corpus with the Sequitur algorithm (a compression algorithm that produces both a compressed string and a grammar), then use the generated grammar as a hierarchy for similarity measures such as Wu-Palmer to compare sentences? The grammar can capture usage statistics and the ordinal structure of the sentences (not the lexical structure, I think).
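For concreteness, here is a minimal Python sketch of the idea. The rule hierarchy below is invented by hand to stand in for what Sequitur might produce; the rule names (`R0`, `R1`, ...) and the tree shape are purely illustrative, and only the Wu-Palmer formula itself is standard.

```python
# Toy stand-in for a Sequitur-derived grammar: child -> parent links among
# rules, with the start rule R0 as the root. A real run would build this
# mapping from Sequitur's actual output.
parents = {
    "R0": None,              # start rule (root)
    "R1": "R0", "R2": "R0",
    "R3": "R1", "R4": "R1", "R5": "R2",
}

def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path

def depth(node):
    """Depth of a node, counting the root as depth 1 (Wu-Palmer convention)."""
    return len(ancestors(node))

def wu_palmer(a, b):
    """wup(a, b) = 2 * depth(lcs) / (depth(a) + depth(b))."""
    anc_a = ancestors(a)
    anc_b = set(ancestors(b))
    lcs = next(n for n in anc_a if n in anc_b)  # deepest shared ancestor
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("R3", "R4"))  # siblings under R1: 2*2/(3+3) = 0.666...
print(wu_palmer("R3", "R5"))  # shared ancestor is only R0: 2*1/(3+3) = 0.333...
```

Nodes sharing a deep ancestor (here, rules derived from the same parent rule) score higher than nodes related only through the root, which is the property the proposal would rely on.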

Do you think such an approach makes sense, or am I talking nonsense?

goodmami commented 3 years ago

To be honest, I haven't thought much about similarity metrics beyond how to implement them for this library, so I don't think I can be of much help. It sounds like you want to induce CFGs over natural-language text to try to model the distributional semantics of words. Wordnets, however, are not distributional models but (more or less) curated lexical-semantic models. For distributional models, neural approaches (word2vec, LSTMs, transformers) have enjoyed much success lately, although they have steep training costs. I'm not quite seeing how what you propose would work, but you've probably thought about it more than I have. Good luck!

This issue is not about a bug or missing feature for Wn, so I will close it. I suggest bringing your idea to a broader discussion forum for more feedback.