Closed: vsraptor closed this issue 3 years ago
To be honest I haven't thought much about similarity metrics beyond how to implement them for this library, so I don't think I can be of much help. It sounds like you want to induce CFGs over natural language text to try to model the distributional semantics of words. Wordnets, however, are not distributional models but (more or less) curated lexical-semantic models. For distributional models, the neural approaches (word2vec, LSTMs, transformers) have enjoyed much success lately, although they have steep training costs. I'm not quite seeing how what you propose would work, but you've probably thought about it more than I have. Good luck!
This issue is not about a bug or missing feature for Wn so I will close it. I suggest bringing your idea to a broader discussion forum for more feedback.
I have an idea for structural similarity that is tangentially related to Wn.
The similarity calculations using Wn depend on a hierarchy.
What if you parse a text corpus using the Sequitur algorithm (a compression algorithm that produces both the compressed string and a grammar)? You could then use the generated grammar as a hierarchy and apply similarity measures such as Wu-Palmer to compare sentences. The grammar can capture usage statistics and the ordinal structure of the sentences (not lexical structure, I think).
Do you think such an approach makes sense, or am I talking nonsense?
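To make the idea concrete, here is a minimal sketch of Wu-Palmer similarity computed over an arbitrary parent-pointer hierarchy. The hierarchy below is a hand-made stand-in for what a Sequitur-induced grammar might look like (rules `R1`, `R2`, `S` and their terminals are illustrative assumptions, not real Sequitur output); the point is only that any tree-shaped rule hierarchy could be plugged in where WordNet's synset hierarchy normally goes.

```python
def ancestors(node, parent):
    """Chain from a node up to the root, inclusive."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def depth(node, parent):
    """Depth of a node, with the root at depth 1."""
    return len(ancestors(node, parent))

def lcs(a, b, parent):
    """Least common subsumer: deepest shared ancestor of a and b."""
    a_set = set(ancestors(a, parent))
    for n in ancestors(b, parent):
        if n in a_set:
            return n
    return None

def wu_palmer(a, b, parent):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(a) + depth(b))."""
    c = lcs(a, b, parent)
    if c is None:
        return 0.0
    return 2 * depth(c, parent) / (depth(a, parent) + depth(b, parent))

# Toy grammar hierarchy (hypothetical): terminals point to the
# rules that generated them, rules point to the start symbol.
parent = {
    'the': 'R1', 'dog': 'R1',        # R1 -> the dog
    'barked': 'R2', 'loudly': 'R2',  # R2 -> barked loudly
    'R1': 'S', 'R2': 'S',            # S -> R1 R2
}

print(wu_palmer('the', 'dog', parent))     # siblings under R1 -> 0.666...
print(wu_palmer('the', 'barked', parent))  # only share the root S -> 0.333...
```

For comparison, Wn itself exposes a taxonomy-based Wu-Palmer measure as `wn.similarity.wup(synset1, synset2)`; the sketch above just swaps the synset hierarchy for a grammar-derived one.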