alvations / gown


Let's try to define some things and help with the implementation #1

Open alvations opened 4 years ago

alvations commented 4 years ago

What is a Synset?

What is a Lemma?

How are Synsets connected?

goodmami commented 4 years ago

Regarding synsets:

  • A generic Python object? Or a Graph Node? Which is a better representation?

I'm excited by the possibilities that constructing a proper graph would enable, but tempered by the thought that it would be rather slow to construct the graph on startup, and that it would consume a lot of memory. Since there are tens (hundreds?) of thousands of lemmas and synsets created, I might opt for classes with __slots__ defined or maybe namedtuples, as these have a small footprint. Even better (from a performance standpoint) would be to keep the data raw (e.g., simple tuples) until the user requests it.
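A minimal sketch of the low-footprint options mentioned above, assuming a made-up three-field record (`id`, `pos`, `definition`). The names `SynsetRecord`, `SynsetTuple`, `LazySynsets`, and the sample rows are hypothetical and not part of any existing API:

```python
from collections import namedtuple

# Hypothetical raw rows, as they might come out of a parsed lexicon file.
RAW_ROWS = [
    ("ex-dog-n", "n", "a domesticated canine"),
    ("ex-run-v", "v", "move fast on foot"),
]


class SynsetRecord:
    """Option 1: a class with __slots__, so instances carry no per-object __dict__."""

    __slots__ = ("id", "pos", "definition")

    def __init__(self, id, pos, definition):
        self.id = id
        self.pos = pos
        self.definition = definition


# Option 2: a namedtuple, which is immutable and tuple-sized.
SynsetTuple = namedtuple("SynsetTuple", ["id", "pos", "definition"])


class LazySynsets:
    """Option 3: keep the raw tuples and build objects only when requested."""

    def __init__(self, rows):
        self._rows = {row[0]: row for row in rows}  # raw data, indexed by id
        self._cache = {}                            # objects built so far

    def get(self, synset_id):
        if synset_id not in self._cache:
            self._cache[synset_id] = SynsetRecord(*self._rows[synset_id])
        return self._cache[synset_id]


store = LazySynsets(RAW_ROWS)
dog = store.get("ex-dog-n")  # only this one record is materialized
```

With the lazy variant, startup cost is limited to indexing the raw rows; whether that is enough in practice would need measuring on a full wordnet.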

I'm all for some function to construct a proper graph if the user wants it. But since the networkx dependency is rather heavy, maybe it could be an extra.
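One way this could look, as a sketch only: the core keeps a plain adjacency dict, and networkx is imported inside the export function so it is needed only by users who ask for a graph. The function names and the relation triples are assumptions made for illustration:

```python
from collections import defaultdict

# Hypothetical raw relation triples: (source_id, relation, target_id).
RELATIONS = [
    ("ex-dog-n", "hypernym", "ex-canine-n"),
    ("ex-canine-n", "hypernym", "ex-animal-n"),
]


def build_adjacency(relations):
    """Dependency-free representation: source id -> list of (relation, target id)."""
    adjacency = defaultdict(list)
    for source, relation, target in relations:
        adjacency[source].append((relation, target))
    return dict(adjacency)


def to_networkx(relations):
    """Build a networkx graph on request; networkx is imported only here."""
    try:
        import networkx as nx
    except ImportError as exc:
        raise ImportError(
            "networkx is required for graph export; install it separately"
        ) from exc
    # A multigraph, since two synsets may be linked by more than one relation.
    graph = nx.MultiDiGraph()
    for source, relation, target in relations:
        graph.add_edge(source, target, relation=relation)
    return graph


adjacency = build_adjacency(RELATIONS)
```

On the packaging side, the dependency could then be declared as an optional extra (for example through setuptools' `extras_require`), so a default install stays light.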

goodmami commented 4 years ago

Regarding lemmas, in the LMF a LexicalEntry is similar to the old wordnet module's concept of "Lemma", but the Synset-Lemma relationship is many-to-many. That is, a lexical entry can have multiple senses linking it to different synsets, just as a synset can be linked to multiple lemmas. If we make a Lemma class that contains the information of LexicalEntry, then the Lemma.synset() method (as in the old wordnet) doesn't make as much sense. We could enumerate multiple Lemmas from each sense on the LexicalEntry, but this duplicates a lot of information.
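To make the many-to-many relationship concrete, here is a rough sketch in which a `Sense` object is the link between one lexical entry and one synset. The class layout, the `link` helper, and all identifiers are hypothetical; they only loosely follow the LMF element names:

```python
class LexicalEntry:
    """A word with its part of speech; may have many senses."""

    __slots__ = ("id", "lemma", "senses")

    def __init__(self, id, lemma):
        self.id = id
        self.lemma = lemma
        self.senses = []


class Synset:
    """A concept; may be expressed by many lexical entries."""

    __slots__ = ("id", "senses")

    def __init__(self, id):
        self.id = id
        self.senses = []


class Sense:
    """The pairing of exactly one lexical entry with exactly one synset."""

    __slots__ = ("id", "entry", "synset")

    def __init__(self, id, entry, synset):
        self.id = id
        self.entry = entry
        self.synset = synset


def link(sense_id, entry, synset):
    """Create a sense and register it on both sides of the relationship."""
    sense = Sense(sense_id, entry, synset)
    entry.senses.append(sense)
    synset.senses.append(sense)
    return sense


# Made-up data: "dog" has two senses; one synset is shared with "hound".
dog = LexicalEntry("ex-dog-n", "dog")
hound = LexicalEntry("ex-hound-n", "hound")
animal = Synset("ex-ss-canine")
person = Synset("ex-ss-scoundrel")

link("ex-dog-1", dog, animal)
link("ex-dog-2", dog, person)
link("ex-hound-1", hound, animal)

synsets_of_dog = [s.synset.id for s in dog.senses]
lemmas_of_animal = [s.entry.lemma for s in animal.senses]
```

In this layout there is no single answer to "the synset of a lexical entry"; the question only has a unique answer per sense.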

fcbond commented 4 years ago

I think the standard(ish) terminology is:

  • lemma - roughly word
  • sense - lemma-synset pair (called lexical unit by Maciej)
  • synset - roughly concept

The LMF also introduces form (variant forms of the lemma, one of which is canonical).


-- Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University

goodmami commented 4 years ago

Good, thanks! I was trying to describe the dissonance between the LMF model and the API of NLTK's wordnet module, where Lemma contains a lot of the info from LexicalEntry, such as sense relations, as well as a link back to a single Synset.

For this project, it would be nice to follow the official structure better, but I'm afraid it will make the API too cumbersome. I welcome any solutions for this (see #6).
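One possible direction, offered only as a sketch and not as a settled design: keep the LMF-like structure underneath and expose a thin per-sense view that offers `name()` and `synset()` in the style of the old Lemma API. The view holds a reference to the sense rather than copying entry data. `LemmaView`, `lemmas_for`, and the sample data are hypothetical:

```python
from collections import namedtuple

# Minimal stand-ins for the underlying LMF-like records (hypothetical layout).
Entry = namedtuple("Entry", ["id", "lemma"])
Synset = namedtuple("Synset", ["id"])
Sense = namedtuple("Sense", ["id", "entry", "synset"])


class LemmaView:
    """A per-sense view; it stores one reference and copies nothing."""

    __slots__ = ("_sense",)

    def __init__(self, sense):
        self._sense = sense

    def name(self):
        return self._sense.entry.lemma

    def synset(self):
        # Unambiguous, because a sense belongs to exactly one synset.
        return self._sense.synset


def lemmas_for(word, senses):
    """Return one LemmaView per sense whose entry matches the given word."""
    return [LemmaView(s) for s in senses if s.entry.lemma == word]


dog = Entry("ex-dog-n", "dog")
SENSES = [
    Sense("ex-dog-1", dog, Synset("ex-ss-canine")),
    Sense("ex-dog-2", dog, Synset("ex-ss-scoundrel")),
]

views = lemmas_for("dog", SENSES)
synset_ids = [v.synset().id for v in views]
```

Both views share the same `Entry` object, so enumerating one view per sense does not duplicate the entry's information.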