Open alvations opened 4 years ago
Regarding synsets:
- A generic Python object? Or a Graph Node? Which is a better representation?
I'm excited by the possibilities that constructing a proper graph would enable, but tempered by the thought that it would be rather slow to construct the graph on startup, and that it would consume a lot of memory. Since there are tens (hundreds?) of thousands of lemmas and synsets created, I might opt for classes with __slots__ defined or maybe namedtuples, as these have a small footprint. Even better (from a performance standpoint) would be to keep the data raw (e.g., simple tuples) until the user requests it.
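As a rough illustration of the small-footprint idea, here is a minimal sketch that keeps rows as plain tuples and only builds a `__slots__`-based object when one is requested. The class names, the `(id, pos, definition)` tuple layout, and the example IDs are all assumptions made up for this sketch, not an agreed design.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical raw record layout: (synset_id, pos, definition)
RawSynset = Tuple[str, str, str]


class Synset:
    """Lightweight synset; __slots__ avoids a per-instance __dict__."""

    __slots__ = ("id", "pos", "definition")

    def __init__(self, id: str, pos: str, definition: str) -> None:
        self.id = id
        self.pos = pos
        self.definition = definition


class LazySynsetStore:
    """Keeps rows as plain tuples and builds Synset objects on first access."""

    def __init__(self, rows: Iterable[RawSynset]) -> None:
        self._raw: Dict[str, RawSynset] = {row[0]: row for row in rows}
        self._cache: Dict[str, Synset] = {}

    def get(self, synset_id: str) -> Synset:
        synset = self._cache.get(synset_id)
        if synset is None:
            # Only now pay the cost of creating the object.
            synset = Synset(*self._raw[synset_id])
            self._cache[synset_id] = synset
        return synset


# Made-up example data.
store = LazySynsetStore([
    ("ex-0001-n", "n", "a made-up example definition"),
    ("ex-0002-n", "n", "another made-up definition"),
])
first = store.get("ex-0001-n")
```

Only the synsets a user actually asks for are materialized; everything else stays as a tuple in `_raw`.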
I'm all for some function to construct a proper graph if the user wants it. But since the networkx dependency is rather heavy, maybe it could be an extra.
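One way to keep networkx optional is to import it only inside the function that needs it. The sketch below assumes a hypothetical `to_graph()` helper and a made-up `(source, relation, target)` edge format. The dependency itself could then be declared as an optional extra in the packaging metadata.

```python
from typing import Iterable, Tuple

# Hypothetical edge format: (source_synset_id, relation_name, target_synset_id)
Edge = Tuple[str, str, str]


def to_graph(edges: Iterable[Edge]):
    """Build a networkx.DiGraph from relation triples.

    networkx is imported here rather than at module level, so the rest of
    the package works without it installed.
    """
    try:
        import networkx as nx
    except ImportError as exc:
        raise ImportError(
            "to_graph() requires networkx; install it separately"
        ) from exc

    graph = nx.DiGraph()
    for source, relation, target in edges:
        # Store the relation name as an edge attribute.
        graph.add_edge(source, target, relation=relation)
    return graph
```

Users who never call `to_graph()` never trigger the import.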
Regarding lemmas, in the LMF a LexicalEntry is similar to the old wordnet module's concept of "Lemma", but the Synset-Lemma relationship is many-to-many. That is, a lexical entry can have multiple senses linking it to different synsets, just as a synset can be linked to multiple lemmas. If we make a Lemma class that contains the information of LexicalEntry, then the Lemma.synset() method (as in the old wordnet) doesn't make as much sense. We could enumerate multiple Lemmas from each sense on the LexicalEntry, but this duplicates a lot of information.
I think the standard(ish) terminology is:
- lemma: roughly word
- sense: lemma-synset pair (called lexical unit by Maciej)
- synset: roughly concept
The LMF also introduces form (variant forms of the lemma, one of which is canonical).
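To make the many-to-many relationship concrete, here is a minimal sketch in which a Sense object is the lemma-synset pair, so neither side needs a single `synset()` back-link. The class layout, the `link()` helper, and the example IDs are assumptions for illustration only. They are not the LMF schema or an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Synset:
    id: str
    senses: List["Sense"] = field(default_factory=list)

    def lemmas(self) -> List[str]:
        """Lemmas of every lexical entry that has a sense in this synset."""
        return [sense.entry.lemma for sense in self.senses]


@dataclass(eq=False)
class LexicalEntry:
    lemma: str
    forms: List[str] = field(default_factory=list)
    senses: List["Sense"] = field(default_factory=list)

    def synsets(self) -> List[Synset]:
        """One synset per sense, so an entry can reach several synsets."""
        return [sense.synset for sense in self.senses]


@dataclass(eq=False)
class Sense:
    """A lemma-synset pair."""
    entry: LexicalEntry
    synset: Synset


def link(entry: LexicalEntry, synset: Synset) -> Sense:
    """Create a sense and register it on both sides (hypothetical helper)."""
    sense = Sense(entry, synset)
    entry.senses.append(sense)
    synset.senses.append(sense)
    return sense


# Made-up example data.
bank = LexicalEntry("bank", forms=["bank", "banks"])
shore = LexicalEntry("shore")
riverside = Synset("ex-riverside-n")
institution = Synset("ex-institution-n")
link(bank, riverside)
link(bank, institution)
link(shore, riverside)
```

Here `bank.synsets()` returns two synsets and `riverside.lemmas()` returns two lemmas, with no information duplicated across Lemma objects.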
-- Francis Bond http://www3.ntu.edu.sg/home/fcbond/ Division of Linguistics and Multilingual Studies Nanyang Technological University
Good, thanks! I was trying to describe the dissonance between the LMF model and the API in NLTK's wordnet module, where Lemma contains a lot of the info from LexicalEntry, such as sense relations, as well as a link back to a single Synset.
For this project, it would be nice to follow the official structure better, but I'm afraid it will make the API too cumbersome. I welcome any solutions for this (see #6).
What is a Synset?
- lemma_names: enumerate the lemmas' form
- definition: Plaintext definition, so: -> str?
- examples: Example sentences where synset occurs
- offset: Offset IDs from ILI / CILI / EWN ??
- pos: Part of speech
- relations: What kind of relations?
- synonym: Connects to other Synset objects?? Related to??
- hypernym: Connects to super Synset objects
- hyponym: Connects to underling class objects
- hypernym paths: Paths to reach the TOP/ROOT node(s) of WordNet concepts.
- hypernym roots: The TOP/ROOT node(s) of the synset's hypernyms
What is a Lemma?
How are Synsets connected?
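As a starting point for discussion, the hypernym paths and hypernym roots items from the Synset list could be computed from a simple table of direct hypernyms. The sketch below uses a made-up `HYPERNYMS` mapping and invented synset IDs. It also assumes the hypernym relation has no cycles.

```python
from typing import Dict, List

# Hypothetical hypernym table: synset id -> ids of its direct hypernyms
HYPERNYMS: Dict[str, List[str]] = {
    "ex-dog": ["ex-canine", "ex-pet"],
    "ex-canine": ["ex-animal"],
    "ex-pet": ["ex-animal"],
    "ex-animal": ["ex-entity"],
    "ex-entity": [],
}


def hypernym_paths(synset_id: str, hypernyms: Dict[str, List[str]]) -> List[List[str]]:
    """All paths from synset_id up to a root (assumes no cycles)."""
    parents = hypernyms.get(synset_id, [])
    if not parents:
        # No hypernyms: this synset is itself a root.
        return [[synset_id]]
    paths = []
    for parent in parents:
        for path in hypernym_paths(parent, hypernyms):
            paths.append([synset_id] + path)
    return paths


def hypernym_roots(synset_id: str, hypernyms: Dict[str, List[str]]) -> List[str]:
    """Distinct root ids reachable from synset_id, in order of first appearance."""
    roots: List[str] = []
    for path in hypernym_paths(synset_id, hypernyms):
        if path[-1] not in roots:
            roots.append(path[-1])
    return roots
```

With this table, "ex-dog" has two hypernym paths (via "ex-canine" and via "ex-pet") that both end at the single root "ex-entity".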