dkpro / dkpro-uby

Framework for creating and accessing UBY resources – sense-linked lexical resources in standard UBY-LMF format
https://dkpro.github.io/dkpro-uby

Better way to access WordNet synonyms? #98

Open judithek opened 9 years ago

judithek commented 9 years ago
A question often asked by new users and in teaching courses is: "I queried the synonyms
of a WordNet sense, but ended up with an empty list." The reason is that WordNet does
not use SenseRelations but relies on its inherent synset structure to define
its synonyms. That is, one needs to get the synset of a sense and iterate over all
senses of this synset (excluding the source sense) to compile the list of synonyms.
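The workaround described above can be sketched in Java as follows. This is a minimal, self-contained illustration, not the UBY API: `Sense`, `Synset`, and the helper `synonymsFromSynset` are hypothetical, simplified stand-ins for the corresponding UBY model classes, reduced to the fields needed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: simplified stand-ins for the UBY Sense/Synset classes,
// not the real UBY model classes.
class SynsetSynonyms {

    static final class Synset {
        final List<Sense> senses = new ArrayList<>();
    }

    static final class Sense {
        final String lemma;
        final Synset synset; // may be null for resources without synsets

        Sense(String lemma, Synset synset) {
            this.lemma = lemma;
            this.synset = synset;
            if (synset != null) {
                synset.senses.add(this); // register the sense as a synset member
            }
        }
    }

    /** Collects the synonyms of a sense: all other senses of its synset. */
    static List<Sense> synonymsFromSynset(Sense source) {
        List<Sense> synonyms = new ArrayList<>();
        if (source.synset == null) {
            return synonyms; // no synset, hence no synset-based synonyms
        }
        for (Sense member : source.synset.senses) {
            if (member != source) { // exclude the source sense itself
                synonyms.add(member);
            }
        }
        return synonyms;
    }

    public static void main(String[] args) {
        Synset carSynset = new Synset();
        Sense car = new Sense("car", carSynset);
        new Sense("auto", carSynset);
        new Sense("automobile", carSynset);

        for (Sense synonym : synonymsFromSynset(car)) {
            System.out.println(synonym.lemma); // prints auto, then automobile
        }
    }
}
```

A real implementation would work on the model objects loaded from the UBY database; the point here is only the loop over the synset's members with the source sense skipped.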

Does anyone have a better idea of how to do that?
* Educate the users?
* Duplicate information?
* Convenience methods?
* Smart API methods?
* ...

Original issue reported on code.google.com by chmeyer.de on 2014-07-17 15:48:41

judithek commented 9 years ago
What is the difference between convenience methods and smart API methods?

I agree that we should offer one of these.

In the meantime: educate the users.

Original issue reported on code.google.com by eckle.kohler on 2014-07-18 07:14:09

judithek commented 9 years ago
Valid question :)

Here's what I had in mind:
* Convenience method: a method wrapping access to a Synset's synonyms given a sense
(excluding the sense itself); such a method basically saves users from coding this
over and over again
* Smart API: a method collecting synonyms from multiple sources (i.e., checking the
synset's senses and the sense relations). Thus, the method is semi-intelligent in combining
multiple information sources (here: database tables). If we decide to go in this direction,
similar issues also apply to other information types (e.g., synset vs. sense relations,
synset vs. sense definition, ...). The problem with such methods is that they are rather
unintuitive on a technical side (i.e., they do multiple things and thus take more time
to understand).

Another possibility is to duplicate information in the database, but I think that we
voted against that in the past.

Original issue reported on code.google.com by chmeyer.de on 2014-07-18 08:08:55

judithek commented 9 years ago
The same issue appears with the definitions of senses and synsets. In WordNet, the sense's
definition is null; in Wiktionary, the synset's definition is null. We already have a convenience
method getDefinitionText(), which could be made smart, or we could distinguish between
getSynsetDefinitionText() and getSenseDefinitionText().
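Both options can be sketched side by side in Java. This assumes hypothetical, simplified `Sense`/`Synset` classes that each hold a nullable definition string; the method names mirror those mentioned above, but the fallback order of the "smart" variant (sense first, then synset) is an assumption made here for illustration.

```java
// Hypothetical sketch: simplified stand-ins, not the real UBY model classes.
class DefinitionLookup {

    static final class Synset {
        final String definitionText; // null e.g. for Wiktionary, per the discussion

        Synset(String definitionText) {
            this.definitionText = definitionText;
        }
    }

    static final class Sense {
        final String definitionText; // null e.g. for WordNet, per the discussion
        final Synset synset;         // may be null

        Sense(String definitionText, Synset synset) {
            this.definitionText = definitionText;
            this.synset = synset;
        }
    }

    /** Explicit variant: only the sense's own definition (may be null). */
    static String getSenseDefinitionText(Sense sense) {
        return sense.definitionText;
    }

    /** Explicit variant: only the definition of the sense's synset (may be null). */
    static String getSynsetDefinitionText(Sense sense) {
        return sense.synset == null ? null : sense.synset.definitionText;
    }

    /** "Smart" variant: prefer the sense definition, fall back to the synset definition. */
    static String getDefinitionText(Sense sense) {
        String senseDefinition = getSenseDefinitionText(sense);
        return senseDefinition != null ? senseDefinition : getSynsetDefinitionText(sense);
    }

    public static void main(String[] args) {
        Sense wordNetLike = new Sense(null, new Synset("a motor vehicle with four wheels"));
        Sense wiktionaryLike = new Sense("a road vehicle", new Synset(null));
        System.out.println(getDefinitionText(wordNetLike));    // falls back to the synset definition
        System.out.println(getDefinitionText(wiktionaryLike)); // uses the sense definition
    }
}
```

The explicit variants make the source of the text visible to the caller; the fallback variant trades that transparency for convenience, which is exactly the trade-off debated in this thread.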

Original issue reported on code.google.com by chmeyer.de on 2014-07-23 15:34:39

judithek commented 9 years ago
Having thought about it, I dislike (semi-)intelligent or smart methods for various reasons.
If we decide to offer those methods, they should always issue a warning such as:

Calling this smart method might lead to a quality reduction of/ additional noise in
the lexical data you extract. Please make sure that your application is robust against
this kind of noise...

or something like that.

Needs to be discussed.

Original issue reported on code.google.com by eckle.kohler on 2014-07-30 09:42:44

judithek commented 9 years ago
Decision: we will implement convenience methods, such as
- get senses of synonyms (using Synset)
- get word forms of synonyms (using SenseRelation and Synset)
- get senses of synonymous word forms (using SenseRelation and Synset and the most frequent
sense or another disambiguation heuristic)

Please suggest method names.

Original issue reported on code.google.com by eckle.kohler on 2014-07-31 16:01:41

judithek commented 9 years ago
getSynonymousSenses(Synset)
getSynonymousSenses(Lemma)
getSynonymousLemma(Synset, SenseRelation) - not sure here... is the sense relation
actually an input or something followed internally?

Original issue reported on code.google.com by richard.eckart on 2014-08-01 05:59:19

judithek commented 9 years ago
After thinking about our discussion, I think that what I called Smart API methods is
not even feasible, because the two sources of synonyms return different types.
The same partly applies to the suggestions Richard kindly added (#6): Synset-inherent
synonyms should return a list of Sense instances, whereas relationally encoded ones should
return a list of SenseRelations (which facilitate accessing the form and/or sense, depending
on what is encoded). With this in mind, I have some alternative suggestions. I am
numbering them so that we can decide for each group whether we want it or make alternative
suggestions.

(1)
** List<Sense> UBY.getSynonymsFromSynset(Sense)
** List<SenseRelation> UBY.getSynonymsFromRelations(Sense)

Both methods should clearly refer to each other in the Javadoc and explain the difference.
Note that the second method cannot return a reflexive synonym (i.e., a SenseRelation
that has the specified input Sense as its target) without painful coding work. For consistency,
this strengthens my argument that the first method should not include the given sense
in the resulting list either.
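To make the type difference in (1) concrete, here is a minimal Java sketch of the relation-based method. It assumes hypothetical, simplified classes in which a `SenseRelation` carries a relation name and either a target sense or a target form string; the label `"synonym"` is an assumption, not necessarily the value UBY stores. Only relations stored on the input sense are inspected, so relations pointing at the input sense from elsewhere are not returned (the limitation noted above); the sketch additionally drops any relation whose target is the input sense itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: simplified stand-ins, not the real UBY model classes.
class RelationSynonyms {

    /** Assumed relation label for synonymy; the actual UBY value may differ. */
    static final String SYNONYM = "synonym";

    static final class Sense {
        final String lemma;
        final List<SenseRelation> relations = new ArrayList<>(); // outgoing relations only

        Sense(String lemma) {
            this.lemma = lemma;
        }
    }

    static final class SenseRelation {
        final String relName;
        final Sense targetSense; // set if the relation points to a sense
        final String targetForm; // set if the relation only points to a word form

        SenseRelation(String relName, Sense targetSense, String targetForm) {
            this.relName = relName;
            this.targetSense = targetSense;
            this.targetForm = targetForm;
        }
    }

    /** Returns the outgoing synonymy relations of a sense, skipping self-references. */
    static List<SenseRelation> getSynonymsFromRelations(Sense source) {
        List<SenseRelation> result = new ArrayList<>();
        for (SenseRelation relation : source.relations) {
            if (!SYNONYM.equals(relation.relName)) {
                continue; // ignore hypernymy, antonymy, etc.
            }
            if (relation.targetSense == source) {
                continue; // the input sense is its own target
            }
            result.add(relation);
        }
        return result;
    }

    public static void main(String[] args) {
        Sense big = new Sense("big");
        Sense large = new Sense("large");
        big.relations.add(new SenseRelation(SYNONYM, large, null));
        big.relations.add(new SenseRelation(SYNONYM, null, "huge"));
        big.relations.add(new SenseRelation("antonym", new Sense("small"), null));
        big.relations.add(new SenseRelation(SYNONYM, big, null));

        for (SenseRelation r : getSynonymsFromRelations(big)) {
            // the caller decides whether to read the target sense or the target form
            System.out.println(r.targetSense != null ? r.targetSense.lemma : r.targetForm);
        }
    }
}
```

Returning the relations rather than senses leaves it to the caller to handle relations that encode only a form, which is why this method and the synset-based one cannot share a return type.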

(2)
** List<Sense> UBY.getSynonymsFromSynset(Synset)
In contrast to (1), this method should return all synonymous senses of the synset (by
definition!). I am very skeptical whether we need this method, since it is just a shorthand
for "synset.getSenses()".

(3)
To keep users from fiddling around with the Sense and Synset classes on their own,
we MIGHT want to include
** List<Sense> Sense.getSynonymsFromSynset(UBY)
** List<SenseRelation> Sense.getSynonymsFromRelations(UBY)
in the Sense class. I'm ambivalent about this one: pro: it helps users find synonyms
at all (currently, most of them are lost); con: it introduces redundancy and unwanted
dependencies (Sense to UBY).

(4)
For the frequent case of extracting only the form (the lemma in this case) of the synonyms,
we could include
** List<String> UBY.getSynonymFormsFromSynset(Sense)
** List<String> UBY.getSynonymFormsFromRelations(Sense)

-or-
** List<String> UBY.getSynonymForms(Sense)
** List<String> UBY.getSynonymForms(Sense, boolean includeRelations, boolean includeSynsets)
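The flag-based variant of (4) could look roughly like the following Java sketch. It reuses the same kind of hypothetical, simplified `Sense`/`Synset`/`SenseRelation` stand-ins and the assumed `"synonym"` label; de-duplicating in first-seen order and dropping the input sense's own lemma are design choices made here for illustration, not decisions from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: simplified stand-ins, not the real UBY model classes.
class SynonymForms {

    static final String SYNONYM = "synonym"; // assumed relation label

    static final class Synset {
        final List<Sense> senses = new ArrayList<>();
    }

    static final class Sense {
        final String lemma;
        final Synset synset; // may be null
        final List<SenseRelation> relations = new ArrayList<>();

        Sense(String lemma, Synset synset) {
            this.lemma = lemma;
            this.synset = synset;
            if (synset != null) {
                synset.senses.add(this);
            }
        }
    }

    static final class SenseRelation {
        final String relName;
        final Sense targetSense; // null if only a form is encoded
        final String targetForm; // null if a sense is encoded

        SenseRelation(String relName, Sense targetSense, String targetForm) {
            this.relName = relName;
            this.targetSense = targetSense;
            this.targetForm = targetForm;
        }
    }

    /** Default variant: consult both sources. */
    static List<String> getSynonymForms(Sense sense) {
        return getSynonymForms(sense, true, true);
    }

    /** Collects synonym lemmas from the selected sources, de-duplicated in first-seen order. */
    static List<String> getSynonymForms(Sense sense, boolean includeRelations, boolean includeSynsets) {
        Set<String> forms = new LinkedHashSet<>();
        if (includeSynsets && sense.synset != null) {
            for (Sense member : sense.synset.senses) {
                if (member != sense) {
                    forms.add(member.lemma);
                }
            }
        }
        if (includeRelations) {
            for (SenseRelation relation : sense.relations) {
                if (!SYNONYM.equals(relation.relName)) {
                    continue;
                }
                String form = relation.targetSense != null ? relation.targetSense.lemma : relation.targetForm;
                if (form != null) {
                    forms.add(form);
                }
            }
        }
        forms.remove(sense.lemma); // a word is not listed as its own synonym
        return new ArrayList<>(forms);
    }

    public static void main(String[] args) {
        Synset carSynset = new Synset();
        Sense car = new Sense("car", carSynset);
        Sense auto = new Sense("auto", carSynset);
        car.relations.add(new SenseRelation(SYNONYM, null, "motorcar"));
        car.relations.add(new SenseRelation(SYNONYM, auto, null));

        System.out.println(getSynonymForms(car));              // [auto, motorcar]
        System.out.println(getSynonymForms(car, false, true)); // [auto]
    }
}
```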

(5)
and maybe even:
** List<String> UBY.getSynonymForms(String lemma)
** List<String> UBY.getSynonymForms(LexicalEntry)

This basically lumps all senses together, which is useful if no sense disambiguation
can be performed. Do we need both?
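The lumping in (5) can be sketched in Java as a union over all senses. This assumes hypothetical, simplified classes and a hypothetical in-memory `LEXICON` map standing in for a database lookup by lemma; to keep the example short, only synset-based synonyms are collected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: simplified stand-ins, not the real UBY model classes.
// Only synset-based synonyms are collected, to keep the example short.
class LumpedSynonymForms {

    static final class Synset {
        final List<Sense> senses = new ArrayList<>();
    }

    static final class LexicalEntry {
        final String lemma;
        final List<Sense> senses = new ArrayList<>();

        LexicalEntry(String lemma) {
            this.lemma = lemma;
        }
    }

    static final class Sense {
        final LexicalEntry entry;
        final Synset synset;

        Sense(LexicalEntry entry, Synset synset) {
            this.entry = entry;
            this.synset = synset;
            entry.senses.add(this);
            synset.senses.add(this);
        }
    }

    /** Hypothetical lemma index standing in for a database lookup. */
    static final Map<String, List<LexicalEntry>> LEXICON = new HashMap<>();

    static LexicalEntry addEntry(String lemma) {
        LexicalEntry entry = new LexicalEntry(lemma);
        LEXICON.computeIfAbsent(lemma, k -> new ArrayList<>()).add(entry);
        return entry;
    }

    /** Lumps the synonyms of all senses of one lexical entry. */
    static List<String> getSynonymForms(LexicalEntry entry) {
        Set<String> forms = new LinkedHashSet<>();
        for (Sense sense : entry.senses) {
            for (Sense member : sense.synset.senses) {
                if (member != sense) {
                    forms.add(member.entry.lemma);
                }
            }
        }
        forms.remove(entry.lemma);
        return new ArrayList<>(forms);
    }

    /** Lumps the synonyms of all lexical entries sharing a lemma (e.g. across parts of speech). */
    static List<String> getSynonymForms(String lemma) {
        Set<String> forms = new LinkedHashSet<>();
        for (LexicalEntry entry : LEXICON.getOrDefault(lemma, Collections.emptyList())) {
            forms.addAll(getSynonymForms(entry));
        }
        return new ArrayList<>(forms);
    }

    public static void main(String[] args) {
        Synset institution = new Synset();
        Synset riverSide = new Synset();
        LexicalEntry bank = addEntry("bank");
        new Sense(bank, institution);
        new Sense(addEntry("depository financial institution"), institution);
        new Sense(bank, riverSide);
        new Sense(addEntry("riverbank"), riverSide);

        // prints [depository financial institution, riverbank]
        System.out.println(getSynonymForms("bank"));
    }
}
```

In this sketch the lemma-based variant is only a thin loop over the entry-based one; whether both belong in the API remains the open question raised above.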

Last but not least, we need to describe our changes on http://code.google.com/p/uby/wiki/ApiTutorial,
and we should update the Javadoc of the Sense.getSenseRelations() and potentially the Synset.getSenses()
methods to explain that, depending on the structural properties of a resource, synonyms
may be encoded in a second way.

Original issue reported on code.google.com by chmeyer.de on 2014-08-01 07:53:27

judithek commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by eckle.kohler on 2014-10-09 17:26:40

judithek commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2015-02-18 21:11:15