globalwordnet / english-wordnet

The Open English WordNet
https://en-word.net/

How to confirm sense distinctions #243

Open jmccrae opened 4 years ago

jmccrae commented 4 years ago

There are many subtle sense distinctions in the WordNet that may not be routinely made by English speakers, especially in cases of systematic polysemy or metonymy, where an object is referred to by a related term.

This issue is to capture ideas about how we can make a principled distinction here.

I have two suggestions:

  1. Collocations: if the collocations of two candidate senses are distinct, then we can claim that the senses are distinct. This is something that could be measured using sense-tagged corpora.
  2. Other dictionaries: if we could establish a set of relevant other dictionaries that we trust, we could simply examine all of these and follow the majority view of other lexicographers.
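
As a rough illustration of (2), the majority view could be computed as below. This is only a sketch: the dictionary names are placeholders, and deciding whether a given dictionary "lists the sense separately" is a manual judgement that the code takes as input.

```python
def majority_supports_sense(judgements):
    """judgements maps a dictionary name to True if that dictionary
    lists the candidate sense as a separate sense, else False.
    Returns True only when a strict majority list it separately."""
    votes_for = sum(1 for listed in judgements.values() if listed)
    return votes_for > len(judgements) / 2


# Placeholder dictionary names, not a proposed list of trusted sources.
judgements = {"dictionary_a": True, "dictionary_b": True, "dictionary_c": False}
print(majority_supports_sense(judgements))
```

A tie counts as "no majority" here, which is a design choice rather than anything agreed in this issue.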

Any other suggestions?

arademaker commented 4 years ago

Can you give an example for (1)?

For corpus tagging, I believe we need to pay special attention to improving the glosses. Making hard senses appear in the glosses creates an opportunity for future annotation of these glosses.

arademaker commented 4 years ago

I believe we will also have to discuss whether we want to keep all of PWN's original motivations and linguistic decisions or whether we are willing to adopt different strategies.

For instance, some cases of systematic polysemy are expected and accepted as a consequence of the PWN structure described in the original 5 papers. On the other hand, we now have experience from other wordnets such as the German and Polish ones. Maybe other relations and models are possible. The German wordnet does not follow the cluster model for adjectives, for example.

jmccrae commented 4 years ago

For (1), a simple example would be that one sense of "bank" may collocate with "river", "stream", while another sense may collocate with "merchant", "statement", "account". You can then detect two distinct clusters using metrics such as PMI.
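
A minimal sketch of that idea, assuming we already have sense-tagged (sense, collocate) pairs. The sense labels and counts below are invented for illustration and do not come from a real corpus.

```python
import math
from collections import Counter

# Toy sense-tagged observations: (sense label, collocate).
# Both the labels and the data are made up for this example.
observations = [
    ("bank%river", "river"), ("bank%river", "stream"), ("bank%river", "river"),
    ("bank%money", "merchant"), ("bank%money", "statement"),
    ("bank%money", "account"), ("bank%money", "account"),
]


def pmi_table(pairs):
    """Return {(sense, collocate): PMI in bits} for every observed pair."""
    total = len(pairs)
    pair_counts = Counter(pairs)
    sense_counts = Counter(sense for sense, _ in pairs)
    word_counts = Counter(word for _, word in pairs)
    return {
        (sense, word): math.log2(
            (count / total)
            / ((sense_counts[sense] / total) * (word_counts[word] / total))
        )
        for (sense, word), count in pair_counts.items()
    }


def collocates(pmi, sense, threshold=0.0):
    """Collocates whose PMI with the given sense exceeds the threshold."""
    return {w for (s, w), value in pmi.items() if s == sense and value > threshold}


pmi = pmi_table(observations)
river = collocates(pmi, "bank%river")
money = collocates(pmi, "bank%money")
overlap = river & money  # an empty overlap suggests two distinct clusters
print(sorted(river), sorted(money), sorted(overlap))
```

With real data the collocate sets would overlap to some degree, so a threshold on the overlap (or a proper clustering step) would be needed; this only shows the PMI computation itself.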

I don't think we should fully diverge from PWN unless we have strong evidence that PWN's approach is poor (e.g., "satellite adjectives" are not a category that fits well with the literature) or that PWN doesn't have a fixed principle to follow (which I think is the case for systematic polysemy).

dcillessen commented 4 years ago

Could we look at other WN projects for instances of polysemy that may have migrated to English WN? Perhaps we could also find relevant information using translation software, or dictionaries geared towards describing English as a foreign language.

rwingerter55 commented 4 years ago

[Off topic] Learning from other wordnets does not necessarily mean we have to diverge from PWN. EuroWordNet's top ontology is an enhancement to the PWN semantic fields and is fully compatible with it.

rwingerter55 commented 4 years ago

@jmccrae, in Issue #445 I followed your suggestion to consult dictionaries. It worked well. This way I could identify a sense of "event" that was present in most dictionaries but not in EWN.

arademaker commented 3 years ago

Hi @rwingerter55, the problem is that dictionaries can differ. Which dictionary will have priority? If we adopt the majority approach, do we need a fixed list of dictionaries? Will we need to define what makes a dictionary a valid source? I am just thinking about how hard it can be to adopt this criterion on a large scale.

jmccrae commented 3 years ago

I am writing a paper on this issue... so there may be some more concrete procedures for the project here

jmccrae commented 1 year ago

Note the paper I refer to was published here: https://www.frontiersin.org/articles/10.3389/frai.2022.745626/full

Not sure it solves the issues above though in the end (see next message)

jmccrae commented 1 year ago

I have a proposal for making sense distinctions here: https://github.com/globalwordnet/english-wordnet/blob/issue-243/SYNSET_MERGING.md

Merging and creating new synsets

This document describes procedures in Open English WordNet for merging synsets and for deducing if there is a need to create a new synset, for a new sense of a word.

Synsets that share a lemma

In the case that we are considering merging two synsets that share a lemma, or introducing a novel synset, the principal method of inferring whether there is a novel synset is based on graph positions. The graph position is defined by the characteristic links of the synset, which are as follows

Two synsets with different positions in the graph should not be merged. For example, similar definitions but clearly distinct hypernyms would not be merged.
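
The check could be sketched roughly as follows. This is an illustration under assumptions: synsets are represented as plain dictionaries from relation name to target synset IDs, the IDs are hypothetical, and only `hypernym` is treated as a characteristic link by default because it is the one relation named in the example above.

```python
def same_graph_position(links_a, links_b, relations=("hypernym",)):
    """True if both synsets have identical targets for every relation
    that is treated as a characteristic link."""
    return all(
        set(links_a.get(rel, ())) == set(links_b.get(rel, ()))
        for rel in relations
    )


# Hypothetical synsets with made-up identifiers.
synset_a = {"hypernym": ["example-0001-n"]}
synset_b = {"hypernym": ["example-0001-n"]}
synset_c = {"hypernym": ["example-0002-n"]}

print(same_graph_position(synset_a, synset_b))  # candidates for a merge
print(same_graph_position(synset_a, synset_c))  # distinct hypernyms: do not merge
```

Passing the same graph position is a precondition for considering a merge, not a decision to merge.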

An example of a merge based on these properties is given by Issue #911

If it is decided that no merge is necessary, we should normally update definitions or the characteristic links to make the sense distinction clearer.

Synsets that don't share a lemma

In the case that the synsets don't share a lemma, merging them also claims that there is synonymy between all the words of the resulting synset. The steps we take to verify this are as follows:

  1. Verify that the synsets would have the same characteristic links (see above).
  2. Collect at least 3 examples for each of these synsets. This can be done by using the COCA corpus and finding the first 3 matching examples that fit this sense.
  3. Check that all lemmas can be substituted in all cases without substantial change in meaning.

For example, consider Issue #750.

An example of 'self-serving' was found in the corpus

the self-serving and greedy Daffy Duck

We substitute with the candidate merge lemma:

the selfish and greedy Daffy Duck

This does not seem to substantially change the meaning, so we merged these synsets.
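
The substitution step can be partly automated by generating the substituted sentences for a human to judge. The sketch below is an assumption about how one might do this, not part of the project's tooling; the function name is hypothetical, and the judgement about whether the meaning changes remains manual.

```python
import re


def substitute(example, old_lemma, new_lemma):
    """Replace whole-word occurrences of old_lemma in example with new_lemma."""
    pattern = r"\b" + re.escape(old_lemma) + r"\b"
    # A function replacement keeps new_lemma literal (no backslash escapes).
    return re.sub(pattern, lambda match: new_lemma, example, flags=re.IGNORECASE)


example = "the self-serving and greedy Daffy Duck"
candidate = substitute(example, "self-serving", "selfish")
print(candidate)
```

This handles only the exact lemma form; inflected forms found in corpus examples would need lemmatisation or manual editing.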