Open jmccrae opened 4 years ago
Can you give an example for (1)?
For corpus tagging, I believe we need to pay special attention to improving the glosses. Making hard senses appear in the glosses can create opportunities for future annotation of these glosses.
I believe we will also have to discuss whether we want to keep all of PWN's original motivations and linguistic decisions or whether we are willing to adopt different strategies.
For instance, some cases of systematic polysemy are expected and accepted as a consequence of the PWN structure in the original 5 papers. On the other hand, we now have experience from other wordnets, such as the German and Polish ones. Maybe other relations and models are possible. The German wordnet, for example, does not follow the cluster model for adjectives.
For (1), a simple example would be that one sense of "bank" may collocate with "river", "stream", while another sense may collocate with "merchant", "statement", "account". You can then detect two distinct clusters using metrics such as PMI.
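To make the idea concrete, here is a minimal sketch, not a proposed procedure for the project. It assumes a pre-tokenized toy corpus, sentence-level co-occurrence, and a simple rule that links two collocates whenever their PMI is above a threshold. The names `pmi_table` and `collocate_clusters`, the threshold, and the corpus itself are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """PMI (log base 2) for every word pair that co-occurs in a sentence."""
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for sent in sentences:
        words = set(sent)  # count each word once per sentence
        word_count.update(words)
        pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))
    return {
        pair: math.log2((c / n) / math.prod(word_count[w] / n for w in pair))
        for pair, c in pair_count.items()
    }


def collocate_clusters(sentences, target, threshold=0.0):
    """Group the collocates of `target` into clusters linked by PMI > threshold."""
    # Keep only sentences containing the target, and drop the target itself.
    contexts = [[w for w in s if w != target] for s in sentences if target in s]
    pmi = pmi_table(contexts)

    # Build an undirected graph over collocates.
    neighbours = {w: set() for ctx in contexts for w in ctx}
    for pair, score in pmi.items():
        if score > threshold:
            a, b = pair
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components of that graph are the candidate sense clusters.
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            w = stack.pop()
            if w in component:
                continue
            component.add(w)
            stack.extend(neighbours[w] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy corpus (invented for illustration only).
corpus = [
    ["bank", "river", "stream"],
    ["bank", "river", "stream"],
    ["bank", "statement", "account"],
    ["bank", "merchant", "account"],
    ["bank", "merchant", "statement"],
]
clusters = collocate_clusters(corpus, "bank")
print(clusters)
```

On this toy input the "river"/"stream" collocates and the "account"/"merchant"/"statement" collocates end up in separate clusters. Real corpora would need frequency cut-offs and a more robust clustering step.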
I don't think we should fully diverge from PWN unless we have strong evidence that PWN's approach is poor (e.g., "satellite adjectives" are not a category that fits well with the literature) or PWN doesn't have a fixed principle to follow (which I think is the case for systematic polysemy).
Could we look at other WN projects for instances of polysemy that may have migrated to English WN? Perhaps we could also find relevant information using translation software, or dictionaries geared towards describing English as a foreign language.
[Off topic] Learning from other wordnets does not necessarily mean we have to diverge from PWN. EuroWordNet's top ontology is an enhancement to the PWN semantic fields and is fully compatible with it.
@jmccrae, in Issue #445 I followed your suggestion to consult dictionaries. It worked well. This way I could identify a sense of "event" that was present in most dictionaries but not in EWN.
Hi @rwingerter55, the problem is that dictionaries can differ. Which dictionary will have priority? If we adopt the majority approach, do we need a fixed list of dictionaries? Will we need to define what makes a dictionary a valid source? I am just thinking about how hard it can be to adopt this criterion in a large.
I am writing a paper on this issue... so there may be some more concrete procedures for the project here
Note the paper I refer to was published here: https://www.frontiersin.org/articles/10.3389/frai.2022.745626/full
Not sure it solves the issues above though in the end (see next message)
I have a proposal for making sense distinctions here: https://github.com/globalwordnet/english-wordnet/blob/issue-243/SYNSET_MERGING.md
This document describes procedures in Open English WordNet for merging synsets and for deducing if there is a need to create a new synset, for a new sense of a word.
When we are considering merging two synsets that share a lemma, or introducing a novel synset, the principal method of inferring whether there is a novel synset is based on graph positions. The graph position is defined by the characteristic links of the synset, which are as follows
Two synsets with different positions in the graph should not be merged. For example, similar definitions but clearly distinct hypernyms would not be merged.
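As a rough illustration of the graph-position check, the sketch below compares two synsets on a caller-supplied set of link types. It assumes each synset's links are available as a dictionary from link type to target synset IDs. The function name, the data layout, and the IDs are hypothetical, and only the hypernym link mentioned above is used in the example.

```python
def same_graph_position(links_a, links_b, link_types):
    """True if both synsets have identical targets for every given link type."""
    return all(
        set(links_a.get(t, ())) == set(links_b.get(t, ()))
        for t in link_types
    )


# Hypothetical synsets with placeholder IDs: distinct hypernyms.
synset_a = {"hypernym": ["synset-x"]}
synset_b = {"hypernym": ["synset-y"]}

if not same_graph_position(synset_a, synset_b, ["hypernym"]):
    print("Different graph positions: do not merge")
```

A check like this can only rule a merge out. Synsets that pass it would still need the manual review described in this document.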
An example of a merge based on these properties is given by Issue #911
If it is decided that no merge is necessary, we should normally update definitions or the characteristic links to make the sense distinction clearer.
In the case that the synsets don't share a lemma, we are also claiming that there is synonymy between all the words of the synset. The steps we take to verify this are as follows
For example Issue #750
An example of 'self-serving' was found in the corpus
the self-serving and greedy Daffy Duck
We substitute with the candidate merge lemma:
the selfish and greedy Daffy Duck
This does not seem to substantially change the meaning so we merged these synsets
There are many subtle sense distinctions in WordNet that may not be routinely made by English speakers, especially in cases of systematic polysemy or metonymy, where an object is referred to by a related term.
This issue is to capture ideas about how we can make a principled distinction here.
I have two suggestions:
Any other suggestions?