Living-with-machines / TargetedSenseDisambiguation

Repository for the work on Targeted Sense Disambiguation

prepare Huang et al. (2019) #74

Open BarbaraMcG opened 3 years ago

BarbaraMcG commented 3 years ago

GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge. Luyao Huang, Chi Sun, Xipeng Qiu, Xuanjing Huang.

https://www.aclweb.org/anthology/D19-1355.pdf

  1. What is this paper about?
  2. Is it relevant to our project? If so, why and how?
  3. What could we use from this work in our project?
  4. Add some text about it to Overleaf
  5. Plan experiments (if appropriate)
mcollardanuy commented 3 years ago

What is the paper about?

The authors propose a new neural approach to Word Sense Disambiguation (WSD) that leverages gloss information (i.e. the definition of a sense) from WordNet. They treat WSD as a sentence-pair classification problem using BERT.

They build context-gloss sentence pairs in the input format required by BERT: the first sentence is the context (i.e. the sentence where the target word occurs) and the second sentence is the gloss of a specific WordNet sense of the target lemma (in the following example, the lemma is "long" and the gloss is "desire strongly or persistently"):

[CLS] How long has it been since you reviewed the objectives of your benefit and service program ? [SEP] desire strongly or persistently [SEP]
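
As an illustration (this is a minimal sketch, not the authors' released code), such context-gloss pairs could be generated from WordNet with NLTK:

```python
# Minimal sketch: one (context, gloss) pair per candidate WordNet sense of the
# target lemma. Requires nltk and a downloaded WordNet (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def context_gloss_pairs(context, target_lemma, pos=None):
    """Return one (context, gloss, synset) triple per candidate sense."""
    return [(context, syn.definition(), syn) for syn in wn.synsets(target_lemma, pos=pos)]

context = ("How long has it been since you reviewed the objectives "
           "of your benefit and service program ?")
for ctx, gloss, syn in context_gloss_pairs(context, "long", pos=wn.VERB):
    print(syn.name(), "->", gloss)  # e.g. long.v.01 -> desire strongly or persistently
```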

They train three different BERT models (see below). For each target word in a context, there are as many context-gloss training instances as there are candidate glosses for that word, each labelled as a positive or negative match (in the example above, the label would be negative). At test time, they output the probability of the positive label for each candidate gloss and choose the sense with the highest probability.
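
In practice, testing amounts to scoring every candidate gloss against the context and taking the argmax. A hedged sketch with the Hugging Face transformers API, assuming a BERT sentence-pair classifier has already been fine-tuned on context-gloss pairs (the checkpoint path is hypothetical, and label index 1 is assumed to be the positive "match" class):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-glossbert")  # hypothetical path
model.eval()

def disambiguate(context, glosses):
    """Score each (context, gloss) pair and return the index of the best gloss."""
    probs = []
    for gloss in glosses:
        inputs = tokenizer(context, gloss, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # probability of the positive ("gloss matches context") label
        probs.append(torch.softmax(logits, dim=-1)[0, 1].item())
    return max(range(len(glosses)), key=probs.__getitem__)
```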

They experiment with three BERT models:

  1. GlossBERT (Token-CLS): classification uses the final hidden states of the target word's tokens in the context-gloss pair.
  2. GlossBERT (Sent-CLS): classification uses the final hidden state of the [CLS] token of the context-gloss pair.
  3. GlossBERT (Sent-CLS-WS): like Sent-CLS, but with weak supervision: the target word is highlighted with quotation marks in the context and prepended to the gloss.

The authors note that adding this weak supervision yields the best performance, probably because it combines the advantages of the other two methods.
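
Based on the paper's description (again a sketch, not the released code), the weakly supervised input pair looks roughly like this:

```python
# Approximate sketch of the Sent-CLS-WS input format: the target word is
# highlighted with quotation marks in the context and prepended to the gloss.
def weak_supervision_pair(tokens, target_index, gloss):
    target = tokens[target_index]
    context = tokens[:target_index] + ['"', target, '"'] + tokens[target_index + 1:]
    return " ".join(context), f"{target} : {gloss}"

ctx, gl = weak_supervision_pair(
    "How long has it been since you reviewed the objectives".split(),
    1,
    "desire strongly or persistently",
)
# ctx -> 'How " long " has it been since you reviewed the objectives'
# gl  -> 'long : desire strongly or persistently'
```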

Is it relevant to our project? If so, why and how? What could we use from this work in our project?

It could serve as a baseline, or we could even build our final method on top of it if we managed to add time as a feature and make it diachronic. The code seems clean and well documented (https://github.com/HSLCY/GlossBERT), and the approach itself is quite clean, simple and intuitive. They say that "it is quite expensive to train the model", but don't quantify the claim.
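
Purely as a sketch of what "adding time as a feature" could look like (our idea, not something from the paper), one simple option would be to prepend a coarse period marker to the context before building the pairs:

```python
# Illustrative only: prepend a period marker to the context sentence. If the
# marker is treated as a special token (e.g. [1850-1899]), it would also need
# to be added to the tokenizer vocabulary.
def add_period_marker(context, year, bin_size=50):
    start = (year // bin_size) * bin_size
    return f"[{start}-{start + bin_size - 1}] {context}"

print(add_period_marker("How long has it been since you reviewed ...", 1887))
# -> '[1850-1899] How long has it been since you reviewed ...'
```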

Add some text about it to Overleaf

Plan experiments (if appropriate)