The authors propose a neural approach to Word Sense Disambiguation (WSD) that leverages gloss (i.e. sense definition) information from WordNet. They treat WSD as a sentence-pair classification problem, using BERT.
They build context-gloss sentence pairs, in the input format required by BERT, where the first sentence is the context (i.e. the sentence where the target word occurs) and the second sentence is the gloss of a specific WordNet sense for a given lemma (in the following example, the lemma is "long" and the gloss is "desire strongly or persistently"):
[CLS] How long has it been since you reviewed the objectives of your benefit and service program ? [SEP] desire strongly or persistently [SEP]
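As a concrete illustration (not the authors' code), the pairs can be generated from WordNet via NLTK; the function `context_gloss_pairs` below is a hypothetical helper, and it assumes the WordNet corpus has been downloaded:

```python
# Hypothetical sketch: build the context-gloss pairs described above from
# WordNet via NLTK. Requires a one-off nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def context_gloss_pairs(context, target_lemma, gold_synset=None):
    """One (context, gloss, label) triple per candidate sense of the lemma."""
    pairs = []
    for synset in wn.synsets(target_lemma):
        label = 1 if synset == gold_synset else 0  # positive only for the gold sense
        pairs.append((context, synset.definition(), label))
    return pairs

context = ("How long has it been since you reviewed the objectives "
           "of your benefit and service program ?")
for _, gloss, label in context_gloss_pairs(context, "long"):
    print(label, gloss)
```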
They train three different BERT models (see below). For each target word in a context, there is one context-gloss pair training instance per candidate gloss of that word, labelled positive if the gloss matches the word's sense in context and negative otherwise (in the example above, the label would be negative). At test time, they output the probability of the positive label for each context-gloss pair and choose the sense whose gloss receives the highest probability.
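A minimal sketch of this ranking step, assuming a Hugging Face `BertForSequenceClassification` fine-tuned on such pairs (an assumption for illustration; the original repo ships its own training and evaluation scripts):

```python
# Illustrative only: score each context-gloss pair with a binary sentence-pair
# classifier and pick the gloss with the highest positive-class probability.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def disambiguate(context, glosses):
    """Index of the gloss whose pair gets the highest P(label = 1)."""
    scores = []
    for gloss in glosses:
        # tokenizer(text, text_pair) yields [CLS] context [SEP] gloss [SEP]
        inputs = tokenizer(context, gloss, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        scores.append(torch.softmax(logits, dim=-1)[0, 1].item())
    return max(range(len(scores)), key=scores.__getitem__)
```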
They experiment with three BERT models:
GlossBERT (Token-CLS): classifies each context-gloss pair using the final hidden state of the target word's token(s).
GlossBERT (Sent-CLS): classifies each context-gloss pair using the final hidden state of the [CLS] token.
GlossBERT (Sent-CLS-WS): the same as (Sent-CLS), but the target word is signalled both in the context sentence (through quotation marks) and in the gloss sentence (as a prefix):
[CLS] How " long " has it been since you reviewed the objectives of your benefit and service program ? [SEP] long: desire strongly or persistently [SEP]
The authors call this adding weak supervision to the context-gloss pairs, and they note that it yields the best performance, probably because it combines the advantages of the other two setups.
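The weak-supervision formatting itself is mechanical; here is a sketch of my reading of it (the function name and tokenised input are assumptions, not the repo's API):

```python
# Hypothetical helper mirroring the Sent-CLS-WS format: quote the target word
# in the context sentence and prefix the gloss with the target lemma.
def weak_supervision_pair(tokens, target_index, lemma, gloss):
    marked = tokens[:target_index] + ['"', tokens[target_index], '"'] + tokens[target_index + 1:]
    return " ".join(marked), lemma + ": " + gloss

ctx, gl = weak_supervision_pair(
    "How long has it been ?".split(), 1, "long", "desire strongly or persistently")
# ctx -> 'How " long " has it been ?'
# gl  -> 'long: desire strongly or persistently'
```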
This could serve as a baseline, or we could even build our final method on top of it, if we can add time as a feature to make it diachronic. The code seems clean and well documented (https://github.com/HSLCY/GlossBERT). The approach itself is clean, simple, and intuitive. The authors say that "it is quite expensive to train the model", but they do not quantify the claim.
GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge. Luyao Huang, Chi Sun, Xipeng Qiu, Xuanjing Huang. EMNLP-IJCNLP 2019. https://www.aclweb.org/anthology/D19-1355.pdf