@inproceedings{vulic-mrksic-2018-specialising,
title = "Specialising Word Vectors for Lexical Entailment",
author = "Vuli{\'c}, Ivan and
Mrk{\v{s}}i{\'c}, Nikola",
booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
month = jun,
year = "2018",
address = "New Orleans, Louisiana",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N18-1103",
doi = "10.18653/v1/N18-1103",
pages = "1134--1145",
abstract = "We present LEAR (Lexical Entailment Attract-Repel), a novel post-processing method that transforms any input word vector space to emphasise the asymmetric relation of lexical entailment (LE), also known as the IS-A or hyponymy-hypernymy relation. By injecting external linguistic constraints (e.g., WordNet links) into the initial vector space, the LE specialisation procedure brings true hyponymy-hypernymy pairs closer together in the transformed Euclidean space. The proposed asymmetric distance measure adjusts the norms of word vectors to reflect the actual WordNet-style hierarchy of concepts. Simultaneously, a joint objective enforces semantic similarity using the symmetric cosine distance, yielding a vector space specialised for both lexical relations at once. LEAR specialisation achieves state-of-the-art performance in the tasks of hypernymy directionality, hypernymy detection, and graded lexical entailment, demonstrating the effectiveness and robustness of the proposed asymmetric specialisation model.",
}
1. What is it?
They propose LEAR (Lexical Entailment Attract-Repel), a new post-processing approach that specialises word vectors for hypernymy (lexical entailment).
2. What is amazing compared to previous works?
Previous specialisation work targets synonymy (#210) and antonymy (#211), which are symmetric relations.
Hypernymy is an asymmetric relation, and this model specialises the space for it jointly with similarity.
3. Where is the key to technologies and techniques?
3.1 Model
They represent:
symmetric relations (e.g., synonymy): by the angle between vectors
the asymmetric relation (hypernymy): by the vector norms
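Why the norm is available to encode hierarchy: cosine similarity depends only on the angle, so rescaling a vector leaves all cosine scores untouched. A minimal numpy check (the vectors here are made-up toy values, not from the paper):

```python
import numpy as np

def cos_sim(x, y):
    # symmetric: depends only on the angle between x and y
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0])
y = 3.0 * x  # same direction, larger norm (e.g. a more general concept)

# Rescaling leaves the cosine untouched, so the norm is "free"
# to encode the asymmetric hypernymy hierarchy on top of similarity.
assert np.isclose(cos_sim(x, y), 1.0)
assert np.linalg.norm(y) > np.linalg.norm(x)
```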
In training, they prepare positive/negative samples x (in B) and t (in T) for four kinds of constraints:
synonyms (attract)
antonyms (repel)
vector preservation (regularisation towards the original vectors)
hypernyms: the high-/low-level distance j is given
The total loss is defined as the sum of these terms.
The hypernym constraint combines two functions, Att() and LE(): word pairs in a high-level/low-level relationship should have high (cosine) similarity, while their norms are adjusted to reflect the direction of the hierarchy.
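The loss components above can be sketched as follows. This is a rough reconstruction, not the paper's exact objective: the margins, the λ weight, and the choice of the normalised norm difference for the LE term are my assumptions.

```python
import numpy as np

def cos(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def attract(xl, xr, tl, tr, margin=0.6):
    # synonym pair (xl, xr) should be closer than its negative samples (tl, tr)
    return (max(0.0, margin + cos(xl, tl) - cos(xl, xr))
            + max(0.0, margin + cos(xr, tr) - cos(xl, xr)))

def repel(xl, xr, tl, tr, margin=0.0):
    # antonym pair (xl, xr) should be further apart than its negatives
    return (max(0.0, margin + cos(xl, xr) - cos(xl, tl))
            + max(0.0, margin + cos(xl, xr) - cos(xr, tr)))

def preserve(x_new, x_orig, lam=1e-9):
    # regularisation: stay close to the original distributional vector
    return lam * float(np.sum((x_new - x_orig) ** 2))

def le_term(hypo, hyper):
    # asymmetric term: minimising it drives the hypernym towards
    # the larger norm (a normalised-difference variant of the LE distance)
    nx, ny = np.linalg.norm(hypo), np.linalg.norm(hyper)
    return (nx - ny) / (nx + ny)
```

With this shape, hypernym pairs are pulled together by an attract-style term on the angle while le_term orders their norms, matching the note's point that Att() and LE() act jointly.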
3.2 Metrics
Previous methods score word pairs with cosine similarity alone (angle), but this model considers both angle and norm.
The paper therefore proposes a new asymmetric metric that combines the two.
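A sketch of such a combined measure, assuming the norm part is the normalised norm difference (the exact functional form is in the paper; the example vectors are illustrative):

```python
import numpy as np

def le_distance(x, y):
    # combined asymmetric distance: cosine (angle) part + norm part
    cos_dist = 1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    norm_dist = ((np.linalg.norm(x) - np.linalg.norm(y))
                 / (np.linalg.norm(x) + np.linalg.norm(y)))
    return cos_dist + norm_dist

hypo = np.array([1.0, 1.0])    # e.g. "dog"
hyper = np.array([3.0, 3.0])   # e.g. "animal": same direction, larger norm

# The measure is asymmetric: the true entailment direction scores lower.
assert le_distance(hypo, hyper) < le_distance(hyper, hypo)
```

Unlike plain cosine distance, swapping the arguments changes the score, which is what makes directionality and graded LE predictions possible.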
4. How did they evaluate it?
Hypernymy tasks: directionality, detection, and graded lexical entailment
Nguyen2017: an ad-hoc approach (prior work compared against)
Their method achieves state-of-the-art performance.
5. Is there a discussion?
6. Which paper should read next?