0. Paper
@article{arora-etal-2020-learning,
title = "Learning Lexical Subspaces in a Distributional Vector Space",
author = "Arora, Kushal and
Chakraborty, Aishik and
Cheung, Jackie C. K.",
journal = "Transactions of the Association for Computational Linguistics",
volume = "8",
year = "2020",
url = "https://aclanthology.org/2020.tacl-1.21",
doi = "10.1162/tacl_a_00316",
pages = "311--329",
abstract = "In this paper, we propose LexSub, a novel approach towards unifying lexical and distributional semantics. We inject knowledge about lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in which a lexical relation should hold. Our framework can handle symmetric attract and repel relations (e.g., synonymy and antonymy, respectively), as well as asymmetric relations (e.g., hypernymy and meronomy). In a suite of intrinsic benchmarks, we show that our model outperforms previous approaches on relatedness tasks and on hypernymy classification and detection, while being competitive on word similarity tasks. It also outperforms previous systems on extrinsic classification tasks that benefit from exploiting lexical relational cues. We perform a series of analyses to understand the behaviors of our model.1Code available at https://github.com/aishikchakraborty/LexSub.",
}
1. What is it?
They proposed LexSub, a new approach for injecting lexical-semantic relations (synonymy, antonymy, hypernymy, and meronymy) into distributional word embeddings.
2. What is amazing compared to previous work?
Their method handles both symmetric relations (synonymy, antonymy) and asymmetric relations (hypernymy, meronymy) while preserving the original learned distributional vector space, since each relation is enforced only inside its own subspace.
3. What are the key technologies and techniques?
They map each lexical relation into its own subspace of the distributional vector space.
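A minimal sketch of the subspace idea, in my own notation (the dimensionalities are assumptions, not taken from the paper): each relation r has a projection matrix W_r that maps a word's distributional embedding e_w into that relation's subspace, and the lexical constraint is imposed on the projected vectors, leaving the original space untouched.

h_r(w) = W_r e_w, \quad W_r \in \mathbb{R}^{k_r \times d}, \quad e_w \in \mathbb{R}^{d}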
To learn the projection matrices (W_{syn}, W_{hyp}, and W_{mer}), they define three primitive loss functions (rough forms are sketched below; the exact equations are in the paper):
an attract loss for symmetric relations (pulls related words together in the subspace),
a repel loss for symmetric relations (pushes them apart in the subspace),
and an attract loss for asymmetric relations (for directed relations such as hypernymy and meronymy).
They also define negative-sampling versions of the attract and repel losses, which contrast each related pair against randomly sampled words.
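The equations themselves are not reproduced in these notes. As a hedged sketch of what margin-based attract and repel losses of this kind commonly look like (my formulation, using a distance d(\cdot,\cdot) in the projected subspace, a margin \delta, and an offset vector t_r for the asymmetric case; none of these choices are taken from the paper):

\ell^{\text{sym}}_{\text{att}}(u,v) = \max\big(0,\ d(W_r e_u, W_r e_v) - \delta\big)
\ell^{\text{sym}}_{\text{rep}}(u,v) = \max\big(0,\ \delta - d(W_r e_u, W_r e_v)\big)
\ell^{\text{asym}}_{\text{att}}(u,v) = \max\big(0,\ d(W_r e_u + t_r, W_r e_v) - \delta\big)

with contrastive negative-sampling terms against a randomly sampled word u':

\ell^{\text{neg}}_{\text{att}}(u,v) = \max\big(0,\ \delta + d(W_r e_u, W_r e_v) - d(W_r e_u, W_r e_{u'})\big)
\ell^{\text{neg}}_{\text{rep}}(u,v) = \max\big(0,\ \delta + d(W_r e_u, W_r e_{u'}) - d(W_r e_u, W_r e_v)\big)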
Using these primitives, they define one loss per relation:
Synonymy: attract loss in the synonym subspace (W_{syn}).
Antonymy: repel loss in the same subspace as synonymy.
Hypernymy: asymmetric attract loss in its own subspace (W_{hyp}).
Meronymy: asymmetric attract loss in its own subspace (W_{mer}).
The total loss is the sum of these relation-specific losses.
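A hedged sketch of how the pieces could compose (how the negative-sampling terms attach to each relation is my assumption; the notes above only say the total is a sum):

L_{\text{syn}} = \sum_{(u,v) \in \text{Syn}} \ell^{\text{sym}}_{\text{att}}(u,v; W_{\text{syn}}) + \ell^{\text{neg}}_{\text{att}}(u,v; W_{\text{syn}})
L_{\text{ant}} = \sum_{(u,v) \in \text{Ant}} \ell^{\text{sym}}_{\text{rep}}(u,v; W_{\text{syn}}) + \ell^{\text{neg}}_{\text{rep}}(u,v; W_{\text{syn}})
L_{\text{hyp}} = \sum_{(u,v) \in \text{Hyp}} \ell^{\text{asym}}_{\text{att}}(u,v; W_{\text{hyp}}) + \ell^{\text{neg}}_{\text{att}}(u,v; W_{\text{hyp}})
L_{\text{mer}} = \sum_{(u,v) \in \text{Mer}} \ell^{\text{asym}}_{\text{att}}(u,v; W_{\text{mer}}) + \ell^{\text{neg}}_{\text{att}}(u,v; W_{\text{mer}})
L_{\text{total}} = L_{\text{syn}} + L_{\text{ant}} + L_{\text{hyp}} + L_{\text{mer}}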
4. How did they evaluate it?
4.1 Intrinsic tasks: word relatedness and similarity benchmarks, plus hypernymy classification and detection; LexSub outperforms previous approaches on relatedness and hypernymy and is competitive on word similarity.
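As a hedged illustration of how a subspace-based similarity score could be computed for such intrinsic benchmarks and correlated with human ratings (this is my sketch, not the paper's evaluation code; emb, W, and pairs are assumed stand-ins):

import numpy as np
from scipy.stats import spearmanr

def subspace_cosine(w1, w2, emb, W):
    # Project both words into the relation subspace, then take cosine similarity.
    u, v = W @ emb[w1], W @ emb[w2]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def evaluate_similarity(pairs, emb, W):
    # pairs: iterable of (word1, word2, human_score); out-of-vocabulary pairs are skipped.
    preds, gold = [], []
    for w1, w2, score in pairs:
        if w1 in emb and w2 in emb:
            preds.append(subspace_cosine(w1, w2, emb, W))
            gold.append(score)
    rho, _pvalue = spearmanr(preds, gold)
    return rho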
4.2 Extrinsic tasks: downstream classification tasks that benefit from exploiting lexical relational cues, where LexSub also outperforms previous systems.
5. Is there a discussion?
Yes: Table 5 shows that each subspace has learned its own relation-specific information.
6. Which paper should I read next?