Computational-Content-Analysis-2020 / Readings-Responses-Spring

Repository for organizing orienting, exemplary, and fundamental readings, and posting responses.

Exploring Semantic Spaces (E2) - Hamilton et al 2016 #29

HyunkuKwon opened this issue 4 years ago

HyunkuKwon commented 4 years ago

Post questions about the following exemplary reading here:

Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change." arXiv preprint arXiv:1605.09096.

wanitchayap commented 4 years ago

This is less a question about the paper itself (which I think is a fascinating and well-designed study) and more about a possible application of the way they quantify polysemy.

In this week's orientation paper thread, I asked how word embedding models could deal with polysemy using unsupervised methods. Do you think the authors' approach here, building a PPMI co-occurrence network over words and then measuring a word's local clustering coefficient to infer its degree of polysemy, is a good solution to the polysemy problem in word embedding models? For example, if the PPMI network tells us that rock is highly polysemous, we could train the embedding model with rock separated into different word tokens (e.g., separate vectors for rock_music, rocking, and rock_geology).
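For concreteness, here is how I understand their measure, as a minimal sketch rather than the authors' released code (the `cooc` count matrix and `vocab` list are hypothetical inputs I made up for illustration):

```python
import numpy as np
import networkx as nx

def ppmi_matrix(cooc):
    """Positive PMI from a raw word-word co-occurrence count matrix."""
    total = cooc.sum()
    p_w = cooc.sum(axis=1) / total                # marginal word probabilities
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((cooc / total) / np.outer(p_w, p_w))
    pmi[~np.isfinite(pmi)] = 0.0                  # zero out log(0) / 0-by-0 cells
    return np.maximum(pmi, 0.0)                   # keep positive associations only

def polysemy_score(cooc, vocab, word):
    """Negative weighted local clustering coefficient: words whose PPMI
    neighbors rarely co-occur with each other score as more polysemous."""
    g = nx.from_numpy_array(ppmi_matrix(cooc))    # weighted PPMI network
    g.remove_edges_from(nx.selfloop_edges(g))     # drop self-associations
    return -nx.clustering(g, vocab.index(word), weight="weight")
```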

However, I am not sure how we would know from the PPMI network that there are supposed to be three different senses of rock. In addition, to train the embedding model we would need to distinguish each occurrence of rock in the texts as one of the three senses. Would the PPMI network alone be sufficient for that? We would also need to decide a cutoff on the clustering coefficient to determine when a word is polysemous enough to be treated as multiple vectors in the model.
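One hypothetical way to get a sense count (this is my speculation, not something from the paper) would be to cluster a word's neighborhood in the PPMI network and treat the resulting communities as candidate senses, e.g.:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def estimate_senses(g, word_idx, min_size=3):
    """Treat communities among a word's PPMI neighbors as candidate senses;
    g is the weighted PPMI network built above."""
    ego = nx.ego_graph(g, word_idx)           # the word and its neighbors
    ego.remove_node(word_idx)                 # cluster the neighbors themselves
    communities = greedy_modularity_communities(ego, weight="weight")
    return [c for c in communities if len(c) >= min_size]
```

But even something like this still requires arbitrary choices (the minimum community size here, the clustering-coefficient cutoff above), which is exactly my worry.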

In addition, I am not sure how to deal with contextually diverse discourse function words (e.g., also) that the PPMI network would treat as highly polysemous. It makes sense in the context of this paper to treat these function words as highly polysemous, but I don't think we should have different vectors for also in different contexts.

In short, do you think the polysemy measure the authors use could be a good starting point for dealing with polysemy in word embedding models?