UChicago-Computational-Content-Analysis / Readings-Responses-2023


4. Exploring Semantic Spaces - [E2] 2. Hamilton, William L., Jure Leskovec, Dan Jurafsky. 2016. #37

Open JunsolKim opened 2 years ago

JunsolKim commented 2 years ago

Post questions here for this week's exemplary readings: 2. Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change." arXiv preprint arXiv:1605.09096. (All code and data available here.)
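For anyone who wants to experiment with the paper's core measurement: the authors align embeddings from different decades with orthogonal Procrustes and measure a word's semantic displacement as the cosine distance between its aligned vectors. Below is a minimal sketch of that step with randomly generated stand-in matrices (`emb_old` and `emb_new` are hypothetical; the released HistWords code is the authoritative implementation).

```python
# Minimal sketch of aligning two decades' embeddings with orthogonal
# Procrustes, as described in Hamilton et al. (2016). Assumes emb_old
# and emb_new have the same vocabulary in the same row order; the
# matrices here are random toy data.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
emb_old = rng.normal(size=(1000, 100))   # e.g., SGNS vectors for the 1900s
emb_new = rng.normal(size=(1000, 100))   # e.g., SGNS vectors for the 1990s

# Find the rotation R minimizing ||emb_old @ R - emb_new||_F.
R, _ = orthogonal_procrustes(emb_old, emb_new)
emb_old_aligned = emb_old @ R

# Semantic displacement of a word = cosine distance between its
# aligned old vector and its new vector.
def cosine_distance(u, v):
    return 1 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

displacement = cosine_distance(emb_old_aligned[0], emb_new[0])
print(f"semantic displacement of word 0: {displacement:.3f}")
```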

konratp commented 2 years ago

This was an interesting, albeit challenging read. I can think of several examples of the meanings of polysemous words shifting in the general population in response to specific events. For example, the authors mention the changing meaning of the German word Widerstand (resistance) following the Third Reich; I can also imagine the words "tablet" and "cloud" undergoing similar shifts in meaning. I wonder if it is possible to analyze the law of conformity while accounting for words that very clearly shift due to specific events (e.g. the introduction of the iPad)? Would the effect remain significant?
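One rough way to probe this (a hedged sketch, not anything from the paper): refit a conformity-style regression of change rate on log frequency with an added indicator for event-driven words, and check whether the frequency coefficient stays negative and significant. All the data and column names below are simulated for illustration.

```python
# Hypothetical sketch: does the negative frequency effect (law of
# conformity) survive after flagging event-driven words like "tablet"?
# The DataFrame and its columns are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "log_freq": rng.normal(-8, 2, n),          # log relative frequency
    "event_driven": rng.binomial(1, 0.05, n),  # 1 if tied to a known event
})
# Simulated change rates: a conformity effect plus an event bump.
df["change"] = (0.5 - 0.04 * df["log_freq"] + 0.3 * df["event_driven"]
                + rng.normal(0, 0.1, n))

model = smf.ols("change ~ log_freq + event_driven", data=df).fit()
print(model.summary().tables[1])  # is log_freq still significantly negative?
```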

Jasmine97Huang commented 2 years ago

I am really interested to see whether dynamic embeddings like BERT could be used to analyze semantic shifts in a diachronic corpus. From a technical perspective, it seems more efficient and organic, since BERT is able to produce different representations for the same word in different contexts. Any pointers toward where I can read about this implementation?
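One entry point is the literature on lexical semantic change detection with contextualized embeddings (for instance, the SemEval-2020 shared task on unsupervised lexical semantic change detection). Below is a minimal, hedged sketch of one common recipe: extract the contextual vector for a target word in sentences from different periods, then compare them. The sentences are toy examples, and `word_vector` is a helper written for this sketch, not a library function.

```python
# Hedged sketch: contextual BERT vectors for one target word in
# sentences from different periods, compared by cosine distance.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, target):
    """Mean of the last-layer vectors for the target word's subtokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, 768)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"{target!r} not found in sentence")

old = word_vector("He wrote the letter on a clay tablet.", "tablet")
new = word_vector("She read the news on her tablet.", "tablet")
shift = 1 - torch.cosine_similarity(old, new, dim=0).item()
print(f"cosine distance between usages: {shift:.3f}")
```

In practice one would average such vectors over many occurrences per period rather than compare single sentences, but the comparison step is the same.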

GabeNicholson commented 2 years ago

Very interesting and clever approach from the researchers to test this hypothesis. I suppose one extension of this idea would be to see how the laws of semantic change generalize to the extreme case of pidgin languages. I have no idea how you would test this other than by recording verbal transcripts, but it would still be theoretically interesting if it could be pulled off.

facundosuenzo commented 2 years ago

> This was an interesting, albeit challenging read. I can think of several examples of the meanings of polysemous words shifting in the general population in response to specific events. For example, the authors mention the changing meaning of the German word Widerstand (resistance) following the Third Reich; I can also imagine the words "tablet" and "cloud" undergoing similar shifts in meaning. I wonder if it is possible to analyze the law of conformity while accounting for words that very clearly shift due to specific events (e.g. the introduction of the iPad)? Would the effect remain significant?

+1. I was also thinking about the fact that the distribution of polysemy scores varies substantially across languages (even though all are negative). What does this mean in terms of language specificity (for example, the score for the Chinese corpus compared with COHA)? Could it be possible to think of languages in which those laws hold with more nuance?

chentian418 commented 2 years ago

It's very interesting to learn about the historical semantic changes of words and their relationship with word frequency and polysemy. On the one hand, we can learn about time-series semantic changes by learning from concurrent contexts and imposing alignment. On the other hand, I am curious whether there is any possibility of separating a polysemous word into its different meanings, and thus forming one word vector per sense? Thanks!
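One common recipe for this is to collect the contextual embeddings of each occurrence of the word and cluster them, treating each cluster centroid as one induced sense vector. A self-contained sketch where the "contextual embeddings" are simulated as two Gaussian blobs (in practice they would come from a contextual model such as BERT):

```python
# Hypothetical sketch: induce one vector per sense of a polysemous word
# by clustering the contextual embeddings of its occurrences. The
# embeddings here are simulated; k=2 is assumed, not derived.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
sense_a = rng.normal(loc=0.0, scale=0.1, size=(50, 768))  # e.g., "bank" = institution
sense_b = rng.normal(loc=1.0, scale=0.1, size=(50, 768))  # e.g., "bank" = riverside
occurrences = np.vstack([sense_a, sense_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(occurrences)
sense_vectors = kmeans.cluster_centers_   # one induced vector per sense
print("occurrences per sense:", np.bincount(kmeans.labels_))
```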

mikepackard415 commented 2 years ago

The authors speculate in their conclusion about the role of polysemy in language acquisition. I wonder whether we could use word embeddings trained on corpora that are stratified not by decade but by reading level. This has the potential to tell us how children expand their vocabulary and continuously add nuance to their understanding of language. I wonder if people think this idea of reading-level-specific word embeddings has any value?
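One way to prototype this would be to mirror the paper's per-decade setup: train one embedding model per reading-level stratum, then align the spaces (e.g., with the Procrustes step sketched earlier in this thread). A minimal gensim sketch, where the tiny corpora and stratum names are invented for illustration:

```python
# Hedged sketch: one Word2Vec model per reading-level stratum, by
# analogy with the paper's one-model-per-decade design. The corpora
# are stand-ins; real strata might come from graded or leveled texts.
from gensim.models import Word2Vec

corpora = {
    "grade_school": [["the", "dog", "ran", "fast"],
                     ["the", "cat", "sat", "down"]],
    "college":      [["the", "dog", "pursued", "its", "quarry"],
                     ["the", "cat", "observed", "impassively"]],
}

models = {
    level: Word2Vec(sentences=docs, vector_size=100, window=5,
                    min_count=1, sg=1, seed=0)
    for level, docs in corpora.items()
}

# After aligning the spaces, compare a word's neighbors across levels:
for level, m in models.items():
    print(level, m.wv.most_similar("dog", topn=2))
```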

hshi420 commented 2 years ago

It's really interesting how the authors integrate novel word embedding models with a traditional longitudinal design. According to the authors, the three models capture the "meaning" of words, and thus they can measure whether and how the meanings of words change over time. I was wondering whether today's SOTA language models can offer something similar.