UChicago-Computational-Content-Analysis / Readings-Responses-2024-Winter


4. Word Embeddings to Explore Meaning Spaces-fundamental #36

lkcao opened 6 months ago

lkcao commented 6 months ago

Post questions here for this week's fundamental readings:

Jurafsky, Daniel and James H. Martin. 2015. Speech and Language Processing. Chapters 15-16 (“Vector Semantics”, “Semantics with Dense Vectors”)

Grimmer, Justin, Molly Roberts, Brandon Stewart. 2022. Text as Data. Princeton University Press: Chapter 8 —“Distributed Representations of Words”.

XiaotongCui commented 5 months ago

I have a question about NEURAL WORD EMBEDDINGS. The book only discusses using neural networks for word embedding, so what downsides might there be? And do the usual interpretability issues associated with neural networks apply here as well?

sborislo commented 5 months ago

Given the often high costs (in time, effort, and, in some cases, money) of running intensive text analyses, and given that many of the reported analyses appear to visualize already-established relationships rather than suggest new ones, what is the niche of text analysis? Although I can think of such cases (e.g., studying The Federalist Papers), the reading discussed many limitations of these methods that seem resolvable only with a baseline understanding of the phenomena being examined. For instance, debiasing relies on already knowing the ground truth as we see it. How can we use text analysis to validly draw entirely new inferences, and to draw such inferences in a falsifiable way (in the case of "black box" deep learning algorithms)?

Twilight233333 commented 5 months ago

I have a general question about word embeddings. If we have a new combination, or if we want to define a particular similarity between a particular word and other words, can we adapt the embeddings to our own needs? What I understand from the book is that word similarities are generated automatically through extensive training, so is there a way for us to define some similarities ourselves?

chanteriam commented 5 months ago

I found Jurafsky and Martin's "Vector Semantics & Embeddings" extremely beneficial to my understanding of word embeddings. I would benefit from some clarification on how word embeddings for skip-grams are created. The chapter discusses how you try to maximize the distance between the target word and negative words, and minimize the distance between the target word and context/positive words; it further discusses how negative words are chosen as a function of their probability of occurring. However, it doesn't explain how we use negative word probabilities; do we choose the ones calculated as having the highest probability of occurring? Additionally, the text explains that we select negative words from the vocabulary, excluding the target word; wouldn't this sometimes cause us to choose one of our context words, leaving us in a paradoxical situation where we are both trying to maximize and minimize the distance to the same word? Finally, for these initially randomized embeddings, how do we decide the dimensionality/number of rows of our word vectors?
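A minimal sketch of the negative-sampling step may help here (the toy corpus and parameters are illustrative, not from the readings). Negatives are drawn from the unigram distribution raised to the 0.75 power, so frequent words are likelier to be sampled, but less dominant than under raw counts. On the "paradox": in practice implementations typically exclude only the target word, so a context word can occasionally be drawn as a negative; the collision is rare enough that the noise washes out. The `exclude` set below is generalized to show how one could also rule out context words.

```python
import random
from collections import Counter

def negative_sampling_dist(tokens, power=0.75):
    """Unigram counts raised to the 0.75 power, then normalized: the
    distribution word2vec draws negative samples from."""
    counts = Counter(tokens)
    weighted = {w: c ** power for w, c in counts.items()}
    total = sum(weighted.values())
    return {w: v / total for w, v in weighted.items()}

def sample_negatives(dist, k, exclude, rng):
    """Draw k negatives, re-drawing any word in `exclude` (here both
    the target and the current context words)."""
    words = list(dist)
    probs = [dist[w] for w in words]
    negatives = []
    while len(negatives) < k:
        (w,) = rng.choices(words, weights=probs, k=1)
        if w not in exclude:
            negatives.append(w)
    return negatives

tokens = "the cat sat on the mat the cat".split()
dist = negative_sampling_dist(tokens)
rng = random.Random(0)
negs = sample_negatives(dist, k=2, exclude={"cat", "the"}, rng=rng)
```

On the last question: the dimensionality is a hyperparameter fixed before training (commonly somewhere in the 50-300 range) and is usually chosen by validating on a downstream task rather than derived from the data.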

h-karyn commented 5 months ago

I didn't quite understand how retrofitting is related to having rare words in the corpus. According to the textbook, it can be used to identify known relationships, but what can we use this for?

QIXIN-ACT commented 5 months ago

Studies like Caliskan et al. (2017) have shown biases in word embeddings, like associating African-American names with unpleasant words. While recent research aims to reduce these biases in NLP models, I'm pondering a potential conflict. If these biases in embeddings reflect societal biases, isn't it important to preserve them to some extent for historical accuracy and understanding societal trends? How can we balance the need to document historical biases in language with the imperative to reduce future biases in NLP models? Are there methods or approaches that allow us to achieve both objectives?
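On the balancing question: one common answer is to keep the original embeddings intact for historical and diagnostic analysis, and debias only the copy deployed in downstream systems. A minimal sketch of projection-based ("hard") debiasing in the spirit of Bolukbasi et al. (2016), using purely hypothetical toy vectors:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, direction):
    """Remove the component of v along `direction`: the result is
    orthogonal to the bias direction, while the rest of the vector
    (and hence other associations) is preserved."""
    coeff = dot(v, direction) / dot(direction, direction)
    return [a - coeff * d for a, d in zip(v, direction)]

# Hypothetical 3-d vectors: a "bias" direction and a word vector.
bias_dir = [1.0, 0.0, 0.0]
word = [0.8, 0.3, 0.5]
debiased = project_out(word, bias_dir)  # component along bias_dir is gone
```

Keeping both versions lets the biased original serve as a measurement instrument for societal trends while the projected copy feeds prediction systems.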

Audacity88 commented 5 months ago

Section 8.5 of Text as Data notes that contextual embedding models, including GPT-3, are able to capture information about word order, but require a significantly greater amount of computational power. I am wondering why exactly it is so computationally expensive to factor in word order?
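A rough sketch of why: contextual models rely on self-attention, which compares every token with every other token, so the per-layer cost grows quadratically with sequence length, whereas an order-free bag-of-words pass touches each token once. A toy, pure-Python version of scaled dot-product attention scores (the vectors are illustrative, not from a trained model):

```python
import math

def attention_weights(queries, keys):
    """Scaled dot-product attention: every query is scored against
    every key, so an n-token input costs n * n score computations;
    this all-pairs comparison is what makes modeling word order
    expensive relative to order-free models."""
    d = len(keys[0])
    rows = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                      # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

# A 3-token "sentence" with 2-dimensional vectors: output is a 3x3
# matrix, one row of attention weights per token.
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = attention_weights(vecs, vecs)
```

Doubling the input length doubles a bag-of-words pass but quadruples the attention work, and that gap compounds across layers and training steps.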

yuzhouw313 commented 5 months ago

Grimmer et al. introduce Latent Semantic Analysis, first proposed by Deerwester et al. However, I find the approach of using an entire document as the context window somewhat counter-intuitive, especially when considering the foundational principles of word embedding. Typically, the meaning of a word is understood to be closely linked to its immediate neighbors. Doesn't treating the whole document as context in LSA contradict this fundamental assumption that word meaning depends on neighboring words?
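One way to see that the two choices are variants of the same distributional idea at different grains: a document-sized window tends to capture topical relatedness, while a small window captures nearest-neighbor similarity. A toy contrast (the sentences are illustrative, not from the readings):

```python
from collections import Counter
from itertools import combinations

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

def doc_window_cooc(docs):
    """LSA-style context: two words co-occur if they share a
    *document*, regardless of how far apart they are."""
    cooc = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    return cooc

def local_window_cooc(docs, window=2):
    """word2vec-style context: co-occurrence only within +/- `window`
    tokens, so immediate neighbors dominate."""
    cooc = Counter()
    for doc in docs:
        for i, w in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    cooc[tuple(sorted((w, doc[j])))] += 1
    return cooc

doc_level = doc_window_cooc(docs)
local = local_window_cooc(docs)
# "cat" and "mat" share a document but sit 4 tokens apart, so they
# co-occur at the document level yet never within the small window.
```

So LSA does not abandon the distributional hypothesis; it just answers a coarser question ("what appears in the same discourse?") than a narrow-window model ("what appears next to what?").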

bucketteOfIvy commented 5 months ago

As a really simple but clarifying question: when using embedding models like word2vec, are we able to consider n-grams as tokens by just adding them to the vocabulary list?
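Yes, that is the standard trick: pre-merge frequent n-grams into single tokens before training, which gensim's Phrases preprocessing automates. A minimal hand-rolled sketch (the bigram list here is supplied manually; real pipelines select bigrams from co-occurrence statistics):

```python
def merge_bigrams(sentences, bigrams):
    """Rewrite each tokenized sentence so listed bigrams become single
    underscore-joined tokens; the embedding model then learns one
    vector per phrase instead of one per constituent word."""
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in bigrams:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged

sentences = [["new", "york", "is", "large"],
             ["i", "love", "new", "york"]]
merged = merge_bigrams(sentences, {("new", "york")})
```

The merged sentences are then fed to the embedding model exactly as ordinary tokenized text; the model itself needs no modification.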

alejandrosarria0296 commented 5 months ago

When explaining the linguistic theory that justifies word embeddings' approach to language (the distributional hypothesis), Grimmer, Roberts, and Stewart argue that the meaning of a word can be deduced from the words around it. The theory makes sense, and the vast literature on it bears it out. I'm interested in how a word embedding approach based on this theory may be affected by elements of a word that are independent of the words around it and that dramatically alter its meaning (the difference between "It's rich" and "It's rich", for an innocent example; "people" vs. "(((people)))" for a more dangerous one). Given that, over the years, online communities, both mainstream and niche, have developed patterns of speech that alter the meanings of words, how might word embedding methods change to capture these "hidden" meanings?

volt-1 commented 5 months ago

Drawing from the insights on the dynamic nature of language, this exploration could examine the potential of these models to forecast the emergence of new vocabulary or the evolution of existing word meanings. Can neural language models, using extensive textual data over time, reveal and predict patterns of English linguistic evolution?

runlinw0525 commented 5 months ago

The chapter on distributed representations of words, especially the part about word embeddings and their ability to encode similarities and nuances in language, is quite intriguing. Considering how word embeddings capture nuanced language use, do you think they could be effectively applied to understand the evolving language and sentiment in internet memes?

yueqil2 commented 5 months ago

I'm interested in the techniques for rare words mentioned in Chapter 8. The authors produce a time-period-specific embedding for "manufacturing" when analyzing the SOTU corpus. I wonder how they found the time point (1960) and how they knew that the use of the word "manufacturing" differed between pre-1960 and post-1960. Did they hypothesize it from theory and other materials, or did they find this feature and time point in training results they had already obtained and then revise accordingly?

ana-yurt commented 5 months ago

I wonder if we could apply transfer learning to derive new word2vec models using a relatively small custom corpus, since there are many good models pre-trained on large-scale corpora.
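This is commonly done: warm-start the small model from pretrained vectors and continue training on the custom corpus (gensim has offered `intersect_word2vec_format` for roughly this purpose). A minimal sketch of the initialization step, with hypothetical pretrained vectors standing in for, e.g., GloVe or word2vec downloads:

```python
import random

def init_embeddings(vocab, pretrained, dim, seed=0):
    """Warm-start a small in-domain model: copy pretrained vectors for
    words the big model already knows, and randomly initialize the
    rest. Training on the custom corpus then only has to adapt the
    vectors, not learn them from scratch."""
    rng = random.Random(seed)
    emb = {}
    for word in vocab:
        if word in pretrained:
            emb[word] = list(pretrained[word])
        else:
            emb[word] = [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
    return emb

# Hypothetical pretrained vectors; "cryptoasset" is a domain-specific
# word missing from the pretrained vocabulary.
pretrained = {"market": [0.1, 0.2], "price": [0.3, 0.4]}
emb = init_embeddings(["market", "price", "cryptoasset"], pretrained, dim=2)
```

One caveat worth noting: if the custom corpus is very small, heavy continued training can drag the shared words away from their well-estimated pretrained positions, so freezing or down-weighting updates to pretrained words is a common precaution.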

cty20010831 commented 5 months ago

After reading the textbook chapter on word embeddings, I am thinking about how I can relate it to my final project. I am wondering how word embeddings can be applied to examine the "diversity" of words and/or topics in clusters?

joylin0209 commented 5 months ago

After reading Jurafsky and Martin's chapter, I was impressed by the loss function. I would like to know how models trained with this loss function adapt to and handle the differences between languages in a multilingual environment?

yunfeiavawang commented 5 months ago

We have learned many methodologies for dealing with textual data, and they all seem to extract the essence of a large-scale language corpus. I would appreciate a systematic overview of the ranges of questions each method applies to, along with their advantages and disadvantages.

donatellafelice commented 5 months ago

I really like the possibility of capturing more nuance in words, especially with an eye to my final. However, I was confused by this argument from Chapter 6, "Vector Semantics and Embeddings": "Consider the meanings of the words coffee and cup. Coffee is not similar to cup; they share practically no features (coffee is a plant or a beverage, while a cup is a manufactured object with a particular shape). But coffee and cup are clearly related; they are associated by co-participating in an everyday event (the event of drinking coffee out of a cup). Similarly scalpel and surgeon are not similar but are related eventively (a surgeon tends to make use of a scalpel)."

I was very confused by this example. There is a specific link between coffee and cup: you can call a single object a coffee cup (like a tea cup, though not as often used), similar (but not identical) to a "surgeon's scalpel". While the scalpel example contains the possessive, a coffee cup is a compound noun, like cutting board or toilet paper. Surely these examples are treated differently? "Coffee cup" changes the meaning; it's a specific thing, not two separate things, just as toilet paper means something specific, not paper for the toilet. I am confused about how this type of word grouping doesn't appear to impact the way they describe word sense.

ethanjkoz commented 5 months ago

I think that Alejandro poses a great question! As a sort of follow up to that I was wondering how word embedding techniques could be applied to new words/slang. Are the word2vec embedding techniques able to pick up on novel uses of words? For example, slang like "based" or "slaps" (as some crude examples off the top of my head) has acquired new meanings with different generations.

anzhichen1999 commented 5 months ago

Considering the advancements in feedforward networks for NLP classification, as detailed in Jurafsky and Martin's textbook, and the use of word embeddings in historical linguistic analysis from the “100 year” paper, how can these technologies be synergistically utilized to develop a more sophisticated real-time sentiment analysis tool? Specifically, how can the temporal dynamics of word embeddings, as used in the “100 year” study, enhance the accuracy and contextual relevance of sentiment classification in feedforward neural networks for monitoring evolving public opinions on social media platforms?

Brian-W00 commented 5 months ago

How do the principles of vector semantics and dense vector representations challenge traditional linguistic theories of structure and meaning, specifically in the context of capturing human language understanding?

michplunkett commented 5 months ago

Daniel Jurafsky and James H. Martin's writing was SO intuitive, I really enjoyed their writing style and how they (relatively) simply conveyed otherwise complicated topics. Coming from a biological sciences background, I have always been a little fascinated by the use of neurons as models for learning. While intuitive in the sense of neurons -> brains -> learning, the actual application of using them is quite interesting. I'd be interested in hearing what other biological systems are used as models for mathematics, data science, computer science, etc.?

One that comes to mind that I imagine could be applied elsewhere is the concept of quorum sensing in bacteria. It feels like it has the complication necessary, but could also be vaguely applied in other spaces. Another one is the mathematics behind dynamic gene expression (I'm thinking DNA methylation, loose third nucleotides for RNA to amino acid translation, etc.).

erikaz1 commented 5 months ago

Given the increasingly digitized world over the past decades and the acceleration of technological advance, how often should pre-trained models be updated? How and where can we go to look for newer embeddings that may have been trained differently?

HamsterradYC commented 5 months ago

What methods can be employed to ensure that the continuous evolution of language, including the creation of new words and phrases, is captured in word embeddings?

Dededon commented 5 months ago

My own experience using word embeddings to build antonym dictionaries hasn't gone well: adding new pairs of "antonyms" can shift the results drastically. Is there a way to improve this practice?

YucanLei commented 5 months ago

I focused on Vector Semantics and Embeddings. My concern is that the use of vector representations for word meaning raises questions about the interpretability and transparency of these models. Understanding how these vector representations encode meaning and how they generalize across different linguistic contexts is an ongoing area of research and debate.

chenyt16 commented 5 months ago

How can we optimize the performance of word embedding models when the size of the corpus is limited? For example, how do we solve the overfitting problem? (I ask because the data for my final project may not be "sufficiently large".)

JessicaCaishanghai commented 5 months ago

With many new meanings being assigned to old words, trained models may no longer be reliable and may need rapid updates. How can we solve this problem? Is there a way to automatically capture and detect shifts in the nuances of word meanings?

Caojie2001 commented 5 months ago

Word embedding is an interesting method to me, as it takes word order into account. However, I wonder whether it is hard to apply word embedding methods to a corpus of rather short texts, where the context information for a word might be limited?

floriatea commented 4 months ago

Given the evolving capabilities of chatbots to mimic human conversation and emotions, what are the potential impacts on human psychology and social behavior, especially considering the deep emotional engagements users can have with chatbots, as seen in the case of ELIZA?

Vindmn1234 commented 3 months ago

How do dense vector representations of semantics enhance our understanding and processing of natural language, particularly in relation to capturing nuanced meanings across different contexts? Additionally, what are the implications of these advancements for the development of more sophisticated natural language understanding and generation systems, and how do they address challenges associated with polysemy and context-specific language use?

Carolineyx commented 3 months ago

Given that word embeddings provide a dense representation of words based on their context, enabling models to capture semantic relationships between words, how do neural networks, especially feedforward neural networks and neural language models, leverage these embeddings to improve the accuracy of tasks like language modeling or sentiment analysis? Furthermore, how does the dimensionality of these embeddings affect the performance and computational efficiency of neural networks in processing natural language?