UChicago-Computational-Content-Analysis / Readings-Responses-2023


4. Exploring Semantic Spaces - fundamental #35

Open · JunsolKim opened this issue 2 years ago

JunsolKim commented 2 years ago

Post questions here for this week's fundamental readings: Jurafsky, Daniel and James H. Martin. 2015. Speech and Language Processing. Chapters 15-16 (“Vector Semantics”, “Semantics with Dense Vectors”); Grimmer, Justin, Molly Roberts, Brandon Stewart. 2022. Text as Data. Princeton University Press: Chapter 8 —“Distributed Representations of Words”.

pranathiiyer commented 2 years ago
1. The authors touch upon this briefly, yet how does one deal with bias in word embeddings if the embedding is only a representation of the underlying text? Do we expose the model to varying sources of text, and would the feasibility of that not depend on the nature of the research?
2. The chapter describes several ways of dealing with unseen words. Has any approach proven more reliable than the others? Otherwise, how does one decide, given how computationally intensive it can be to try each of them? (One common option is sketched below.)
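
A common way to sidestep the unseen-word problem is subword embeddings. Below is a minimal sketch, assuming gensim's FastText implementation, which composes a vector for an out-of-vocabulary word from its character n-grams; the toy corpus and probe word are illustrative, not from the readings.

```python
# Minimal sketch: FastText builds vectors for unseen words from
# character n-grams. The toy corpus below is an illustrative assumption.
from gensim.models import FastText

corpus = [
    ["sociology", "studies", "society", "and", "social", "behavior"],
    ["embeddings", "represent", "words", "as", "dense", "vectors"],
] * 20

model = FastText(sentences=corpus, vector_size=50, window=3,
                 min_count=1, min_n=3, max_n=5, epochs=10)

# "sociologies" never appears in the corpus, but FastText still returns
# a vector, assembled from n-grams shared with seen words ("sociology").
print(model.wv["sociologies"].shape)                    # (50,)
print(model.wv.similarity("sociology", "sociologies"))  # relatively high
```

Whether subword methods beat simpler fallbacks (e.g., a shared UNK vector) is corpus-dependent, which is part of why no single approach dominates.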
isaduan commented 2 years ago

On pre-trained contextualized word embeddings like GPT-3: in what ways could we leverage them for academic research?

konratp commented 2 years ago

The authors mention reference points as a way for machines to pick up on different tenses. How are such problems approached for words/sentences that are written identically yet might convey different information? For example, the sentence "I read this book" could be in either the present or the past tense. Are machines able to differentiate between the two?

If so, is that true for the analysis of other languages as well? I'm considering analyzing a set of texts that are not in English -- if I were to employ such methods, would I run into issues?

Jiayu-Kang commented 2 years ago
> 1. The authors touch upon this briefly, yet how does one deal with bias in word embeddings if the embedding is only a representation of the underlying text? Do we expose the model to varying sources of text, and would the feasibility of that not depend on the nature of the research?
> 2. The chapter describes several ways of dealing with unseen words. Has any approach proven more reliable than the others? Otherwise, how does one decide, given how computationally intensive it can be to try each of them?

I have the same question about attempts to "debias" embeddings. If an embedding reflects language in association with its context, how is it possible to remove "biased" associations? When considering research goals and selecting data, what precautions can we take to avoid amplifying existing inequalities?
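
To make "removing an association" concrete, here is a minimal sketch of the projection idea behind hard debiasing (Bolukbasi et al. 2016): estimate a bias direction from a seed pair and subtract each vector's component along it. The pretrained model name and probe words are illustrative assumptions.

```python
# Minimal sketch of projection-based debiasing, in the spirit of
# Bolukbasi et al. (2016). Model and word choices are assumptions.
import numpy as np
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")   # small pretrained GloVe vectors

# One definitional pair as the bias direction; real work averages many
# pairs and uses PCA to find the subspace.
g = wv["he"] - wv["she"]

def cos(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def project_out(v, direction):
    """Remove the component of v lying along `direction`."""
    d = direction / np.linalg.norm(direction)
    return v - np.dot(v, d) * d

v = wv["nurse"]
print("before:", cos(v, g))                  # association with the bias axis
print("after: ", cos(project_out(v, g), g))  # ~0 by construction
```

Note the limitation: this removes only a linear component, and later work (e.g., Gonen & Goldberg 2019) shows the bias often remains recoverable, which reinforces the point that data selection matters as much as post-hoc fixes.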

Jasmine97Huang commented 2 years ago

In the chapter "Vector Semantics and Embeddings," the authors point out that changing the window size changes which words come out as most similar to "Hogwarts." What is the strategy for choosing the right window size? I don't remember seeing a conclusive rule of thumb for this important hyperparameter.
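
There may not be one; the common heuristic is that small windows yield more syntactic, substitutable neighbors while large windows yield more topical, associative ones, so the "right" size depends on which notion of similarity your research question needs. It is cheap to inspect directly; a minimal sketch, with a repeated toy corpus standing in for real data:

```python
# Minimal sketch: nearest neighbors under a short vs. a long context
# window. The repeated toy corpus is a stand-in for real documents.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

raw = [
    "Harry studied magic at Hogwarts school of wizardry",
    "Hermione read every book in the Hogwarts library",
    "the students of Hogwarts learned spells and potions",
] * 50
corpus = [simple_preprocess(doc) for doc in raw]

for window in (2, 10):
    model = Word2Vec(sentences=corpus, vector_size=50, window=window,
                     min_count=2, epochs=20, seed=1, workers=1)
    print(f"window={window}:", model.wv.most_similar("hogwarts", topn=3))
```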

GabeNicholson commented 2 years ago
> 1. The authors touch upon this briefly, yet how does one deal with bias in word embeddings if the embedding is only a representation of the underlying text? Do we expose the model to varying sources of text, and would the feasibility of that not depend on the nature of the research?
> 2. The chapter describes several ways of dealing with unseen words. Has any approach proven more reliable than the others? Otherwise, how does one decide, given how computationally intensive it can be to try each of them?
>
> I have the same question about attempts to "debias" embeddings. If an embedding reflects language in association with its context, how is it possible to remove "biased" associations? When considering research goals and selecting data, what precautions can we take to avoid amplifying existing inequalities?

For academic research that looks for associations between a text and the ideas around it, bias isn't something to be removed but something worth closer examination. So, to answer the first question: bias in word embeddings isn't an issue if the research goal is a deeper theoretical understanding of the underlying text. If the goal has a practical application, removing the bias will likely require close examination and data cleaning, so there won't be a one-size-fits-all solution.

ValAlvernUChic commented 2 years ago

> The authors mention reference points as a way for machines to pick up on different tenses. How are such problems approached for words/sentences that are written identically yet might convey different information? For example, the sentence "I read this book" could be in either the present or the past tense. Are machines able to differentiate between the two?
>
> If so, is that true for the analysis of other languages as well? I'm considering analyzing a set of texts that are not in English -- if I were to employ such methods, would I run into issues?

Hi @konratp! I'm not entirely sure how static word embedding models would do this (though they can), but there are classes of language models that take the context of a paragraph or document into account when interpreting words. For example, a bidirectional LSTM takes into account the words before and after a target word and produces a probability distribution that can determine whether the "read" in question is past or present. If the text has clear signals of past tense, such as regular past-tense verbs nearby, it's entirely possible!
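
To make the context-sensitivity concrete, here is a minimal sketch, assuming the HuggingFace transformers library and bert-base-uncased: the contextual vector for "read" shifts with the surrounding tense cues, which is what a downstream classifier (or the bidirectional LSTM described above) can exploit.

```python
# Minimal sketch: the same surface form "read" receives different
# contextual vectors depending on tense cues in the sentence.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(sentence, word):
    """Contextual embedding of `word`'s first occurrence in `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, idx]

past = vector_for("yesterday i read this book", "read")
pres = vector_for("every morning i read this book", "read")

cos = torch.nn.functional.cosine_similarity
print(cos(past, pres, dim=0))  # high but well below 1.0: context moved it
```

The same machinery extends to many non-English languages via multilingual checkpoints, though quality varies with how well a language is represented in pretraining.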

Qiuyu-Li commented 2 years ago
> 1. The authors touch upon this briefly, yet how does one deal with bias in word embeddings if the embedding is only a representation of the underlying text? Do we expose the model to varying sources of text, and would the feasibility of that not depend on the nature of the research?
> 2. The chapter describes several ways of dealing with unseen words. Has any approach proven more reliable than the others? Otherwise, how does one decide, given how computationally intensive it can be to try each of them?
>
> I have the same question about attempts to "debias" embeddings. If an embedding reflects language in association with its context, how is it possible to remove "biased" associations? When considering research goals and selecting data, what precautions can we take to avoid amplifying existing inequalities?
>
> For academic research that looks for associations between a text and the ideas around it, bias isn't something to be removed but something worth closer examination. So, to answer the first question: bias in word embeddings isn't an issue if the research goal is a deeper theoretical understanding of the underlying text. If the goal has a practical application, removing the bias will likely require close examination and data cleaning, so there won't be a one-size-fits-all solution.

I think @Halifaxi gave a great answer to this question about "debiasing." Just a small addition: I think many of the word-embedding packages we actually use are pre-trained on a variety of materials.

Another question from me that's a little off-topic: I'm curious how each of the NLP techniques we've learned relates to the linguistic system of a language. Is there any linguistic topic or concept that has not been "parsed" by NLP? This question occurred to me at the very beginning of Chapter 6, where the authors kindly introduce some linguistic concepts.

sizhenf commented 2 years ago
> 1. The authors touch upon this briefly, yet how does one deal with bias in word embeddings if the embedding is only a representation of the underlying text? Do we expose the model to varying sources of text, and would the feasibility of that not depend on the nature of the research?
> 2. The chapter describes several ways of dealing with unseen words. Has any approach proven more reliable than the others? Otherwise, how does one decide, given how computationally intensive it can be to try each of them?
>
> I have the same question about attempts to "debias" embeddings. If an embedding reflects language in association with its context, how is it possible to remove "biased" associations? When considering research goals and selecting data, what precautions can we take to avoid amplifying existing inequalities?
>
> For academic research that looks for associations between a text and the ideas around it, bias isn't something to be removed but something worth closer examination. So, to answer the first question: bias in word embeddings isn't an issue if the research goal is a deeper theoretical understanding of the underlying text. If the goal has a practical application, removing the bias will likely require close examination and data cleaning, so there won't be a one-size-fits-all solution.
>
> I think @Halifaxi gave a great answer to this question about "debiasing." Just a small addition: I think many of the word-embedding packages we actually use are pre-trained on a variety of materials.
>
> Another question from me that's a little off-topic: I'm curious how each of the NLP techniques we've learned relates to the linguistic system of a language. Is there any linguistic topic or concept that has not been "parsed" by NLP? This question occurred to me at the very beginning of Chapter 6, where the authors kindly introduce some linguistic concepts.

I'd like to second this question. We've read a lot about "vectorizing" words and texts and then applying statistical/machine-learning methods to analyze them. I wonder what role linguistic theories play in NLP studies.

facundosuenzo commented 2 years ago

I had a hard time understanding neural networks in Jurafsky & Martin's reading. My questions are: 1) In which scenarios, for what types of corpora, and for what research questions can I use those models, and what limitations do they have? 2) What are the practical and empirical differences compared with the vector-semantics and word-embedding approaches?

hsinkengling commented 2 years ago

> I had a hard time understanding neural networks in Jurafsky & Martin's reading. My questions are: 1) In which scenarios, for what types of corpora, and for what research questions can I use those models, and what limitations do they have? 2) What are the practical and empirical differences compared with the vector-semantics and word-embedding approaches?

I second this question. I also want to add another question about neural networks.

While the authors explain neural networks through a 2-layer model, they also mention the existence of "deep neural networks," which are like "layer after layer of logistic regression classifiers." For neural networks with additional layers, what do the extra layers add to the analysis?
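
A rough intuition: each added layer re-combines the previous layer's features into more abstract ones, so depth buys compositional features that a single hidden layer would need many more units to imitate. A minimal PyTorch sketch of the stacking itself (layer sizes are illustrative assumptions):

```python
# Minimal sketch: a feed-forward network with configurable depth. Each
# hidden layer is an affine map plus a nonlinearity (a logistic-
# regression-like classifier) stacked on the one before it.
import torch.nn as nn

def make_mlp(in_dim, hidden_dim, out_dim, n_hidden):
    layers = [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
    for _ in range(n_hidden - 1):
        layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
    layers.append(nn.Linear(hidden_dim, out_dim))
    return nn.Sequential(*layers)

shallow = make_mlp(300, 128, 2, n_hidden=1)  # the book's 2-layer case
deep = make_mlp(300, 128, 2, n_hidden=4)     # "layer after layer"
print(shallow)
print(deep)
```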

Hongkai040 commented 2 years ago

In Text as Data, the authors say that word embeddings learn the association between a word and its context. That's how word embeddings measure meaning: two words are considered to have similar meanings when they're used in similar contexts. What are the differences and limitations of this concept of meaning compared with our intuitive sense of it?
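
For reference, the distributional notion of meaning reduces to a few lines: count each word's context words, then compare the count vectors. A minimal sketch with an assumed toy corpus:

```python
# Minimal sketch of the distributional hypothesis: words are "similar"
# when their context-count vectors are similar. Toy corpus assumed.
from collections import Counter
import math

corpus = ["the cat chased the mouse", "the dog chased the mouse",
          "the cat ate the fish", "the dog ate the bone"]

def context_counts(target, window=2):
    counts = Counter()
    for sent in corpus:
        toks = sent.split()
        for i, tok in enumerate(toks):
            if tok == target:
                lo, hi = max(0, i - window), min(len(toks), i + window + 1)
                counts.update(t for t in toks[lo:hi] if t != target)
    return counts

def cosine(c1, c2):
    num = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    den = (math.sqrt(sum(v * v for v in c1.values()))
           * math.sqrt(sum(v * v for v in c2.values())))
    return num / den

print(cosine(context_counts("cat"), context_counts("dog")))   # 1.0 here
print(cosine(context_counts("cat"), context_counts("fish")))  # lower
```

One well-known divergence from intuition: antonyms like "hot" and "cold" occur in nearly identical contexts and therefore come out as highly similar, even though we intuitively treat them as opposites.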

Sirius2713 commented 2 years ago

In Jurafsky's book, section 7.4, the authors talk about pooling. Are there any other benefits of pooling besides reducing the dimensionality of the inputs?
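
One other benefit worth naming: pooling makes the output size independent of input length (and, for mean/max pooling, of word order), so variable-length texts map to fixed-size vectors a classifier can consume. A minimal sketch, assuming token embeddings in NumPy arrays:

```python
# Minimal sketch: mean- and max-pooling collapse a variable number of
# token vectors into a single fixed-size vector.
import numpy as np

rng = np.random.default_rng(0)
short_sent = rng.normal(size=(7, 50))    # 7 tokens, 50-dim embeddings
long_sent = rng.normal(size=(12, 50))    # 12 tokens, same dimensions

for tokens in (short_sent, long_sent):
    mean_pooled = tokens.mean(axis=0)    # average over the token axis
    max_pooled = tokens.max(axis=0)      # strongest activation per dim
    print(tokens.shape, "->", mean_pooled.shape, max_pooled.shape)
```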

chentian418 commented 2 years ago

We know that word2vec models can be trained using two distinct objective functions: the continuous bag-of-words (CBOW) model and the skip-gram model. CBOW sets up a prediction problem in which a focal word is predicted from the context around it, while in skip-gram the context words are predicted one at a time from the focal word. I am curious: is there any significant difference when training the networks with CBOW versus skip-gram? And are there scenarios in which one method is preferable to the other? Thanks!
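
In gensim the two objectives are one flag apart, so they are easy to compare on your own corpus; the usual folklore is that skip-gram does better on rare words and smaller corpora, while CBOW trains faster. A minimal sketch (the tiny corpus is a stand-in):

```python
# Minimal sketch: CBOW vs. skip-gram differ only in the `sg` flag.
from gensim.models import Word2Vec

corpus = [["society", "shapes", "individual", "behavior"],
          ["individual", "choices", "shape", "society"]] * 100

cbow = Word2Vec(corpus, vector_size=50, window=5, min_count=5,
                sg=0, seed=1)            # sg=0: CBOW (the default)
skipgram = Word2Vec(corpus, vector_size=50, window=5, min_count=5,
                    sg=1, seed=1)        # sg=1: skip-gram

print("CBOW:     ", cbow.wv.most_similar("society", topn=3))
print("skip-gram:", skipgram.wv.most_similar("society", topn=3))
```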

mikepackard415 commented 2 years ago

It seems like all the more recent and most powerful word embedding methods rely on some form of neural network. Is it necessarily the case that neural networks always produce better embeddings? What are some of the tradeoffs between neural-network approaches and other approaches?

hshi420 commented 2 years ago

Do we need to evaluate a pretrained model's performance before we fine-tune it on the downstream task or corpus? My understanding is that if the downstream task differs from how the model was trained, then we have to fine-tune it anyway. If the downstream task is the same as the pretraining task, and the new corpus would be very time-consuming to train on, then we can evaluate the pretrained model's performance first.
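
For static embeddings that kind of sanity check is cheap via an intrinsic benchmark; a minimal sketch, assuming gensim's downloader and its bundled WordSim-353 similarity file:

```python
# Minimal sketch: score pretrained vectors against human similarity
# judgments before deciding whether fine-tuning is worth the cost.
import gensim.downloader as api
from gensim.test.utils import datapath

wv = api.load("glove-wiki-gigaword-50")      # small pretrained vectors

pearson, spearman, oov = wv.evaluate_word_pairs(datapath("wordsim353.tsv"))
print(f"Spearman rho: {spearman[0]:.3f}, out-of-vocabulary: {oov:.1f}%")
```

If the pretrained score is already acceptable and the downstream task matches the pretraining objective, skipping an expensive training run is a defensible choice.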

NaiyuJ commented 2 years ago

Is it possible to add attention layers to the word2vec algorithm? In what situations and for what types of data would this improve performance?

sudhamshow commented 2 years ago

I wonder why the parallelogram model works the way it does for word embeddings. Starting with a randomly initialized matrix and iteratively updating weights for the classification problem, it makes intuitive sense that gradient descent on the cost function pushes words found often in the same contexts closer together in the space. But how does it manage to maintain the same orientation (like the parallelogram) for related words? Can the parallelogram model be validated on all types of embeddings (including count-based and sparse models)? Are there any caveats?
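
The parallelogram test itself is one call in gensim, and running it exposes a documented caveat immediately: the method only "works" because the input words are excluded from the candidate set. A minimal sketch with pretrained GloVe vectors (the model choice is an assumption):

```python
# Minimal sketch: the parallelogram (analogy) test, king - man + woman.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

# most_similar excludes the three input words; without that exclusion
# the nearest vector to king - man + woman is typically "king" itself,
# a standard caveat of parallelogram evaluation.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```

As for count-based and sparse models, the clean linear offsets tend to be much weaker there; they are largely a property of dense, prediction-trained spaces.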

LuZhang0128 commented 2 years ago

Based on Chapter 16 and my (possibly wrong) understanding of gradient descent, I wonder whether we can only reach a local optimum of the model. If so, does that mean our results could differ from run to run? How can we ensure that an understanding of the corpus, if largely based on neural network results, is correct and consistent?
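
That is right: the objective is non-convex, and word2vec also randomizes initialization and negative sampling, so each run lands in a different local optimum. The usual responses are fixing seeds for reproducibility and, more importantly, checking that substantive findings are stable across seeds. A minimal sketch (toy corpus assumed):

```python
# Minimal sketch: different seeds give different local optima; robust
# conclusions should survive across seeds. Toy corpus is a stand-in.
from gensim.models import Word2Vec

corpus = [["media", "shapes", "public", "opinion"],
          ["public", "opinion", "drives", "media", "coverage"]] * 100

for seed in (1, 2, 3):
    # workers=1 plus a fixed seed makes a run largely reproducible
    # (full determinism may also need a fixed PYTHONHASHSEED).
    model = Word2Vec(corpus, vector_size=50, window=3, min_count=5,
                     seed=seed, workers=1)
    print(seed, [w for w, _ in model.wv.most_similar("media", topn=3)])
```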

kelseywu99 commented 2 years ago

Word embeddings pretrained on large corpora sometimes encode stereotypes, which derive from the way language was used. How does this problem open up new research directions? What are some methods to diminish this bias?
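
On the research side, the stereotype itself becomes a measurable object: association tests like WEAT (Caliskan et al. 2017) quantify it, and embedding studies have used it to track cultural change over time (e.g., Garg et al. 2018). A simplified association-score sketch (word lists and model are illustrative assumptions):

```python
# Minimal sketch of a WEAT-style score: how much more is a target word
# associated with one attribute set than another?
import numpy as np
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")

def cos(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def association(word, attrs_a, attrs_b):
    v = wv[word]
    return (np.mean([cos(v, wv[a]) for a in attrs_a])
            - np.mean([cos(v, wv[b]) for b in attrs_b]))

male, female = ["he", "man", "his"], ["she", "woman", "her"]
for word in ("engineer", "nurse", "scientist", "teacher"):
    print(word, round(association(word, male, female), 3))
```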

Emily-fyeh commented 2 years ago

I would like to know more about "freezing" the embedding layer when training a neural language model. How do we decide whether the embedding matrix E should be left un-updated during the whole training process?
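
Mechanically, freezing is a single flag in most frameworks; here is a minimal PyTorch sketch (the pretrained matrix below is a random placeholder). The usual heuristic is to freeze E when the downstream dataset is small, so scarce gradients don't distort good pretrained vectors, and to fine-tune when the data is large or the domain vocabulary differs.

```python
# Minimal sketch: frozen vs. trainable embedding layers in PyTorch.
# `pretrained` is a random placeholder for real pretrained vectors.
import torch
import torch.nn as nn

pretrained = torch.randn(10_000, 300)    # vocab size x embedding dims

frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
tunable = nn.Embedding.from_pretrained(pretrained, freeze=False)

print(frozen.weight.requires_grad)   # False: E never receives gradients
print(tunable.weight.requires_grad)  # True: E is updated during training
```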

YileC928 commented 2 years ago

I am also curious about the merits of building word embedding models on neural networks, and about how to choose and improve the algorithms.

chuqingzhao commented 2 years ago

I am curious about how to justify the choice of one word embedding algorithm over another.

melody1126 commented 2 years ago

In Chapter 6.3, how is the document dimension different from the word dimension, given that both dimensions are used in the analysis?
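
The two dimensions come from the same term-document matrix read in different directions: rows are words (a word vector records which documents it appears in) and columns are documents (a document vector records which words it contains). A minimal sketch with scikit-learn and assumed toy documents:

```python
# Minimal sketch: one term-document matrix yields both word vectors
# (rows) and document vectors (columns).
from sklearn.feature_extraction.text import CountVectorizer

docs = ["wizards cast spells at the school",
        "students at the school study spells",
        "the economy grows as markets expand"]

vec = CountVectorizer()
X = vec.fit_transform(docs)          # document-term matrix: docs x words

doc_vectors = X.toarray()            # each row: one document over words
word_vectors = X.T.toarray()         # each row: one word over documents
print(doc_vectors.shape, word_vectors.shape)   # (3, V) and (V, 3)
```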

ttsujikawa commented 2 years ago

Regarding the neural network method, what could be considered an improvement of the method? For neural networks used in image processing, the results are reported as explicit, precise numbers. When it comes to NLP, what could serve as a universal measure for evaluating neural-network methods and theories?

AllisonXiong commented 2 years ago

How should we capture the meaning of unseen words, or of words whose frequency falls below the threshold and which are consequently excluded from the vocabulary?
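
Besides the subword approach sketched earlier in this thread, a common baseline is to map every rare token to a shared <UNK> symbol before training, so the model learns one generic rare-word vector that also serves unseen words at inference time. A minimal preprocessing sketch (the threshold is an assumption):

```python
# Minimal sketch: replace words below a frequency threshold with <UNK>
# so the model learns one shared vector for rare and unseen tokens.
from collections import Counter

corpus = [["the", "cat", "sat"], ["the", "axolotl", "sat"],
          ["the", "cat", "slept"]]

MIN_COUNT = 2                                  # illustrative threshold
freqs = Counter(tok for sent in corpus for tok in sent)
processed = [[tok if freqs[tok] >= MIN_COUNT else "<UNK>" for tok in sent]
             for sent in corpus]

print(processed)
# [['the', 'cat', 'sat'], ['the', '<UNK>', 'sat'], ['the', 'cat', '<UNK>']]
```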

floriatea commented 4 months ago

Given the evolving capabilities of chatbots to mimic human conversation and emotions, what are the potential impacts on human psychology and social behavior, especially considering the deep emotional engagements users can have with chatbots, as seen in the case of ELIZA?