UChicago-Computational-Content-Analysis / Readings-Responses-2023


4. Exploring Semantic Spaces - [E3] 3. Nikhil Garg, Londa Schiebinger, Dan Jurafsky and James Zou’s follow-on article. 2018. #36

Open JunsolKim opened 2 years ago

JunsolKim commented 2 years ago

Post questions here for this week's exemplary readings: 3. Nikhil Garg, Londa Schiebinger, Dan Jurafsky and James Zou’s follow-on article. 2018. “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes.” Proceedings of the National Academy of Sciences 115(16): E3635-E3644

isaduan commented 2 years ago

When analysing gender or racial bias across time, the authors trained word-embedding models for each decade and compared the "Pearson correlation in embedding bias scores for adjectives." But how does the cross-time comparison of the size of the bias make sense if the content of the bias, represented by the most associated adjectives, also changes?

Jiayu-Kang commented 2 years ago

When analysing gender or racial bias across time, the authors trained word-embedding models for each decade and compared the "Pearson correlation in embedding bias scores for adjectives." But how does the cross-time comparison of the size of the bias make sense if the content of the bias, represented by the most associated adjectives, also changes?

My understanding of the metrics is that they are using the relative norm distance to calculate the bias scores for each decade, so the scores are comparable even when the most associated adjectives vary. In their validation section, it looks like the method does make sense since the relationship between the bias score and "reality" (occupation participation) is consistent over time. However, I'm wondering how the authors decided to train a separate embedding for each decade (versus a longer or shorter period of time). What are the conventions or considerations for choosing the unit? What would happen if they trained separate embeddings by historical events?
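For concreteness, here is a minimal sketch of that relative-norm-distance bias score, following the paper's definition rather than the authors' released code; the file path and word lists are illustrative assumptions.

```python
# A minimal sketch of the relative-norm-distance bias score (Garg et al. 2018),
# using gensim KeyedVectors. Paths and word lists below are assumptions.
import numpy as np
from gensim.models import KeyedVectors

def group_vector(kv, words):
    """Average, then L2-normalize, the vectors of the group words in the vocabulary."""
    vecs = [kv[w] for w in words if w in kv]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

def relative_norm_distance(kv, neutral_words, group1_words, group2_words):
    """Sum over neutral words of ||w - v1|| - ||w - v2||.
    Negative values mean the neutral words sit closer to group 1."""
    v1 = group_vector(kv, group1_words)
    v2 = group_vector(kv, group2_words)
    score = 0.0
    for w in neutral_words:
        if w not in kv:
            continue
        vw = kv[w] / np.linalg.norm(kv[w])
        score += np.linalg.norm(vw - v1) - np.linalg.norm(vw - v2)
    return score

# Hypothetical usage with one pre-trained embedding per decade:
# kv_1950 = KeyedVectors.load_word2vec_format("coha_1950.txt")  # assumed file
# bias_1950 = relative_norm_distance(kv_1950,
#                                    ["nurse", "engineer", "teacher"],
#                                    ["she", "her", "woman"],
#                                    ["he", "him", "man"])
```

Because the score is defined over a fixed word list rather than over whichever adjectives happen to be most associated in a given decade, the per-decade scores can be compared (and correlated with occupation data) even as the top adjectives shift.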

Qiuyu-Li commented 2 years ago

This is a very interesting and meaningful paper! My question is: What's the difference between word embeddings and the n-grams we used in Week 2 for finding related words or related contexts? In particular, how should we choose between the two for a given application? Thank you!

pranathiiyer commented 2 years ago

Very interesting read! The paper talks about the repercussions of bias in these embeddings if they are used for sensitive applications, product recommendations, etc. How are these embeddings incorporated into real-life applications such as these? Moreover, are they trained on more representative data, and data from more geographies, before they are used in such systems?

Hongkai040 commented 2 years ago

When analysing gender or racial bias across time, the authors trained word-embedding models for each decade and compared the "Pearson correlation in embedding bias scores for adjectives." But how does the cross-time comparison of the size of the bias make sense if the content of the bias, represented by the most associated adjectives, also changes?

My understanding of the metrics is that they are using the relative norm distance to calculate the bias scores for each decade, so the scores are comparable even when the most associated adjectives vary. In their validation section, it looks like the method does make sense since the relationship between the bias score and "reality" (occupation participation) is consistent over time. However, I'm wondering how the authors decided to train a separate embedding for each decade (versus a longer or shorter period of time). What are the conventions or considerations for choosing the unit? What would happen if they trained separate embeddings by historical events?

I think they didn't train embeddings on this corpus themselves (though Jurafsky was involved in training the vectors). They used pre-trained models (except the models trained on the New York Times corpus) for the downstream analysis. Section 6 says: "Google Books/COHA: Trained on a combined corpus of genre-balanced Google Books and the Corpus of Historical American English (COHA) (Davies, 2010) by Hamilton, Leskovec, and Jurafsky. For each decade, a separate embedding is trained from the corpus data corresponding to that decade. The dataset is specifically designed to enable comparisons across decades, and the creators take special care to avoid selection bias issues." I think this may answer your question.

My question concerns the models trained on the New York Times corpus. The authors trained embeddings on New York Times articles from 1988-2005 in year-based windows (trained on 3-year windows) to measure the religious (Islam vs. Christianity) bias score. This is the smallest corpus and the shortest time span covered in the paper. Following the questions from the orientation reading: is this a big enough corpus for training word embeddings?
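For reference, training per-window embeddings is straightforward to prototype; this is a minimal sketch (not the paper's pipeline), where `docs_by_year` is an assumed dict mapping year to a list of tokenized sentences.

```python
# A minimal sketch of training one word2vec model per 3-year window,
# in the spirit of the NYT analysis. `docs_by_year` is assumed to map
# year -> list of tokenized sentences.
from gensim.models import Word2Vec

def train_windowed_embeddings(docs_by_year, start=1988, end=2005, span=3):
    models = {}
    for window_start in range(start, end + 1, span):
        years = range(window_start, min(window_start + span, end + 1))
        sentences = [s for y in years for s in docs_by_year.get(y, [])]
        if not sentences:
            continue
        # Small corpora are noisy: a higher min_count and more epochs help somewhat,
        # but the concern about corpus size still stands.
        models[window_start] = Word2Vec(sentences, vector_size=100,
                                        window=5, min_count=10, epochs=10)
    return models
```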

pranathiiyer commented 2 years ago

This is a very interesting and meaningful paper! My question is: What's the difference between word embeddings and the n-grams we used in Week 2 for finding related words or related contexts? In particular, how should we choose between the two for a given application? Thank you!

As I understand it, n-grams are just sequences of n tokens appearing together (you choose n), whereas word embeddings are dense vector representations of words, so words used in similar contexts get similar vectors and sit closer together in the vector space. You could also train embedding models on n-grams.
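To make the contrast concrete, here is a minimal sketch on a toy corpus (the corpus and word choices are illustrative, and a real analysis would need far more data for the embedding neighbors to be meaningful).

```python
# n-gram view vs. embedding view of the same toy corpus.
from collections import Counter
from gensim.models import Word2Vec

corpus = [["the", "nurse", "helped", "the", "patient"],
          ["the", "doctor", "helped", "the", "patient"],
          ["the", "engineer", "fixed", "the", "machine"]]

# n-grams: surface co-occurrence counts of adjacent word pairs (bigrams)
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
print(bigrams.most_common(3))

# embeddings: dense vectors whose neighbors reflect shared contexts
model = Word2Vec(corpus, vector_size=20, window=2, min_count=1, epochs=50, seed=42)
print(model.wv.most_similar("nurse", topn=2))
```

Roughly, n-gram counts answer "which words appear next to each other", while embeddings answer "which words appear in similar contexts", which is why they are the natural choice for measuring associations like the stereotypes in this paper.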

sizhenf commented 2 years ago

This is a very interesting read! The authors use a word embedding model to study gender bias in each decade. It is very inspiring to see how NLP methods in general are used to study gender inequality. I'm curious how the authors distinguish words that are neutral with respect to ethnic and gender bias from "manly" words that are neutral and simply conventional.

Emily-fyeh commented 2 years ago

Since I already knew this paper, I paid attention to the ethnic-occupation stereotypes this time. I wonder how the authors would tackle the predominant treatment of white ethnicity as the default in historical documents. That is, many representations may not be specifically associated with "white" names but still obviously contain cues of the (over)representation of Caucasians.

NaiyuJ commented 2 years ago

It's really impressive to see how word embeddings can be used to analyze the changes in people's stereotypes and attitudes. However, I personally think that gender and ethnicity might be demographic features too large and noisy for word embeddings to capture their trends accurately. Beyond changes in word co-occurrences, the conceptualization and interpretation of these two features themselves can change over time.

YileC928 commented 2 years ago

An interesting and ambitious paper! I am a bit confused about the section 'Validation of the embedding bias'. The authors suggest that because the embedding bias is highly correlated with occupation participation, the former 'accurately reflects' the latter. But I doubt that mere correlation is strong enough evidence for such a conclusion.
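For what it's worth, the validation check itself is a simple correlation of two per-occupation series; a minimal sketch with made-up numbers (real values would come from Census data and the trained vectors) looks like this, which is why the question of whether correlation alone justifies "accurately reflects" is a fair one.

```python
# A minimal sketch (with made-up numbers) of correlating embedding bias scores
# with occupation participation gaps within one decade.
from scipy.stats import pearsonr

embedding_bias = [-0.08, -0.03, 0.01, 0.04, 0.07]   # assumed scores per occupation
occupation_gap = [-0.40, -0.15, 0.05, 0.20, 0.35]   # assumed participation gap per occupation
r, p = pearsonr(embedding_bias, occupation_gap)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```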

kelseywu99 commented 2 years ago

Interesting read on how bias in word embeddings opens up new research opportunities. I also find the validation section of the research a bit puzzling. In the validation stage, the authors mention that some aspects of the research cannot be validated due to the absence of quantified gender stereotypes in the literature, and I was wondering whether an alternative approach would work in this case.

melody1126 commented 2 years ago

Really interesting methodological paper! Great appendix section with lots of details. If we were to take a topic modeling approach to studying stereotypes over time using Census data, how would the results be interpreted differently?

ttsujikawa commented 2 years ago

Highly interesting research! Regarding the law of conformity, the authors hypothesize that its root is possibly a sociocultural conformity bias that makes people less likely to accept novel innovations for common words, a mechanism analogous to the biological process of purifying selection. I think this conformity could be analyzed with word embedding approaches on texts covering broader topics, and it would be even more inspiring if we could go further with this hypothesis.

sudhamshow commented 2 years ago

I was wondering whether the algorithmic bias of these word-vector models could be fixed manually. Since we know how the space is oriented and which words lie nearby (from the contexts in the training data), couldn't one manually displace a word of interest from its current position to a new one? For example, move "woman" from the proximity of "nurse" to that of "doctor" or "engineer", thus resolving the stereotype?
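One existing answer to this question is the "hard debiasing" idea of Bolukbasi et al. (2016), which, rather than hand-placing individual words, removes the component of a word vector that lies along a learned gender direction. A minimal sketch, assuming `kv` is a gensim KeyedVectors model and with illustrative word pairs:

```python
# A minimal sketch of projecting out a gender direction (Bolukbasi et al. 2016 style).
import numpy as np

def gender_direction(kv, pairs=(("she", "he"), ("woman", "man"), ("her", "him"))):
    """Average the difference vectors of gendered word pairs and normalize."""
    diffs = [kv[a] - kv[b] for a, b in pairs if a in kv and b in kv]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def neutralize(vec, g):
    """Remove the projection of `vec` onto the gender direction `g`."""
    v = vec - np.dot(vec, g) * g
    return v / np.linalg.norm(v)

# Hypothetical usage:
# g = gender_direction(kv)
# nurse_debiased = neutralize(kv["nurse"], g)
```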

floriatea commented 4 months ago

How do the changes in word embedding biases over the last century reflect broader societal changes in attitudes towards gender and ethnic groups, and what does this imply about the relationship between language and culture?