UChicago-Computational-Content-Analysis / Readings-Responses-2023


4. Exploring Semantic Spaces - orienting #39

Open JunsolKim opened 2 years ago

JunsolKim commented 2 years ago

Post questions here for this week's orienting readings: Kozlowski, Austin C., Matt Taddy, and James A. Evans. 2019. "The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings." American Sociological Review 84(5):905-949.

thisspider commented 2 years ago

Firstly, Kozlowski et al.'s findings are based on a dataset of millions of books. This dataset is composed of many types of texts, and it is tempting to ask whether the findings would hold for each of the subcorpora, e.g., a single genre of novels, a single sociology journal, or books authored by a single author. But would those datasets be large enough? Specifically, what is the smallest dataset on which a word embedding has been successfully trained (and which resulted in a publication)? During our own analysis, how would we know if the dataset we are using is large enough?

Second, what are the ways of identifying the most important dimensions characterizing our corpus?

pranathiiyer commented 2 years ago

A very interesting read! I was wondering about the construction of the cultural dimensions themselves.

  1. The paper examines associations in word embedding models trained on text spanning many years, but validates those associations against human judgments made today (by M-Turkers in 2016). Do you think the results would be different if we considered text only from recent years, i.e., might we see stronger correlations? Or perhaps they would not be significantly different, which would speak to the relevance of these dimensions across time in some sense?
  2. The paper mentions the gains in correlation between human ratings and projections onto a dimension as word pairs are added. Do we then remove certain pairs that were used to create the dimension, since even random assignment performs only marginally worse?
isaduan commented 2 years ago

What are some best practices in constructing word lists in an effort to operationalize a dimension of culture?

Sirius2713 commented 2 years ago

This paper used Google ngram data for word embedding. But would the performance differ if we used a dataset of complete documents, e.g., full articles, reports, etc.?

konratp commented 2 years ago

The authors state that "both networks and topic models are ill-suited for representing the multifarious associations and cultural valences that characterize all words in a corpus" (p. 909). Under what circumstances are semantic networks and topic modeling appropriate in the study of culture? In my view, most cultural questions I can think of that one might want to study are multifarious, yet I thought the readings from last week on topic modeling were quite insightful. How do we bridge this disconnect?

Jiayu-Kang commented 2 years ago

I'm interested in the findings that are "not so easily interpretable" (e.g., a powerful "employee" valence for "patient" and "expectancy"). For those results that do not correspond with existing theories, when should we see them as less valuable information caused by inconsistency of the model (and if this is the case, is that a threat to the validity of the model used?), and when do we need to come up with new hypotheses and dig into the mechanism behind the less interpretable results? I'm curious whether there is any basis for judging, or is it just a case-by-case decision based on researchers' hunches?

Qiuyu-Li commented 2 years ago

This is a wonderful paper! I have some very general questions regarding it: are there topics in cultural and sociological studies where NLP can give us surprising findings, or can NLP only serve to verify our hypotheses? This is also related to last week's orienting paper and my experience doing last week's homework. Machine learning is a very complex technique, and we code, debug, tune, and visualize to make sense of the data. But if the machine itself can't decide what makes sense, how can we reach a conclusion that lies outside our expectations? I think this is quite relevant to today's quantitative studies with big data and sophisticated tools, because there are so many decisions to make in preprocessing, training, and tuning that we cannot just follow a pre-decided framework. On the one hand, this leaves a lot of space for data mining, both conscious and unconscious; on the other hand, we can only make decisions based on knowledge we already have (such as the choice of word pairs in this paper).

ValAlvernUChic commented 2 years ago

> Firstly, Kozlowski et al.'s findings are based on a dataset of millions of books. This dataset is composed of many types of texts, and it is tempting to ask whether the findings would hold for each of the subcorpora, e.g., a single genre of novels, a single sociology journal, or books authored by a single author. But would those datasets be large enough? Specifically, what is the smallest dataset on which a word embedding has been successfully trained (and which resulted in a publication)? During our own analysis, how would we know if the dataset we are using is large enough?
>
> Second, what are the ways of identifying the most important dimensions characterizing our corpus?

I second the first question! I wonder if anyone has ever pre-trained a word embedding model on a large, separate corpus (let's say Wikipedia) and then used that model for the research question, akin to uses of BERT. If it hasn't been done before, is it feasible and what would be the associated complications from doing something like this?

Jasmine97Huang commented 2 years ago

Very interesting paper. In the Caveats and Limitations section, the authors note that "word embeddings must be trained on very large corpora… As a result, groups that do not leave extensive textual records are difficult to study with word embeddings." Could we use pre-trained models that are fine-tuned on a small corpus of interest to address this challenge? Furthermore, retrofitting methods allow researchers to incorporate knowledge from semantic lexicons into the vectors. Would such post-processing benefit a corpus with a relatively small vocabulary? What are the risks of injecting such knowledge by force if the goal is to detect semantic change?

facundosuenzo commented 2 years ago

Great, thought-provoking reading! I second thisspider's question, and I also wondered: how do you decide the window size (the number of n-grams) for word embedding? Grimmer and colleagues say, "When the window size is very small, it will tend to capture some syntactic meaning and when the window is larger it will tend to capture semantic meaning" (p. 82). Are 5-grams a large enough window?

Hongkai040 commented 2 years ago

I love the idea of using word embeddings to construct cultural dimensions. The paper mentions that we need a large corpus to train a model, and some of my peers asked about the requirements on the corpus. But my question is the opposite: can we train several word embedding models on different sub-corpora and construct similar dimensions or word pairs to compare the differences between those sub-corpora, which usually represent groups with different cultures?

GabeNicholson commented 2 years ago

When it comes to the initial data validation once the embeddings are trained, how do you know if the patterns the embedding finds are meaningful or just a product of random noise and unlucky sampling? For example, the sports dimension varied by wealth is a great example that we can also read and intuitively agree with. However, when it comes to things that don't make sense or seem counterintuitive, how do you validate the pattern? Presumably a different but similar dataset?

sizhenf commented 2 years ago

This is a very inspiring read that discusses how word embedding models can be used in social science studies. My question is: when we apply word embedding models to a text, how do they deal with words that have multiple meanings? (e.g., "While they are at the play, I'm going to play with the dog.") I assume this is not only an issue when we study English texts but a general pattern in many other languages.

hsinkengling commented 2 years ago

I find it very interesting that the double meaning of the words "black" and "white" makes the racial associations difficult to analyze (p. 920). I wonder if there are other, more colloquial word pairs that could be used in addition to the ones already mentioned, and whether those word pairs were considered. (For example, in my understanding, in the early 20th century the term "negro" was considered a neutral word to describe African Americans at the time. Another word commonly used (though not exclusively) to refer to African Americans is "colored".)

LuZhang0128 commented 2 years ago

This is a very interesting reading. I wonder, however, how well the model can capture the dynamic evolution of culture, since, as mentioned in the article, training the model needs a large dataset. If I'm interested in how a specific event causes a shift in culture, will the performance of the model be good enough to conclude anything? If this is not the best approach, is there another model we can use to analyze this kind of cultural evolution?

chentian418 commented 2 years ago

I kind of treat this paper as a baseline for studying word embedding models and Figure 1 is a very intuitive and straightforward illustration of how the algorithm works! I have the following questions regarding the word embedding models and culture dimensions:

  1. We know that in this model a word's nearest neighbors are often either its synonyms or syntactic variants. By "nearest neighbor," I suppose you mean that the cosine similarity between the two word vectors is large, correct?

  2. The idea of constructing cultural dimensions from word vectors is very inspiring for social science research. The paper discusses qualitatively why a pair of opposite words can represent a cultural dimension, but I was wondering what the mathematical intuition behind it is. Does the difference vector exactly capture the meaningful cultural dimension, or is it just a projection of, or proportional to, the true cultural dimension?

  3. Empirically, is there any evidence of how accurately the difference vector captures the cultural dimension?

  4. Can the constructed cultural dimension be a direct input into social science models like regressions? If so, what is the interpretation of this high-dimensional vector there?

Thanks!
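On question 1: yes, "nearest" in this literature is typically measured with cosine similarity. A small illustration with hypothetical toy vectors (invented values, not from the paper):

```python
import numpy as np

# Toy vectors: the nearest neighbor is the word with the highest
# cosine similarity, regardless of vector length.
vecs = {
    "rich":    np.array([0.9, 0.1, 0.0]),
    "wealthy": np.array([0.8, 0.2, 0.1]),
    "dog":     np.array([0.0, 0.9, 0.4]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word):
    others = [w for w in vecs if w != word]
    return max(others, key=lambda w: cosine(vecs[word], vecs[w]))

print(nearest("rich"))  # "wealthy"
```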

mikepackard415 commented 2 years ago

One of my questions relating to word embeddings is whether we can use them to compare two (or more) different corpora? For example, we might compare where certain words fall on the good-bad axis in different word embedding models trained on corpora generated by different communities to empirically study how those communities differ. Is this a reasonable use of these models?
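One complication with comparing separately trained models is that each training run yields the space up to an arbitrary rotation, so the spaces are usually aligned before comparison. A standard technique for this is orthogonal Procrustes alignment; here is a toy numpy sketch with synthetic data (an assumption for illustration, not the paper's own method):

```python
import numpy as np

# Hypothetical setup: two small embedding "models" over a shared
# vocabulary, where model B is an exact rotation of model A.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 4))                     # vectors from corpus A
R_true, _ = np.linalg.qr(rng.normal(size=(4, 4)))
B = A @ R_true                                  # vectors from corpus B

# Orthogonal Procrustes: find the rotation R minimizing ||A R - B||_F
# via the SVD of A^T B.
U, _, Vt = np.linalg.svd(A.T @ B)
R = U @ Vt

print(np.allclose(A @ R, B))  # spaces align up to rotation
```

After alignment, word-level comparisons (e.g., where a word falls on a good-bad axis in each corpus) become meaningful.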

kelseywu99 commented 2 years ago

An interesting read on the construction of cultural dimensions through word embedding. One aspect the paper brings up is that biases are present in the corpus used to determine the dichotomies of words, and hence stereotypes ensue. What are some steps to resolve or diminish this effect?

NaiyuJ commented 2 years ago

I'm quite curious about how the patterns manifested in the corpus have changed over time. When I used word embeddings in the Chinese context, I found that the patterns or trends we detect in the corpus can be very different in different years.

YileC928 commented 2 years ago

The paper is really detailed and exemplary. As it suggests that word embedding usually requires large corpora for modeling, I was wondering how large is appropriate. Is there a commonly used threshold?

Emily-fyeh commented 2 years ago

I am curious about how these ground-breaking measurements can be applied in a more contemporary scenario, say the past three decades (1990-2020). It seems necessary to involve online content to better represent the panorama of the cultural landscape, as social media breaks the monopoly on content production and allows more social groups to exert influence. I would like to know how this method can capture the nuances among the (seemingly) flattening social classes online.

hshi420 commented 2 years ago

I am curious how well the fine-tuned models can work. Dimensions can differ greatly from one cultural group to another. After fine-tuning the pretrained model on a subculture corpus, can the model be used as an accurate measure of these dimensions?

chuqingzhao commented 2 years ago

I love this paper! I think it is thought-provoking to link the word embedding method with cultural dimensions and capitals. The paper compares the word embedding method and semantic network analysis. The two methods seem similar because both put specific words into context. I wonder for which cases or research questions a semantic network is better than word embedding, and vice versa.

melody1126 commented 2 years ago

The authors discussed one limitation of word embedding: the necessity of a large corpus in order to capture subtle associations between words. How should we interpret the results of word embedding on a smaller corpus for the final project?

ttsujikawa commented 2 years ago

The paper is highly intuitive and inspiring! I like the part about how the word embedding method captures the cultural contexts of text data. I was wondering how we should use semantic network analysis when comparing texts from different cultural settings.

AllisonXiong commented 2 years ago

This paper is really thought-provoking! The authors linked word embedding to a multi-dimensional understanding of the social concept of class, under the assumption that 'meaning is not immanent within words & phrases but rather coheres within a broader cultural system'. My question relates to their interpretation of the results, especially the cosine relations between dimensions. The authors searched for sociological explanations of their results (a kind of abduction, I assume), but is it possible that there are also theories supporting the opposite findings? To what extent can word embedding serve as a validation of theories, despite the possible flaws of the model applied?

sudhamshow commented 2 years ago

Interesting paper! A couple of questions: why does this kind of representation work? Could you please explain it intuitively (or mathematically)? I had a hard time wrapping my head around why it works so well. For example, when projecting a word onto a single vector (rich - poor) or (male - female), one can understand that the projection of another word lies close to the extreme of the category it is associated with. But why does adding these different difference vectors still maintain coherence? (For all we know, the difference vectors could be orthogonal to each other, and the resulting orientation of their sum might not tell us anything meaningful.)

How does one go about calculating the dot product between the resulting vector and the word of interest? (Does one use the distributive property of the dot product, (a - b) . c = a . c - b . c?)
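On the distributive question: yes, the dot product distributes over subtraction, so a word's dot product with a difference vector equals the difference of its dot products with the two poles. A quick numerical check with random vectors:

```python
import numpy as np

# Random toy vectors standing in for word embeddings.
rng = np.random.default_rng(1)
a, b, c = (rng.normal(size=50) for _ in range(3))

# The dot product distributes over vector subtraction:
lhs = (a - b) @ c
rhs = a @ c - b @ c
print(np.isclose(lhs, rhs))  # True up to floating-point error
```

Note that cosine similarity adds a normalization by vector lengths, so it does not distribute the same way; the clean identity holds only for raw dot products.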

I was also wondering if the algorithmic bias of these word-vector models could be fixed manually. Since we know how the hyperspace is oriented and what words are close by (due to context from the learned data), can't one manually displace a word of interest from its current position to a new one? For example, move "woman" from the proximity of "nurse" to that of "doctor" or "engineer", thus correcting for stereotypes?

floriatea commented 4 months ago

Really helpful and comprehensive analysis of cultural insights from word embeddings! Despite significant economic transformations over the twentieth century, the study finds that the basic cultural dimensions of class remained remarkably stable. What does this reveal about the relationship between cultural perceptions of class and actual economic conditions, and how might it influence our understanding of social mobility and class dynamics? Also, the paper notes a notable exception in the association of education with affluence, which became more tightly linked over time. How does this shift affect the societal value placed on education, and what implications does it have for understanding the evolving landscape of social stratification? Does it suggest we should lower the cost of education and make it more affordable for everyone?