Computational-Content-Analysis-2020 / Readings-Responses-Spring

Repository for organizing orienting, exemplary, and fundamental readings, and posting responses.

Exploring Semantic Spaces - Orientation #27

Open HyunkuKwon opened 4 years ago

HyunkuKwon commented 4 years ago

Post questions about the following orienting reading:

Kozlowski, Austin, Matt Taddy, and James Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84(5):905-949.

wanitchayap commented 4 years ago

This is such an interesting paper! Below are my 2 questions:

  1. Word sense - Word embedding models normally treat words with the same spelling as the same word. This can be problematic when the word we are interested in has many senses. For example, rock can mean geological rock, rock music, or rocking a chair. The authors also admit that this problem interferes with the race dimension they measured (black-white can refer to literal colors unrelated to race). Is it possible to differentiate such many-sense words in the embedding model with some unsupervised method? If I understand correctly, BERT can deal with this problem (see the sketch after these questions). Could the authors have used a pre-trained embedding model from BERT to conduct the study?

  2. Beyond the antonym dimension - What if we want to measure the cultural dimension of gender beyond just masculinity-femininity? What if we want to measure a gender dimension constructed from the several sexual identities available in the modern day (e.g. lesbian, gay, asexual, bisexual, pansexual)? I don't think these different identities can be captured by the masculinity-femininity spectrum alone, and it is unclear how we could construct a separate antonym dimension for them (e.g. a homosexual-heterosexual dimension would not be enough to capture asexual, bisexual, and pansexual). Does this mean that an antonym dimension is not sufficient? Could we construct a dimension that is more of a plane than a line? How could we quantify and validate such a plane?
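A minimal sketch of the contextual-embedding idea behind question 1, assuming the Hugging Face `transformers` package and the `bert-base-uncased` model (both are my assumptions, not something the authors used): unlike a static word2vec vector, BERT gives the same spelling different vectors in different contexts, which is one unsupervised route to separating word senses.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(word, sentence):
    """Contextual vector of `word` in `sentence` (assumes `word` is a single wordpiece)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    position = enc["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[position]

v_stone = vector_for("rock", "the climbers rested on a huge rock")
v_music = vector_for("rock", "the band played loud rock all night")
print(torch.cosine_similarity(v_stone, v_music, dim=0).item())  # noticeably below 1.0
```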

ihsiehchi commented 4 years ago

I also think this is one of the most interesting papers from this course. Word embedding is certainly a slick idea; it constructs a meaningful notion of distance that can show the closeness of two words that may not appear together very often, e.g. opera and golf. I have a question about one footnote that I hope James can elaborate on, as well as three applied questions.

Footnote 22 (the quixotic feat of theory-free synonym-pair selection): This was my prior for how one should approach the research question before I reached this footnote. Could you explain why it did not work?

Application to proper nouns: Let's say I want to identify the political figures most strongly associated with populism. I would train the embedding model on a corpus containing names such as Donald Trump and Joe Biden and then project those names onto antonym pairs such as "elite-people". Would that work?
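A minimal sketch of how that projection could work, assuming a gensim `KeyedVectors` model `kv` trained on a political corpus in which multi-word names were joined into single tokens (e.g. `donald_trump`). The dimension follows the paper's general recipe of averaging antonym-pair difference vectors and scoring words by cosine similarity; the extra pair "establishment"-"grassroots" is purely illustrative.

```python
import numpy as np

def dimension(kv, pairs):
    """Average the normalized difference vectors of (positive, negative) antonym pairs."""
    diffs = [kv[a] / np.linalg.norm(kv[a]) - kv[b] / np.linalg.norm(kv[b])
             for a, b in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def project(kv, word, dim):
    """Cosine similarity between a word vector and a unit-norm dimension."""
    v = kv[word] / np.linalg.norm(kv[word])
    return float(np.dot(v, dim))

# Hypothetical usage, assuming `kv` and the listed tokens exist:
# elite_people = dimension(kv, [("elite", "people"), ("establishment", "grassroots")])
# print(project(kv, "donald_trump", elite_people), project(kv, "joe_biden", elite_people))
```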

Who is the modern-day Mozart? Wolfgang Amadeus Mozart was the pop star of his time, so who is someone we would talk about today the way people talked about Mozart in his time? A huge limitation is that many words become irrelevant as word choice constantly updates and can vary a lot from one generation to another. That much is already evident in the paper, where the time window spans a century; one can only imagine the extent being larger when it spans three centuries. The workaround I was thinking of is to first identify words used both in Mozart's time and in ours, then train a word embedding model to find the 18th-century equivalents of the antonym pairs we are interested in. Finally, we find the vector position of "Mozart" using the 18th-century antonym pairs, and train the model on today's artists using the 21st-century antonym pairs to find the artists closest (in terms of cosine similarity) to the "Mozart" vector.

Testing Spence's "job market signaling" with word embedding: The premise of Michael Spence's Nobel-winning idea, job market signaling, is that to some extent people undergo costly activities such as higher education not because the skills are necessary for employment, but because they can signal their "type" to potential employers. "Type" mostly refers to ability, but one can see how family affluence, status, and cultivation all play a role when it comes to employment. I am considering how to test this idea by training a word embedding model on a pile of resumes. For a given resume, we would first obtain a set of word vectors; what do we do with those vectors? Do we average their positions? Do we look for outliers? Does the order of the vectors matter, i.e. set vs. sequence?
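For the last question, a minimal sketch of the simplest option, averaging a resume's word vectors into one document vector, assuming a gensim `KeyedVectors` model `kv` and a tokenized resume (a list of strings); averaging discards word order, so a sequence model would be needed if order matters.

```python
import numpy as np

def resume_vector(kv, tokens):
    """Mean of the in-vocabulary word vectors of one tokenized resume."""
    vecs = [kv[t] for t in tokens if t in kv]
    return np.mean(vecs, axis=0)
```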

timqzhang commented 4 years ago

I really enjoyed this paper, and it gives me much insight into this week's assignment (especially the projection section) and my ongoing project. My concerns are mainly about the word-pair selection for the dimensions:

  1. There is a figure showing the change in projections of words onto a given dimension, namely Figure 10 on p. 928 for the employment dimension. It made me think about potential changes over time in the word pairs used to define dimensions. For example, some word pairs belonging to a dimension may only appear in the middle of the time period but not at the start, or some descriptive word pairs may fall out of use later on. Such cases would cause a kind of systematic shift, as these pairs are the criteria for the subsequent projections. Should we therefore control for this potential shift before we proceed with the projections and the analysis of their changes over time (see the sketch after this list)?

  2. Another concern about the word pairs is whether the POS of the words (the same root in different POS forms, e.g. liberal, liberate, and liberalism) matters when choosing word pairs for a dimension. Should we put as many word pairs as possible (with all potential forms) into the dimension, or should we consider the different effects of word pairs with different POS? During the assignment, the word-pair selection was to some degree limited by the corpus itself: for some words I only have the noun form but no adjective form, so this is still an issue to consider.
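A minimal sketch of the kind of control raised in question 1, assuming one gensim `KeyedVectors` model per period in a hypothetical `kv_by_period` dict: keep only the antonym pairs whose members appear in every period's vocabulary, so the same pairs define the dimension across the whole time span.

```python
def stable_pairs(kv_by_period, pairs):
    """Antonym pairs whose both words occur in every period's vocabulary."""
    return [(a, b) for a, b in pairs
            if all(a in kv and b in kv for kv in kv_by_period.values())]
```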

nwrim commented 4 years ago

I enjoyed reading the paper a lot! It really showed me how word embedding models could be a very powerful tool in social science.

My question is similar to Ian's first: this paper showed that word embedding models can validate a well-established dimension that has been studied extensively (both qualitatively and quantitatively) in the field, but I am not sure whether these models can help formulate a novel theory (or dimension). To quote from the article,

cultural dimensions for analysis should be motivated by theoretical considerations, as we ultimately did here with class, rather than emergent and sometimes arbitrary qualities of the embedding space. (p. 930)

and

After feeling like the crew from Douglas Adams’s The Hitchhiker’s Guide to the Galaxy when they find that the “Answer to the Ultimate Question of Life, the Universe, and Everything” computed by the supercomputer Deep Thought over 7.5 million years is “42” (Adams, Brett, and Perkins 1978), we caution against “theory-free” approaches to meaning discovery. (p.944)

(ok, I admit that I quoted the latter just because I laughed so hard when I saw one of my favorite books quoted in a top-tier journal)

Building on this, how can we form theories from the rather uninterpretable results of word embedding models? I guess this question is actually quite broad and could apply to all deep neural networks, too.

WMhYang commented 4 years ago

This paper is impressive to me, and I found Figure 1 really helpful when trying to understand the word2vec model. The paper argues that word positions in the embedding model reflect conscious associations, so that the model can be used to interrogate underlying cultural patterns. However, I don't think a reliable and representative survey is easy, or even possible, in certain contexts. For example, if we would like to examine linguistic change in physics-related papers, as in the tutorial for this week's homework, it is hard to ask people without the relevant background knowledge to take the survey. In this case, are there any other methods we could employ to test whether the bias is conscious or not?

Yilun0221 commented 4 years ago

I think this paper is a very vivid example of applying word embeddings to solving social science problems. What makes this research even more inspiring is that it also compares word embeddings from a historical perspective. My question also focuses on word embeddings for historical text. As time goes by, language develops, new words appear, and writing styles may differ across periods. I wonder how a researcher can avoid errors from these factors?
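One standard answer to the comparability part of this question is orthogonal Procrustes alignment, used for diachronic embeddings by Hamilton et al. (2016); it is not a method from this paper. A minimal sketch, assuming two numpy matrices whose rows are the vectors of the vocabulary shared by both periods, in the same row order.

```python
import numpy as np

def align(old, new):
    """Rotate `old` into the space of `new` without distorting distances."""
    u, _, vt = np.linalg.svd(old.T @ new)
    return old @ (u @ vt)
```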

linghui-wu commented 4 years ago

This paper is very thought-provoking with respect to word embeddings. My question is about dimension reduction.

First, on page 6, the paper says: “From efficiency considerations, SVD placed strict upper limits on the number of documents and lower limits on the size of semantic contexts they could factorize.” I don’t quite get why setting the lower bound would be beneficial for efficiency?

Second, considering the interpretability of SVD, is it possible to apply word2vec to obtain the same or similar results and interpret them in the same way, i.e. that any two close word vectors represent similar semantic meanings? If so, what is the advantage of SVD over word2vec?
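A minimal sketch of the SVD side of this comparison, assuming `ppmi` is a word-by-word positive PMI co-occurrence matrix (scipy sparse): truncated SVD keeps the top k dimensions, and the resulting rows play the same role as word2vec vectors. Levy and Goldberg (2014) show that skip-gram with negative sampling implicitly factorizes a closely related matrix, which is why the two approaches often behave similarly.

```python
from scipy.sparse.linalg import svds

def svd_embeddings(ppmi, k=300):
    """Rows of u * s are k-dimensional word vectors from a truncated SVD."""
    u, s, _ = svds(ppmi, k=k)
    return u * s
```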

harryx113 commented 4 years ago

From Figure 7, we see a drastically increasing similarity between education and affluence. What can we conclude from such a trend? Does it mean that education is getting more costly over the years, or simply that more text on the topic of money and education has been included in the corpus?
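A minimal sketch of one way to probe the second possibility, assuming `corpora` maps each decade to a list of tokenized documents and the word sets are purely illustrative: if the share of education- and money-related tokens rises sharply over time, corpus composition could be part of the story.

```python
from collections import Counter

EDU = {"education", "school", "college", "degree"}   # illustrative word sets
MONEY = {"money", "wealth", "income", "rich"}

def topic_share(corpus, vocab):
    """Fraction of all tokens in a decade's corpus that belong to `vocab`."""
    counts = Counter(tok for doc in corpus for tok in doc)
    total = sum(counts.values())
    return sum(counts[w] for w in vocab) / total

# shares = {d: (topic_share(c, EDU), topic_share(c, MONEY)) for d, c in corpora.items()}
```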

tianyueniu commented 4 years ago

In order for word embedding to extract context from the corpus, the researcher needs to determine a window size k to quantify "context." For example, the article states that previous studies find windows of ~8 words produce the most consistent results. Should we choose a different k for different types of corpora? What is a good method for finding the optimal k?
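A minimal sketch, assuming gensim and a tokenized corpus `sentences` (a list of token lists), of how the window size enters training; one practical way to pick k is to train several models and compare them on a validation task such as the survey correlations the authors use.

```python
from gensim.models import Word2Vec

def train(sentences, k):
    """Skip-gram embeddings with a context window of k words on each side."""
    return Word2Vec(sentences, vector_size=300, window=k, min_count=10,
                    sg=1, workers=4)

# models = {k: train(sentences, k) for k in (2, 5, 8, 15)}
```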

Lesopil commented 4 years ago

On page 906 the authors write: "Previous work with word embeddings in computational linguistics shows that words frequently sharing contexts, and thus located nearby in the vector space, tend to share similar meanings." I was wondering if we could get some references to this work. Later in the paper the authors "empirically validate word embeddings’ ability to capture widely shared cultural associations," and I would be curious to see some of that previous work as well.

minminfly68 commented 4 years ago

It is super fascinating to apply the word-embedding technique to social science problems. I am wondering how we might deal with the problem that the meanings of words change over time, and how we can update the corpus to overcome this?

jsgenan commented 4 years ago

  1. What are the criteria for choosing the words that define a dimension (e.g. rich vs. poor)? Did you find the results robust across different combinations?
  2. In word2vec, how should we think about the existing bias of the corpus in terms of computational bias?

shiyipeng70 commented 4 years ago

Since words with multiple meanings are conflated into a single representation (a single vector in the semantic space), polysemy and homonymy may not be handled properly. How do we deal with such circumstances?

timhannifan commented 4 years ago

This paper is a fascinating example of using word embedding for sociological inquiry. The distinguishing factor of the authors' approach is the use of high-dimensional space to measure the proximity of words to multiple cultural dimensions, and their use of neural-network embeddings over SVD.

One limitation of the study, in my naive view, is the constraint on cultural dimensions: the authors used the dimensions identified by Jenkins' 1958 survey work with college students. Just as topic modeling requires pre-specifying the number of topics, the high-dimensional approach requires pre-specifying the cultural-dimension vectors. Is there a way to "discover" these dimensions without pre-specifying them or being constrained by our limited perception of cultural vectors (one rough possibility is sketched below)?
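A minimal sketch, assuming a gensim `KeyedVectors` model `kv` and scikit-learn, of one "emergent" approach the authors explicitly caution against: run PCA on the embedding matrix and read off the words at the extremes of each principal component to see whether any component looks like an interpretable cultural dimension.

```python
import numpy as np
from sklearn.decomposition import PCA

def emergent_dimensions(kv, n_components=10, n_show=10):
    """Print the extreme words of each principal component of the embedding matrix."""
    scores = PCA(n_components=n_components).fit_transform(kv.vectors)
    for i in range(n_components):
        order = np.argsort(scores[:, i])
        low = [kv.index_to_key[j] for j in order[:n_show]]
        high = [kv.index_to_key[j] for j in order[-n_show:]]
        print(f"PC{i}: {low} ... {high}")
```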

pdiazm commented 4 years ago

Love the paper! One question I have is whether there is a use case for word embeddings in which we can track different words that represent the same thing across time. Or, similarly, can we track how the use of a certain word morphs into a whole new word specific to the context, and in this way trace the path of an idea / theory / belief / conspiracy theory, similar to how we can use topic models to track ideas across time?

lyl010 commented 4 years ago

The paper introduces cultural dimensions and presents empirical results built on them. My question is: how can we interpret the meaning of a dimension in culture?

timhannifan commented 4 years ago

The Karpathy approach, while able to produce some amusing results, seems limited by the constraint of sequential ordering. In this case, the sequence was defined at the character level. The size of this constraint seems onerous for more complex analysis. Could the sequence unit be larger, e.g. at the word, n-gram, or sentence level?