It is an inspiring paper that offers one approach to applying word embeddings. My question is mainly about the group of neutral words. In the paper, the neutral word list is the critical criterion for measuring gender and ethnic bias. Was it coded manually, for example by Amazon Mechanical Turk workers, or was some classification procedure used to make the selection? Such a selection should be justified, as it is an important part of the paper.
I wonder why, contrary to the orienting reading, the authors chose Euclidean distance rather than cosine similarity as the metric for semantic relatedness.
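For comparison, here is a minimal sketch of the two metrics on toy vectors (plain numpy; the vectors are hypothetical). They disagree whenever magnitudes differ, but they induce the same ranking once vectors are unit-normalized, which may be why the choice matters less than it first appears:

```python
import numpy as np

def euclidean(u, v):
    """L2 distance between two embedding vectors."""
    return np.linalg.norm(u - v)

def cosine_sim(u, v):
    """Cosine similarity: angle-based, ignores vector magnitude."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy example: u and v point the same direction but differ in norm.
u = np.array([1.0, 0.0])
v = np.array([10.0, 0.0])
print(euclidean(u, v))   # 9.0 -> "far apart" by Euclidean distance
print(cosine_sim(u, v))  # 1.0 -> identical direction by cosine
# Note: for unit-normalized vectors, ||u - v||^2 = 2 - 2*cos(u, v),
# so the two metrics rank word pairs identically in that case.
```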
Furthermore, the authors mention debiasing as a fruitful research direction, which would be interesting. I have two questions about this: (a) if a debiasing algorithm is built, how does one apply it for the good of society? (b) how can it be done? Words that are close in the embedding space are not necessarily interchangeable; opera and golf certainly are not.
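On (b): one approach from the literature (hard debiasing in the style of Bolukbasi et al. 2016, not this paper's own method) removes the component of each neutral word's vector along an estimated gender direction. A minimal sketch, assuming a hypothetical dict `emb` mapping words to numpy vectors:

```python
import numpy as np

def bias_direction(emb, pairs):
    """Estimate a bias axis from defining pairs, e.g. ('he', 'she').
    A crude version: average the normalized difference vectors.
    (Bolukbasi et al. use PCA over the pair differences instead.)"""
    diffs = [(emb[a] - emb[b]) / np.linalg.norm(emb[a] - emb[b])
             for a, b in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def debias(vec, direction):
    """Project out the bias direction from a neutral word's vector."""
    v = vec - np.dot(vec, direction) * direction
    return v / np.linalg.norm(v)

# Hypothetical usage:
# g = bias_direction(emb, [('he', 'she'), ('man', 'woman')])
# emb['doctor'] = debias(emb['doctor'], g)
```

Whether this removes bias or merely hides it is itself debated, which is part of why I ask about (a).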
I noticed that the researchers trained one embedding model per decade. My question is: how can we set standards for comparing the models across decades? Which index should we pay attention to? I think the standards in this research are somewhat flexible.
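My understanding is that the paper's index is the relative norm distance, computed independently within each decade's embedding space, which is what makes the scores comparable across decades without aligning the spaces themselves. A minimal sketch of that reading, assuming each decade's model is a dict of unit-normalized vectors (variable names are mine):

```python
import numpy as np

def group_vector(emb, words):
    """Average (then re-normalize) the vectors of a group's words."""
    v = np.mean([emb[w] for w in words if w in emb], axis=0)
    return v / np.linalg.norm(v)

def relative_norm_distance(emb, neutral, group1, group2):
    """Bias index as I read it from the paper: sum over neutral words
    of the difference in Euclidean distance to the two group vectors.
    Negative values mean the neutral words sit closer to group1."""
    v1, v2 = group_vector(emb, group1), group_vector(emb, group2)
    return sum(np.linalg.norm(emb[w] - v1) - np.linalg.norm(emb[w] - v2)
               for w in neutral if w in emb)

# Hypothetical usage: one score per decade, read as a time series.
# scores = {dec: relative_norm_distance(models[dec], occupations,
#                                       women_words, men_words)
#           for dec in models}
```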
Like @timqzhang, I am also concerned about the neutrality. From my past experience, using last names to detect ethnic groups can create a bias toward married women. Apart from that, the authors point out that "… the distribution of last names in the United States differs significantly by ethnicity, with the notable exception of White and Black last names". Nevertheless, they still used White last names in the study (see A.2). So I wonder how they addressed these potential biases?
In the men vs. women adjectives over the decades, we often see opposite adjectives, such as obedient/disobedient or honest/unreliable, within the same decade. Does that mean we are observing more about the aspects along which men and women are evaluated, rather than the general attitude toward them?
This is a very inspiring paper. (It's also very interesting to see that in the 1940s and '50s, 'sailor' was actually one of the most 'woman'-associated occupations extracted.) My question also concerns the generation of the 'neutral word list' mentioned by @timqzhang. Did they handpick the list, or did they systematically draw it based on a certain criterion?
This paper is quite interesting; it is fascinating to look at word embeddings from a cultural perspective. I am also curious about neutrality and extrapolation. For example, given that the authors distinguished race and ethnicity using different last names, in what sense is that neutral, and how can we extrapolate it to other studies?
I am wondering about the causality of these historical events. The paper uses these events alongside the word embedding models to discover changes in gender stereotypes, but how much of that shift is actually caused by the historical events? Do we have to take extra caution in interpretation?
Interesting paper with many results. I am interested in understanding occupational bias. As the authors mention, they select a list of neutral occupation words and measure the difference in distance between the female dimension and the male dimension. My question: would measuring the distance between the whole list of occupations and the gender dimension be unable to capture the bias of specific occupations, e.g. doctor and nurse?
I can see how this kind of measurement captures the general movement of women into occupations in the 1960s, but would a separate analysis by occupational category better capture the change in a specific occupation's bias?
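In principle, the same distance difference can be read off word by word instead of summed over the whole list, which would give exactly that per-occupation breakdown. A hedged sketch (names like `models`, `women_words`, and `men_words` are mine):

```python
import numpy as np

def word_bias(emb, word, group1, group2):
    """Relative norm distance for a single word:
    negative => closer to group1, positive => closer to group2."""
    v1 = np.mean([emb[w] for w in group1 if w in emb], axis=0)
    v2 = np.mean([emb[w] for w in group2 if w in emb], axis=0)
    v1, v2 = v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)
    return np.linalg.norm(emb[word] - v1) - np.linalg.norm(emb[word] - v2)

# Hypothetical usage: track individual occupations per decade.
# for dec, emb in models.items():
#     print(dec, word_bias(emb, 'doctor', women_words, men_words),
#                word_bias(emb, 'nurse',  women_words, men_words))
```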
Post questions about the following exemplary reading here:
Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou's follow-on article: "Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes." 2018. Proceedings of the National Academy of Sciences 115(16): E3635–E3644.