HyunkuKwon opened this issue 3 years ago
My instinctive question in response to this reading, and to the linguistic biases (created by humans) that computers reflect, is how to mitigate them and create 'fair' and 'conscientious' computers (which of course introduces a new type of bias)... Playing devil's advocate, I would also be interested in the ways in which introducing this purposeful counter-bias (to mediate the harmful human biases machines pick up) could itself be harmful. If these machines are a 'reflection' of people, then there is an opportunity to recognize those biases as they continue (note: this causes further harm, so I am not necessarily advocating for it, just wondering). But having a machine intervene before that is possible may lead people to believe that humans hold different biases than they do, and could thus induce change through perceived social pressures/norms that are in fact fabricated in order to mitigate existing biases.
In short: who decides what is 'fair', how do we mitigate biases that EXIST in humans and are thus recreated in machines, and what does it mean for society when those biases are mediated?
My question is quite similar to @sabinahartnett's. It seems we can identify some cultural shifts or biases from word embeddings. However, identifying them and actually acting on AI or machine learning methods are entirely different questions. How do you view the increasing use of these computational methods in everyday life? Do you think we should advance their use, or pause and try to keep them controlled through some sort of legislative process?
I'm not sure why we would ever expect out-of-the-box word embeddings to be unbiased. They are models of text, so if they do not include the bias of that text, they would be failed models. As @sabinahartnett points out, the reduction of bias thus requires a subjective decision about which tendencies of human-generated text we want to describe as 'bias' rather than just 'associations.' That depends on context: In research, we often want as accurate a model as possible, so debiasing can be counterproductive because it hides the features we want to examine.
So in what specific contexts do word embeddings need to be debiased, and in those contexts, are there relatively objective methods for choosing which associations (e.g. race, class, gender, ability, history, culture) to remove from the semantic space?
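For concreteness, one commonly cited option (separate from this paper's analysis) is projection-based debiasing in the spirit of Bolukbasi et al. (2016): estimate a bias direction from definitional word pairs and project it out of other vectors. A minimal sketch, assuming we already have word vectors as numpy arrays; the pairs and toy vectors below are placeholders, and the real method uses PCA over many pairs and treats definitionally gendered words separately.

```python
import numpy as np

def bias_direction(vectors, pairs):
    """Estimate a bias direction from definitional pairs, e.g. ("she", "he").

    Simplified version of the projection-based 'hard debiasing' idea from
    Bolukbasi et al. (2016): average the difference vectors and normalize.
    """
    diffs = [vectors[a] - vectors[b] for a, b in pairs]
    direction = np.mean(diffs, axis=0)
    return direction / np.linalg.norm(direction)

def remove_component(vec, direction):
    """Project the bias direction out of a single word vector."""
    return vec - np.dot(vec, direction) * direction

# Toy usage with placeholder 4-dimensional vectors.
vectors = {
    "she":   np.array([0.9, 0.1, 0.3, 0.2]),
    "he":    np.array([0.1, 0.9, 0.3, 0.2]),
    "nurse": np.array([0.8, 0.2, 0.5, 0.1]),
}
g = bias_direction(vectors, [("she", "he")])
debiased_nurse = remove_component(vectors["nurse"], g)
```

The choice of which direction counts as 'bias' (and which word pairs define it) is exactly the subjective decision raised above; the projection step itself is mechanical.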
This is very interesting and important work. What set of questions should we as researchers/analysts ask when using pre-trained NLP models (such as GloVe, or spaCy's and NLTK's pre-trained models) in order to ensure that we are not inadvertently causing harm through the bias contained in these models?
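One low-cost first step is to probe the pre-trained vectors directly before building on them. A minimal sketch using gensim's downloader, assuming gensim is installed and network access is available; the occupation and pronoun lists are illustrative, not the paper's published stimuli.

```python
import gensim.downloader as api

# Download pre-trained GloVe vectors (100-dimensional, Wikipedia + Gigaword).
model = api.load("glove-wiki-gigaword-100")

# Spot-check: compare how strongly occupation words associate with gendered
# pronouns. Large, systematic asymmetries flag biases the model inherits.
for occupation in ["nurse", "engineer", "librarian", "programmer"]:
    sim_she = model.similarity(occupation, "she")
    sim_he = model.similarity(occupation, "he")
    print(f"{occupation:12s} she={sim_she:.3f} he={sim_he:.3f} diff={sim_she - sim_he:+.3f}")
```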
This article reminds me of some statements in this week's orienting reading. It mentions that the existing stereotypes in word embeddings demonstrate their accuracy in reflecting the real world. It sounds as though only computer scientists care about debiasing word embeddings, while social scientists have little interest in it. Is that true? If so, should we worry that biases and stereotypes in word embeddings may mislead social science research results in some cases?
After reading this paper, I am wondering: is there future work that could de-bias these AI judgments, so that AIs can make neutral decisions free of bias? That said, I think discrimination is a very complex phenomenon, and maintaining fairness will take more than simply using AIs to correct it. Is there anything we should pay particular attention to in this debiasing process?
It's interesting to see how word embeddings are used to validate cultural stereotype biases in machine learning. The word-embedding analysis notes, for example, that "female names are more associated with family than career words, compared with male names." As we know, there are two kinds of discrimination/bias: statistical discrimination/bias and perceived discrimination/bias. Word embeddings can analyze such word-context relations; could they also be useful for detecting which kind of bias is present in machine learning, based on the textual content?
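For reference, the measurement behind that quoted finding is the paper's Word Embedding Association Test (WEAT). A minimal sketch of its effect size, with short placeholder word lists standing in for the paper's published name and attribute lists and `vectors` assumed to be a dict mapping words to numpy arrays:

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B, vectors):
    """s(w, A, B): mean cosine similarity to attribute set A minus set B."""
    return (np.mean([cos(vectors[w], vectors[a]) for a in A])
            - np.mean([cos(vectors[w], vectors[b]) for b in B]))

def weat_effect_size(X, Y, A, B, vectors):
    """Cohen's-d-style effect size used in the paper's WEAT."""
    s_X = [assoc(x, A, B, vectors) for x in X]
    s_Y = [assoc(y, A, B, vectors) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Placeholder word lists; the paper uses longer, published lists of names
# and attribute words with GloVe vectors trained on Common Crawl.
X = ["amy", "lisa"]        # female names (targets)
Y = ["john", "paul"]       # male names (targets)
A = ["home", "family"]     # family words (attributes)
B = ["career", "salary"]   # career words (attributes)
# effect = weat_effect_size(X, Y, A, B, vectors)
```

A positive effect size here would mean the female-name targets sit closer to the family attributes than the male-name targets do, which is the pattern the quoted sentence describes.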
This paper lastly mentions the potential issues of applying machine learning technologies to résumé screening. The linked article describes some real biases and errors in using AI for job interviews. It seems that machine learning learns biases. Will this feature of ML be a major concern when we use it in academia and industry?
This may again show that current ML techniques for extracting semantic embeddings operate at a very shallow level and produce heavily corpus-dependent results. Can we infer that human bias is a structural construct in language?
As the (online) social world becomes increasingly aware of the importance of cultural, racial, and gender equity, will computational models trained on text posted recently (the last 2 years) have less bias than models trained on images and text from 5 or 10 years ago?
This paper is truly inspiring. I am wondering whether we might develop algorithms to counteract such bias in language. For example, in hiring, an algorithm might detect certain language patterns related to race or gender where people tend to hold biases, and then automatically correct that bias by rephrasing the text.
This was an interesting article. It seems to me that a key value of this paper is that it validates human biases by replicating them algorithmically. Would it be possible, or helpful, to tweak the algorithms to deliberately reduce their bias?
Post questions about the following exemplary reading here:
Caliskan, Aylin, Joanna J. Bryson, Arvind Narayanan. 2017. “Semantics derived automatically from language corpora contain human-like biases.” Science 356(6334):183-186.