Open JunsolKim opened 2 years ago
Very interesting read!
I like how the authors' creative use of adversarial training reduces the influence of confounding variables! But, as they acknowledge, the 'influential words from the training data also includes some correlative terms, like names of states, that we would expect the latent confound demotion to de-emphasize.' Given this limitation, how should we interpret the authors' findings? When is the reduction of confounding 'enough'? How could it be done better?
Adversarial training can help reduce the influence of confounding variables, but I'm still not fully clear on how we should interpret the evaluation of its performance at controlling confounds (i.e., as @isaduan asked, how good is "enough"?). Also, the authors say that "our assumption that human judgements are not reliable for this task makes evaluation difficult." What evaluation metrics could be used in place of human judgment, then?
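To make the two questions above concrete, here is a minimal toy sketch (not the authors' implementation; all names and the synthetic data are invented for illustration) of gradient-reversal-style adversarial demotion: an encoder is trained so a main classifier can predict the label while an adversary trying to predict the confound is actively thwarted. It also illustrates one metric-based alternative to human judgment: train a fresh probe on the frozen representation and check how well it can still recover the confound.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_grad(p, t):
    # gradient of binary cross-entropy w.r.t. the logit, averaged over the batch
    return (p - t) / len(t)

# Synthetic data (assumption): feature 0 drives the label y, feature 1 drives the confound c.
n, d, k = 2000, 4, 3
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)
c = (X[:, 1] > 0).astype(float)

W = rng.normal(scale=0.1, size=(d, k))   # linear encoder
u = np.zeros(k)                          # main head (predicts y)
v = np.zeros(k)                          # adversary head (predicts the confound c)
lr, lam = 0.5, 1.0                       # lam = strength of confound demotion

for _ in range(300):
    H = X @ W
    g_y = bce_grad(sigmoid(H @ u), y)    # main-task error signal
    g_c = bce_grad(sigmoid(H @ v), c)    # adversary error signal
    # both heads minimize their own loss
    u -= lr * (H.T @ g_y)
    v -= lr * (H.T @ g_c)
    # encoder: minimize the main loss but MAXIMIZE the adversary loss (gradient reversal)
    W -= lr * (X.T @ (np.outer(g_y, u) - lam * np.outer(g_c, v)))

H = X @ W
acc_main = ((sigmoid(H @ u) > 0.5) == y).mean()

# Human-judgment-free evaluation: a FRESH probe trained on the frozen
# representation; if it cannot recover the confound, demotion "worked".
w_probe = np.zeros(k)
for _ in range(300):
    w_probe -= lr * (H.T @ bce_grad(sigmoid(H @ w_probe), c))
acc_probe = ((sigmoid(H @ w_probe) > 0.5) == c).mean()
print(acc_main, acc_probe)
```

One way to operationalize "enough" under this framing: demotion is sufficient when the probe's accuracy on the confound drops to roughly chance level while the main-task accuracy stays high.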
This is interesting research! I'm wondering, though, how gender bias is defined here. At the end of Section 8, the authors say that 'biased comments are harmful to the recipient, regardless of who wrote them.' It seems they're referring to negative bias. Can some 'biases' be positive? And can this method identify vicious gender biases hidden in positively framed, gender-related comments?
The authors emphasize how biases are controlled through propensity matching. Aside from how good this control is, I wonder whether there are any other sources of bias in the research design?
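For readers unfamiliar with the technique mentioned above, here is a minimal toy sketch of propensity matching (not the paper's actual pipeline; the covariates and treatment here are synthetic stand-ins): propensity scores P(treated | covariates) are estimated with a hand-rolled logistic regression, and each "treated" unit is greedily matched to the control with the nearest score, so remaining outcome differences are less confounded by the covariates.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic covariates (e.g. topic features) and a treatment that depends on them.
n, d = 500, 3
X = rng.normal(size=(n, d))
t = (sigmoid(X @ np.array([1.0, -0.5, 0.0])
             + rng.normal(scale=0.5, size=n)) > 0.5).astype(int)

# Fit propensity scores P(t=1 | X) by gradient descent on logistic loss.
w = np.zeros(d)
for _ in range(500):
    w -= 0.1 * (X.T @ (sigmoid(X @ w) - t)) / n
score = sigmoid(X @ w)

treated = np.where(t == 1)[0]
control = np.where(t == 0)[0]

# Greedy 1-nearest-neighbor matching on the propensity score (with replacement).
pairs = [(i, control[np.argmin(np.abs(score[control] - score[i]))])
         for i in treated]
gaps = [abs(score[i] - score[j]) for i, j in pairs]
print(len(pairs), max(gaps))
```

The question above then becomes: even with well-matched scores, bias can remain from covariates that never entered the propensity model, which is one concrete "other source of bias" to look for.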
This research is interesting, but I'm curious about the limitation the authors mention: in some cases, the comments may not be addressing the figure of interest but other commenters. To what extent could this be the case, and how might one attempt to control for this sort of problem?
Post questions here for this week's exemplary readings: 1. Field, Anjalie and Yulia Tsvetkov. "Unsupervised Discovery of Implicit Gender Bias." arXiv preprint arXiv:2004.08361.