braindatalab / gecobench

NLP Benchmark for XAI methods
BSD 3-Clause "New" or "Revised" License

Incomplete gender perturbations in the subject dataset #84

Open ege-erdogan opened 1 month ago

ege-erdogan commented 1 month ago

Hi, and thanks for the nice work. We've discovered some samples in the subject dataset (downloaded from OSF) for which the subject gender cannot be clearly distinguished between the male and female versions of the same sentence, and the ground truth labels do not cover all the words corresponding to the subject. Two examples (bold words are part of the subject but stay the same):

(MALE) Because his father works with horses , Matilda demands the definition of a horse .

(FEMALE) Because her father works with horses , Matilda demands the definition of a horse .

and

(MALE) Zain seeks escape in an ultimate manner by committing suicide , drowning herself in the waters of the Gulf of Mexico.

(FEMALE) Chloe seeks escape in an ultimate manner by committing suicide , drowning herself in the waters of the Gulf of Mexico.

This appears to be a human labeling error, going by A.1.1 in the paper, but we wanted to notify you and ask whether you were already aware of it or have updated the dataset to fix it.

Edit: to clarify, in the second example 'herself' is not part of the grammatical subject but refers to the subject, so it should be modified accordingly to be consistent, while in the first sentence 'Matilda' is the subject.
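For reference, here is a minimal sketch of how pairs like the second example could be flagged automatically. It is not part of GECOBench: the function name, the small pronoun list, and the whitespace tokenization are all assumptions made for illustration, and it only looks for gendered pronouns that are identical in both versions of a sentence.

```python
# Hypothetical helper, not part of GECOBench. The pronoun list and the
# whitespace tokenization are assumptions made for illustration only.
GENDERED_WORDS = {
    "he", "she", "him", "her", "his", "hers", "himself", "herself",
}


def unchanged_gendered_tokens(male: str, female: str) -> list[tuple[int, str]]:
    """Return (index, token) pairs for gendered words identical in both versions."""
    male_tokens = male.split()
    female_tokens = female.split()
    if len(male_tokens) != len(female_tokens):
        raise ValueError("sentence versions must align token by token")
    return [
        (i, m)
        for i, (m, f) in enumerate(zip(male_tokens, female_tokens))
        if m == f and m.lower() in GENDERED_WORDS
    ]


male = (
    "Zain seeks escape in an ultimate manner by committing suicide , "
    "drowning herself in the waters of the Gulf of Mexico."
)
female = (
    "Chloe seeks escape in an ultimate manner by committing suicide , "
    "drowning herself in the waters of the Gulf of Mexico."
)

# 'herself' is the same in both versions, so it is reported.
print(unchanged_gendered_tokens(male, female))  # [(12, 'herself')]
```

A check like this would not catch the first example, where the unchanged word is the name 'Matilda'. Covering that case would additionally need a list of gendered names, which is not sketched here.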

Best, Ege

rickwg commented 1 month ago

Hey Ege, thanks for bringing that to our attention - great catch! We definitely could've been clearer about how we put together the 'subject' dataset. Let me break it down:

For the 'subject' dataset, we're only labeling the first part of the grammatical subject. If there's a second part, we're leaving it out. As for those sentences you pointed out, we're altering the bold words specifically for the 'all' dataset.

Just so you know, we're actually in the process of updating our datasets. We've realized that using names to determine gender is a critical weakness in our current setup, so we're working on fixing that. I'll make sure to close this issue once we've got the updated versions published and ready to go.

Really appreciate you flagging this. If you have any other questions or spot anything else, feel free to let me know.