allenai / natural-instructions

Expanding natural instructions
https://instructions.apps.allenai.org/
Apache License 2.0

Data leakage with tasks 1295 and 1640 #773

Open axelmarmet opened 2 years ago

axelmarmet commented 2 years ago

Task 1295 is a training task derived from the AdversarialQA dataset, and Task 1640 is a test task derived from the same dataset. I assume automated checks didn't catch this for two reasons: the two tasks list different URLs, and the Source field of task 1640 contains a typo (its value is "adverserial_qa").
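A check that compares Source values only by exact string equality misses this kind of overlap. Below is a minimal sketch of a more forgiving train/test overlap check. It assumes each task's Source entries have already been loaded into a dict mapping task name to a list of strings (how they are read from the task files is left out). The function names, the 0.85 similarity threshold, and the "adversarial_qa" Source value for task1295 are all hypothetical, chosen for illustration.

```python
import difflib
import re


def normalize_source(name: str) -> str:
    # Lowercase and drop non-alphanumerics, so "AdversarialQA" and
    # "adversarial_qa" both become "adversarialqa".
    return re.sub(r"[^a-z0-9]", "", name.lower())


def sources_match(a: str, b: str, threshold: float = 0.85) -> bool:
    # Exact match after normalization, or a near match that tolerates small
    # typos such as "adverserial_qa" vs "adversarial_qa".
    # The threshold is a hypothetical heuristic, not a tuned value.
    na, nb = normalize_source(a), normalize_source(b)
    return na == nb or difflib.SequenceMatcher(None, na, nb).ratio() >= threshold


def find_source_overlaps(train, test, threshold=0.85):
    """Return (train_task, test_task, train_source, test_source) tuples for
    every train/test task pair whose Source entries match."""
    overlaps = []
    for train_task, train_srcs in train.items():
        for test_task, test_srcs in test.items():
            for s1 in train_srcs:
                for s2 in test_srcs:
                    if sources_match(s1, s2, threshold):
                        overlaps.append((train_task, test_task, s1, s2))
    return overlaps


# Hypothetical Source values for illustration.
train = {"task1295": ["adversarial_qa"]}
test = {"task1640": ["adverserial_qa"]}
print(find_source_overlaps(train, test))
```

Fuzzy matching can also flag unrelated short names, so hits are best treated as candidates for manual review rather than confirmed leaks.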

danyaljj commented 2 years ago

@axelmarmet Thanks for reporting this!

@yeganehkordi would you be able to address this?

yeganehkordi commented 2 years ago


Sure. task1295 is a "Question Answering" task and task1640 is an "Answerability Classification" task, so they are not exactly the same. I think one solution may be to use different instances in the two tasks. Alternatively, we could replace one of them with another task. What do you think?

danyaljj commented 2 years ago

@yizhongw suggests moving task1295 from splits/default/train_tasks.txt to splits/default/excluded_tasks.txt. I think this should address the concern, right?
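Moving a task between split files is a small text edit. Here is a minimal sketch of how it could be scripted, assuming each split file lists one task name per line. The `move_task` helper is hypothetical and not part of the repository.

```python
from pathlib import Path


def move_task(task_name: str, src: Path, dst: Path) -> bool:
    """Move task_name from the src split file to the dst split file.

    Assumes one task name per line. Returns True if the task was found in
    src and moved, False if src did not contain it.
    """
    lines = src.read_text().splitlines()
    if task_name not in lines:
        return False
    # Rewrite the source split without the task.
    remaining = [line for line in lines if line != task_name]
    src.write_text("\n".join(remaining) + ("\n" if remaining else ""))
    # Append to the destination split, avoiding duplicate entries.
    existing = dst.read_text().splitlines() if dst.exists() else []
    if task_name not in existing:
        existing.append(task_name)
        dst.write_text("\n".join(existing) + "\n")
    return True
```

For example, `move_task(name, Path("splits/default/train_tasks.txt"), Path("splits/default/excluded_tasks.txt"))`, where `name` is task1295's full name exactly as it appears in the split file.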

yeganehkordi commented 2 years ago

Yeah, I'll fix it.