ruisi-su closed this issue 2 years ago.
FYI, the issue you linked clearly points out that this dataset has questionable labels because it is weakly labeled.
The author of the paper also said:
To conclude, I agree that the GAD and EUADR datasets are weakly supervised (distant supervision) datasets. And since we now have multiple high-quality BioRE datasets, I personally suggest that we need to refrain from using weakly labeled datasets and move to use other datasets such as ChemProt, DrugProt, or other human-labeled datasets for evaluating BioLMs.
Thanks @tmabraham for looking into this. I remember GAD's labels causing confusion when I was tracking down this dataset. This issue was created to stay consistent with the BLURB benchmark. However, I think your point (along with others' concerns about this dataset) is very valid. We will discuss and get back to you on this!
Actually @ruisi-su @tmabraham, can we keep this as a high priority for implementation? The only valid reason to deprioritize a dataset used in a standard benchmark is if that dataset isn't public. More generally, as a research question, we're interested in models trained on labels with different provenance (e.g., weakly supervised) so we can measure performance tradeoffs. From this perspective, datasets like these are quite valuable.