bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Create dataset loader for GGPONC2 #863

Closed nachollorca closed 1 year ago

nachollorca commented 1 year ago

Adding a Dataset

nachollorca commented 1 year ago

Regarding "_[ ] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema_" from the PR:

The original .json for this dataset is already structured according to the kb_features schema, i.e., source == bigbio_kb. How should this be handled? Is it all right in this case to submit only the bigbio schema?
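To make the question concrete, here is a minimal sketch of one possible way to satisfy the checklist item: declare both configs and let them yield the same records. This is an illustration, not the maintainers' confirmed answer. The `BigBioConfig` below is a stand-in dataclass so the snippet runs on its own; the config names, version string, and the `generate_example` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BigBioConfig:
    """Stand-in for the repo's config class (assumed fields, for illustration only)."""
    name: str
    version: str
    description: str
    schema: str
    subset_id: str


# One config per schema, as the checklist asks, even though the data is identical.
# Names and version are placeholders, not taken from the actual loader.
BUILDER_CONFIGS = [
    BigBioConfig(
        name="ggponc2_source",
        version="1.0.0",
        description="GGPONC2 source schema",
        schema="source",
        subset_id="ggponc2",
    ),
    BigBioConfig(
        name="ggponc2_bigbio_kb",
        version="1.0.0",
        description="GGPONC2 BigBio schema",
        schema="bigbio_kb",
        subset_id="ggponc2",
    ),
]


def generate_example(record: dict, schema: str) -> dict:
    """Hypothetical helper: return the record unchanged for either schema,
    since the original .json already matches the kb layout."""
    if schema in ("source", "bigbio_kb"):
        return record
    raise ValueError(f"Unsupported schema: {schema}")
```

With this layout both configs exist, so the checklist item is met, while the loader avoids any conversion logic. Whether submitting only the bigbio config is acceptable remains a question for the maintainers.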