bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Closes #843 #852

Closed by mariosaenger 1 year ago

mariosaenger commented 1 year ago

This PR adds an implementation of the CPI corpus (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220925) to the project.

galtay commented 1 year ago

:tada:

https://huggingface.co/datasets/bigbio/cpi