bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Closes #843 #844

Closed: mariosaenger closed this 1 year ago

mariosaenger commented 1 year ago

This PR adds an implementation of the CPI corpus (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220925) to the project.
