bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Closes #900 (version 2) #904

Closed GullyBurns closed 7 months ago

GullyBurns commented 8 months ago

Name: CZI Disease Research State Model
Data: https://github.com/chanzuckerberg/DRSM-corpus/
License: CC0

All data processing elements for this dataset are complete. This PR makes some edits to the README file.

Apologies for the previous attempt, which introduced some errors.