bigscience-workshop/biomedical
Tools for curating biomedical training data for large-scale language modeling
[WIP] examples of creating meta dataset and training a custom tokenizer
#849
Closed by galtay 1 year ago
galtay commented 1 year ago
Shows how to create a meta dataset by combining many BigBio datasets.
Shows how to efficiently train a custom tokenizer.
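
The two points above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR. The toy datasets, the record fields (`dataset`, `example_id`, `text`), and the helper names `tag_examples`, `build_meta_dataset`, and `batch_texts` are all hypothetical. The idea is to merge several datasets lazily into one stream tagged with its source, then expose the text in batches, which is the shape of input that iterator-based tokenizer trainers (for example `Tokenizer.train_from_iterator` in Hugging Face `tokenizers`) accept.

```python
from itertools import chain, islice
from typing import Dict, Iterable, Iterator, List

# Hypothetical toy stand-ins for individual BigBio datasets.
# Real datasets would be loaded from disk or a dataset hub instead.
TOY_DATASETS: Dict[str, List[Dict[str, str]]] = {
    "toy_ner": [{"text": "Aspirin reduces fever."}, {"text": "BRCA1 is a gene."}],
    "toy_qa": [{"text": "What causes malaria?"}],
}


def tag_examples(name: str, examples: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each example with its source dataset name and a unique id."""
    for i, example in enumerate(examples):
        yield {"dataset": name, "example_id": f"{name}-{i}", "text": example["text"]}


def build_meta_dataset(datasets: Dict[str, Iterable[Dict[str, str]]]) -> Iterator[Dict[str, str]]:
    """Lazily chain all datasets into one stream of tagged records."""
    return chain.from_iterable(tag_examples(name, exs) for name, exs in datasets.items())


def batch_texts(records: Iterable[Dict[str, str]], batch_size: int) -> Iterator[List[str]]:
    """Group record texts into lists of at most batch_size items."""
    it = iter(records)
    while True:
        batch = [record["text"] for record in islice(it, batch_size)]
        if not batch:
            return
        yield batch


# Materialize the tiny example so it can be inspected; large corpora
# would stay as generators to avoid holding everything in memory.
meta = list(build_meta_dataset(TOY_DATASETS))
batches = list(batch_texts(meta, batch_size=2))
```

Keeping the source name on every record makes it possible to filter or reweight individual datasets later, and feeding text in batches rather than one string at a time is a common way to reduce per-call overhead during tokenizer training.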