bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

[WIP] examples of creating meta dataset and training a custom tokenizer #849

Closed by galtay 1 year ago

galtay commented 1 year ago