bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

remove Path.exists from bigbio to support streaming #846

Closed: galtay closed this 1 year ago

galtay commented 1 year ago

Makes the changes from @albertvillanova's PR https://github.com/bigscience-workshop/biomedical/pull/754/files

tl;dr: this fixes a problem in bigbiohub.parse_brat_file that prevented it from being used in streaming mode. See … for more information. This new version of bigbiohub.py has been uploaded to all Hub datasets.
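The PR title says the fix removes `Path.exists` calls. A common way to do that is to stop asking "does this file exist?" before opening it and instead just try to open it, skipping the file when it is missing. The sketch below shows that pattern for collecting brat annotation lines next to a `.txt` file. It is a minimal illustration under stated assumptions, not the actual `bigbiohub` code. The function name `read_annotation_lines` and the default suffixes `.a1`, `.a2` and `.ann` are hypothetical. It also assumes that, in streaming mode, opening a file is the operation that keeps working for remote paths, while a bare existence check may not.

```python
from pathlib import Path
from typing import Iterable, List


def read_annotation_lines(
    txt_file: Path,
    suffixes: Iterable[str] = (".a1", ".a2", ".ann"),  # hypothetical defaults
) -> List[str]:
    """Collect the lines of every brat annotation file that sits next to txt_file.

    Hypothetical helper, not the real bigbiohub.parse_brat_file. Instead of
    checking Path.exists() before opening (look before you leap), it tries to
    open each candidate file and skips the ones that are missing (easier to
    ask forgiveness than permission).
    """
    ann_lines: List[str] = []
    for suffix in suffixes:
        annotation_file = txt_file.with_suffix(suffix)
        try:
            with annotation_file.open() as f:
                ann_lines.extend(f.readlines())
        except FileNotFoundError:
            # No annotation file with this suffix, so move on to the next one.
            continue
    return ann_lines
```

Because the only filesystem operation is `open`, the helper never relies on an existence check. Catching only `FileNotFoundError` is deliberately narrow. A broader `except` would also hide real read errors, although a remote backend might report a missing file with a different exception type.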