bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Create dataset loader for GEOKhoj v1 #327

Closed: s-desh closed this issue 2 years ago

s-desh commented 2 years ago

Adding a Dataset

hakunanatasha commented 2 years ago

@s-desh this is pretty cool - if you feel comfortable implementing it, please go ahead!

hakunanatasha commented 2 years ago

It also looks like this is the source paper: https://www.nature.com/articles/ncomms12846

s-desh commented 2 years ago

@hakunanatasha Sure. Just FYI, the above-mentioned paper was used as a reference; it is not the source paper. We at Elucidata were inspired by it and extended its work by curating 30k samples.

s-desh commented 2 years ago

self-assign

hakunanatasha commented 2 years ago

Closed with #393