bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Create dataset loader for DECA #116

Open leonweber opened 2 years ago

leonweber commented 2 years ago

Task: NER/NEN, but no NEN for genes and no NER for species, so no canonical NER/NEN
License: custom (included in files)
Format: custom
Citation:
@Article{Wang2010,
  author    = {Wang, Xinglong and Tsujii, Jun'ichi and Ananiadou, Sophia},
  journal   = {Bioinformatics},
  title     = {Disambiguating the species of biomedical named entities using natural language parsers},
  year      = {2010},
  month     = mar,
  number    = {5},
  pages     = {661--667},
  volume    = {26},
  language  = {en},
  publisher = {academic.oup.com},
}
Source: http://www.nactem.ac.uk/deca/species_corpus_0.2.tar.gz

uzaymacar commented 2 years ago

self-assign

hakunanatasha commented 2 years ago

Hi @uzaymacar, can you let us know if you are still working on this so we can update our project board? Please let us know the status by Friday, April 8; no worries if you are not finished but still intend to work on this. You can either ping me here at @hakunanatasha or ping the Discord admins (with @admins).

uzaymacar commented 2 years ago

Hey @hakunanatasha, yes I am still working on this! I am planning to follow up with a PR by mid-next week.

jason-fries commented 2 years ago

Hi @uzaymacar, just a ping on the status of this dataset. Please let us know if you are still working on it and when you plan to submit a PR. Thanks!!