bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

Create dataset loader for GENIA Coreference Corpus #24

Open hakunanatasha opened 2 years ago

hakunanatasha commented 2 years ago

From http://www.geniaproject.org/genia-corpus/coreference

sugatoray commented 2 years ago

self-assign

hakunanatasha commented 2 years ago

@sugatoray, can you let us know if you are still working on this so we can update our project board? Please let us know the status by Friday, April 8. No worries if you are not finished but intend to keep working on this. Please either mention @hakunanatasha or ping the Discord admins (with @admins).

sugatoray commented 2 years ago

@hakunanatasha Thank you for getting in touch with me. Yes, I am working on it. I will keep you posted on the progress here.

hakunanatasha commented 2 years ago

Great, thank you! GENIA has a unique licensing case; please report the license as:

Annotations created by the GENIA Project are licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
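A minimal sketch of how this license text might be recorded in the loader script, assuming the common Hugging Face `datasets` convention of module-level `_HOMEPAGE` and `_LICENSE` constants. The `build_metadata` helper is hypothetical and only illustrates where the strings would be passed along; the actual loader may organize this differently.

```python
# Assumed convention: module-level metadata constants in a dataset loader script.
# The homepage URL is the one given at the top of this issue.
_HOMEPAGE = "http://www.geniaproject.org/genia-corpus/coreference"

# License text reported verbatim, as requested in this thread.
_LICENSE = (
    "Annotations created by the GENIA Project are licensed under the "
    "Creative Commons Attribution 3.0 Unported License. To view a copy of "
    "this license, visit http://creativecommons.org/licenses/by/3.0/ or send "
    "a letter to Creative Commons, 444 Castro Street, Suite 900, "
    "Mountain View, California, 94041, USA."
)


def build_metadata() -> dict:
    """Hypothetical helper: collect the metadata fields in one place."""
    return {"homepage": _HOMEPAGE, "license": _LICENSE}
```

In a real loader these values would typically be handed to the dataset's info object rather than returned as a plain dictionary.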

davidkartchner commented 2 years ago

self-assign

shamikbose commented 2 years ago

@ruisi-su @galtay This is tagged as HIGH. If nobody is working on it, I can take a look at it this week.