Open hakunanatasha opened 2 years ago
Hi @uzaymacar, can you let us know if you are still working on this so we can update our project board? Please just notify us of the status by Friday, April 8. No worries if you are not finished but intend to work on this. Please either ping me here at @hakunanatasha or ping the discord admins (with @admins).
Hey @hakunanatasha, yes I am still working on this! I am planning to follow up with a PR by mid-next week.
@uzaymacar awesome! Feel free to ping me here, via your PR, or on the discord for help! I'm looking forward to your submission :cherry_blossom:
@jason-fries There are multiple versions of this. I'm using 5.0.0, which is the latest one.
SGTM -- just make certain the versioning is reflected in the data loader metadata.
Hi @jason-fries @galtay @ruisi-su I think I'm starting to understand the CRAFT dataset. I have a few questions:
From what I can understand, this dataset supports Tasks.COREF and Tasks.NER. Please let me know if there are other tasks it supports.
Corefs are somewhat tricky. There are multiple annotations of the same thing. How should that be handled? Here's an example:
<annotation annotator="Annotator" id="1" type="identity">
<class id="IDENTITY chain" label="IDENTITY chain"/>
<span end="71" id="11532192-2" start="65">strain</span>
</annotation>
<annotation annotator="CCP Colorado Computational Pharmacology, UC Denver" id="11532192SHM_Instance_150000" type="identity">
<class id="Noun Phrase" label="Noun Phrase"/>
<span end="71" id="11532192-3" start="65">strain</span>
</annotation>
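To make the overlap concrete, here is a minimal sketch that groups annotations by their character span, so spans carrying more than one label show up together. It uses only the two annotations quoted above; the wrapping `<annotations>` root element and the `group_by_span` helper are assumptions made for illustration, and the real CRAFT files may be structured differently. It only surfaces the duplicates and does not decide how the data loader should resolve them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two annotations quoted above, wrapped in a hypothetical <annotations>
# root so the snippet parses as a single XML document.
SAMPLE = """<annotations>
<annotation annotator="Annotator" id="1" type="identity">
<class id="IDENTITY chain" label="IDENTITY chain"/>
<span end="71" id="11532192-2" start="65">strain</span>
</annotation>
<annotation annotator="CCP Colorado Computational Pharmacology, UC Denver" id="11532192SHM_Instance_150000" type="identity">
<class id="Noun Phrase" label="Noun Phrase"/>
<span end="71" id="11532192-3" start="65">strain</span>
</annotation>
</annotations>"""


def group_by_span(xml_text):
    """Map (start, end, text) -> list of class labels annotated on that span."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for annotation in root.iter("annotation"):
        label = annotation.find("class").get("label")
        for span in annotation.iter("span"):
            key = (int(span.get("start")), int(span.get("end")), span.text)
            groups[key].append(label)
    return dict(groups)


grouped = group_by_span(SAMPLE)
for (start, end, text), labels in grouped.items():
    print(start, end, text, labels)
```

Any span whose label list has more than one entry is covered by several annotations, which is the case in question here.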
The NER seems to be pretty straightforward, but just to clarify, the covered types are as follows:
There are also structural annotations, but I'm not sure which task those would map to in the bigbio schema. Do they need to be implemented?
@ruisi-su This is implemented as a local dataset in #681, since download_and_extract() doesn't seem to work properly with the archive containing the dataset.
@shamikbose Are you still working on that?
@mariosaenger This is already implemented as a local dataset in #681. It's awaiting review.
Colorado Richly Annotated Full-Text (CRAFT) Corpus
https://github.com/UCDenver-ccp/CRAFT