caufieldjh / awesome-bioie

🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)
Creative Commons Zero v1.0 Universal

Looking for sources for corpora #1

Closed by caufieldjh 4 years ago

caufieldjh commented 4 years ago

Would like to add more corpora but don't have stable links for them yet. The first one is MiPACQ: paper - the Multi-source Integrated Platform for Answering Clinical Questions corpus, containing 13,091 sentences from clinical narratives, all annotated for syntactic structure and named entities.

Looks like it should be at http://clear.colorado.edu/compsem/index.php?page=endendsystems&sub=mipacq but that link doesn't appear live at the moment.

caufieldjh commented 4 years ago

SHARPn NLP Seed Corpus - paper - clinical notes from pulmonary arterial disease and breast cancer patients.

That paper may not be quite right.

caufieldjh commented 4 years ago

SHARPn Stratified Corpus - this and the previous set seem like they should be at http://informatics.mayo.edu/sharp/index.php/Tools but I haven't found download links for the corpora.

caufieldjh commented 4 years ago

The Colorado CLEAR page is accessible but doesn't have an obvious link to the MiPACQ corpus. Looks like usage may still require coordination through Mayo Clinic?

caufieldjh commented 4 years ago

Gave up looking for these - inaccessible data sets are Not Awesome.

drussellmrichie commented 3 years ago

Hi, @caufieldjh. It's a shame that it's not easier to access these. Did you ever get access? I'm also trying to get access to MiPACQ and SHARP (and THYME, too).

caufieldjh commented 3 years ago

Hi @drussellmrichie - not sure about the others, but THYME colon cancer splits are here: https://github.com/stylerw/thymedata

drussellmrichie commented 3 years ago

Not sure if you're still interested in this, but for you or anyone else who comes across this, according to an email that my PI just received from Guergana Savova, who co-leads hNLP:

"the MiPACQ and SHARP corpora are not available for distribution at this point."

😦😦😦😦😦😦😦😦

I'll post here if I hear anything else...

caufieldjh commented 3 years ago

Oh well! Thanks for forwarding the official word, even if it's disappointing.