clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

Add MeSH diseases #34

Closed bgyori closed 4 years ago

bgyori commented 4 years ago

This PR adds a script that generates a mesh-disease.tsv file containing all the names and synonyms of the entries in the MeSH disease subtree. I introduced a new entity type called Disease for these. The corresponding changes and testing in Reach are still in progress, so I suggest holding off on the merge until those are sorted out.
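For readers unfamiliar with the approach, here is a minimal sketch of how names and synonyms could be collected from MeSH descriptor XML. It is not the script in this PR. It assumes the disease subtree corresponds to tree numbers starting with "C", that the output has two columns (name, MeSH ID), and it runs on a tiny made-up sample; the records, IDs, and function names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample in the shape of MeSH descriptor XML (IDs and names are not real).
SAMPLE_XML = """<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>DX000001</DescriptorUI>
    <DescriptorName><String>Example Disease</String></DescriptorName>
    <TreeNumberList><TreeNumber>C01.100</TreeNumber></TreeNumberList>
    <ConceptList><Concept><TermList>
      <Term><String>Example Disease</String></Term>
      <Term><String>Example Disorder</String></Term>
    </TermList></Concept></ConceptList>
  </DescriptorRecord>
  <DescriptorRecord>
    <DescriptorUI>DX000002</DescriptorUI>
    <DescriptorName><String>Example Process</String></DescriptorName>
    <TreeNumberList><TreeNumber>G02.200</TreeNumber></TreeNumberList>
    <ConceptList><Concept><TermList>
      <Term><String>Example Process</String></Term>
    </TermList></Concept></ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def disease_rows(xml_text, prefix="C"):
    """Return (name, mesh_id) pairs for records with a tree number under `prefix`."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("DescriptorRecord"):
        tree_numbers = [t.text or "" for t in rec.iter("TreeNumber")]
        if not any(t.startswith(prefix) for t in tree_numbers):
            continue  # record is not in the assumed disease subtree
        mesh_id = rec.findtext("DescriptorUI")
        # Preferred name plus all term strings; the set removes duplicates.
        names = {rec.findtext("DescriptorName/String")}
        names.update(
            s.text for s in rec.iterfind("ConceptList/Concept/TermList/Term/String")
        )
        rows.extend((name, mesh_id) for name in sorted(names))
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text, one entry per line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = disease_rows(SAMPLE_XML)
tsv_text = to_tsv(rows)
```

Each synonym gets its own row pointing at the same ID, which is one simple way to let a name lookup resolve any surface form to a single entry.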

bgyori commented 4 years ago

@MihaiSurdeanu I'm happy with this now and all Reach tests are passing as far as I can tell.

MihaiSurdeanu commented 4 years ago

Thanks!

It seems some files have been changed in this PR that should not have been. For example, BioProcess, Species, and TissueType have been modified (I have not done a diff yet). Do you know why?

bgyori commented 4 years ago

I made some actual changes in BioProcess to remove some terms that were tagged as biological processes but are actually diseases from MeSH (since these are now included in the new mesh-diseases file). The other files haven't actually changed; ner_kb.sh simply regenerates all the files by default, even if their content hasn't meaningfully changed.
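As a rough illustration of the BioProcess cleanup, the sketch below drops rows whose ID already appears in the disease file. It is not the code used in this PR. The two-column (name, ID) row layout, the IDs, and the function name are assumptions made for the example.

```python
def drop_disease_terms(bioprocess_rows, disease_ids):
    """Keep only the rows whose ID is not already covered by the disease file."""
    return [
        (name, term_id)
        for name, term_id in bioprocess_rows
        if term_id not in disease_ids
    ]


# Hypothetical rows: the second entry is really a disease and should be removed.
bioprocess_rows = [
    ("example process", "GX000001"),
    ("example disorder", "DX000001"),
]
disease_ids = {"DX000001"}

kept = drop_disease_terms(bioprocess_rows, disease_ids)
```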

MihaiSurdeanu commented 4 years ago

Cool. Thanks!