clulab / bioresources

Data resources from the biomedical domain
Apache License 2.0

bio_process.tsv.gz should be expanded #1

Closed: myedibleenso closed this issue 8 years ago

myedibleenso commented 8 years ago

Problem 1: A misspelling is included, but not the correct form

"apopotosis", a misspelling of "apoptosis", is included in the BioProcess KB (bio_process.tsv.gz), but "apoptosis" is missing altogether.
See: https://en.wiktionary.org/wiki/apopotosis

I think it's perfectly reasonable to detect common misspellings, but we should also be covering the correct spellings.

Problem 2: Missing common terms

We're missing some common terms that we probably want, such as "carcinogenesis", "oncogenesis", and "tumorigenesis". Any idea how we can expand and refine this KB?
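For anyone collecting candidate terms, a quick membership check against the KB file might look like the sketch below. It is only an illustration: the function name `missing_terms` is hypothetical, and it assumes that bio_process.tsv.gz is a gzipped, tab-separated file with the term text in the first column, which this thread does not confirm.

```python
import gzip


def missing_terms(kb_path, terms):
    """Return the terms that have no entry in a gzipped TSV knowledge base.

    Assumption (not confirmed by the thread): the term text is stored in
    the first tab-separated column of each line.
    """
    known = set()
    with gzip.open(kb_path, "rt", encoding="utf-8") as kb:
        for line in kb:
            # Take the first column and normalize case for comparison.
            text = line.rstrip("\n").split("\t")[0].strip().lower()
            if text:
                known.add(text)
    # Preserve the caller's order so the report is easy to read.
    return [term for term in terms if term.lower() not in known]


# Example usage (requires a local copy of the KB file):
# candidates = ["apoptosis", "carcinogenesis", "oncogenesis", "tumorigenesis"]
# print(missing_terms("bio_process.tsv.gz", candidates))
```

The comparison is case-insensitive on purpose, so that a capitalized entry still counts as present; whether the KB itself is matched case-insensitively is a separate question.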

hickst commented 8 years ago

I added the correct spelling (but left the common misspelling). No new terms have been added yet. I suggest that you accumulate new candidate terms in a comment here, and we can run them by the biologists later.

MihaiSurdeanu commented 8 years ago

I agree!

(Sent by email in reply to Tom Hicks's comment of Wed, Feb 17, 2016 at 8:26 PM: https://github.com/clulab/bioresources/issues/1#issuecomment-185525930)

hickst commented 8 years ago

I did some searching, and it appears that Gene Ontology considers these terms ("carcinogenesis", "oncogenesis", and "tumorigenesis") too non-specific to qualify as biological processes. MeSH, however, does have an entry for them, with NS:ID = mesh:D063646

https://www.nlm.nih.gov/cgi/mesh/2016/MB_cgi?mode=&term=Carcinogenesis&field=entry

Guang and Ryan approved these additions, so I added them to the bio_process.tsv.gz KB, updated the CHANGES file, reran the NER generator, and checked it all in.