MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Is there an easy way to get symptoms for each admission? #34

Closed: wael34218 closed this 8 years ago

wael34218 commented 8 years ago

I am trying to extract symptoms from 'noteevents', but it doesn't seem to be a straightforward task. Is there an easy way to populate this information for each admission?

tompollard commented 8 years ago

Hi @wael34218, detecting symptoms from free text is unlikely to be straightforward. Perhaps if you identify some successful approaches in the literature then we can work on some code collaboratively in this repository.

pszolovits commented 8 years ago

There are general-purpose NLP tools that could be applied to this problem, including the one underlying the papers by Li-Wei Lehman and Bill Long, and the cTAKES system. I’m not sure they are accurate enough to make it worthwhile to run them over the entire data set, though we have thought about trying to do this. —Peter Sz.

Lehman, L.-W., Long, W., Saeed, M., & Mark, R. (2014). Latent topic discovery of clinical concepts from hospital discharge summaries of a heterogeneous patient cohort. 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 1773–1776. https://doi.org/10.1109/EMBC.2014.6943952

Savova, G. K., Masanz, J. J., Ogren, P. V., Zheng, J., Sohn, S., Kipper-Schuler, K. C., & Chute, C. G. (2010). Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association: JAMIA, 17(5), 507–513. https://doi.org/10.1136/jamia.2009.001560
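Running a tool such as cTAKES over the notes first means getting the note text out of the database. Below is a minimal sketch, assuming the notes sit in a SQLite copy of MIMIC with a `noteevents` table that has `row_id`, `hadm_id` and `text` columns, and assuming the tool reads one plain-text file per document (check each tool's own documentation for the input format it actually expects). The function name `export_notes` and the `<hadm_id>_<row_id>.txt` file-naming scheme are hypothetical.

```python
import sqlite3
from pathlib import Path


def export_notes(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write each note's text to <out_dir>/<hadm_id>_<row_id>.txt.

    Hypothetical helper: assumes a table named noteevents with row_id,
    hadm_id and text columns. Notes without an hadm_id are skipped because
    they cannot be tied to an admission.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute(
        "SELECT row_id, hadm_id, text FROM noteevents "
        "WHERE hadm_id IS NOT NULL ORDER BY row_id"
    )
    for row_id, hadm_id, text in rows:
        path = out_dir / f"{hadm_id}_{row_id}.txt"
        # One plain UTF-8 text file per note; a NULL text becomes an empty file.
        path.write_text(text or "", encoding="utf-8")
        written.append(path)
    return written
```

Encoding the admission ID in the file name makes it easy to group the tool's per-file output back by admission afterwards.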

ishrar commented 8 years ago

The UMLS dictionaries ( https://www.nlm.nih.gov/research/umls/ ) include large vocabularies of symptom names that you can use for this purpose. And, as Pete already pointed out above, there are several ready-made NLP tools that use the UMLS dictionaries, and you should be able to use them straight out of the box for this problem (you'll probably only need to extract the 'text' field of the 'noteevents' table and preprocess it into the formats required by these tools). The following are pointers to these open-source NLP tools that you can try out:

Each of the tools above uses its own NLP pipeline for concept name disambiguation. Thus, as the UMLS dictionaries grow over time, the accuracy of these tools is slowly going down, and they often fail to identify crucial concepts correctly.
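The dictionary-lookup idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `LEXICON` is a hypothetical four-entry toy vocabulary standing in for a UMLS-derived symptom list (its labels are not real UMLS concepts), notes arrive as `(hadm_id, text)` pairs, and matching is plain case-insensitive whole-word regex. Strings that map to more than one concept are set aside as ambiguous rather than resolved; resolving them is the job the tools above do with their own disambiguation pipelines.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon standing in for a UMLS-derived symptom vocabulary:
# lowercase surface string -> set of candidate concept labels (not real CUIs).
LEXICON = {
    "chest pain": {"chest pain (symptom)"},
    "shortness of breath": {"dyspnea (symptom)"},
    "sob": {"dyspnea (symptom)"},
    "cold": {"common cold (disease)", "feeling cold (symptom)"},
}


def build_pattern(lexicon):
    # Longest terms first so a longer phrase is preferred over a shorter one
    # starting at the same position; \b enforces whole-word matches.
    terms = sorted(lexicon, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)


def match_admissions(notes, lexicon):
    """Group dictionary matches by admission.

    notes: iterable of (hadm_id, text) pairs.
    Returns {hadm_id: {"concepts": set, "ambiguous": set}}; "ambiguous" holds
    matched strings with more than one candidate concept. Admissions with no
    matches are absent from the result.
    """
    pattern = build_pattern(lexicon)
    result = defaultdict(lambda: {"concepts": set(), "ambiguous": set()})
    for hadm_id, text in notes:
        for m in pattern.finditer(text or ""):
            term = m.group(1).lower()  # lexicon keys are assumed lowercase
            candidates = lexicon[term]
            if len(candidates) > 1:
                result[hadm_id]["ambiguous"].add(term)
            else:
                result[hadm_id]["concepts"] |= candidates
    return dict(result)
```

Unlike full tools such as cTAKES, this sketch does no negation detection, so "denies chest pain" would still count as a match.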