MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Which attributes can be seen as the pre-admission diagnosis and the discharge diagnosis? #655

Closed YUHANYU closed 4 years ago

YUHANYU commented 4 years ago

Description

Which attributes can be seen as the pre-admission diagnosis and the discharge diagnosis?

Hi, I am a user of the MIMIC-III database. I am currently working on medical concept linking, which matches unstructured text to concepts in a knowledge base. In MIMIC-III, the unstructured text would be the pre-admission diagnosis, and the concept would be the ICD-9 code, usually the icd9_code attribute (from the Diagnoses_icd or Procedures_icd table).

However, Literature 1 uses the diagnosis attribute (from the Admissions table) as the pre-admission diagnosis text, while Literature 2 and Literature 3 use the History of Present Illness subsection of the text attribute (from the Noteevents table). So I am uncertain which attributes can be treated as the pre-admission diagnosis text, by which I mean text written before a doctor has made a diagnosis. I have read issues #632 and #563, among others, but I am still not sure which text can serve as a pre-admission diagnosis.

In addition, I found that the discharge diagnosis subsection of the text attribute (from the Noteevents table) is also highly related to the ICD-9 codes. Taken literally, it is the patient's discharge diagnosis as confirmed by a doctor, so it cannot be treated as part of the pre-admission diagnosis text.

In previous experiments, I found that the diagnosis text (from the Admissions table) is free, unstructured text containing many acronyms, abbreviations, and single characters, which makes it hard for an NLP model to link it to the ICD-9 codes of disease or procedure concepts. I also tried linking the History of Present Illness subsection of the text attribute (from the Noteevents table) to the ICD-9 codes. Unfortunately, that subsection contains so many tokens and non-medical terms that the model runs slowly and performs poorly. By contrast, the description attribute (from the Drgcodes table) is better text: it is shorter and has almost no abbreviations, acronyms, or single letters.

So my questions are: 1) Is it possible to combine the diagnosis text and the description text as the pre-admission diagnosis? 2) Or is it possible to combine the diagnosis text, the description text, and the History of Present Illness subsection as the pre-admission diagnosis?

Literature 1
@inproceedings{dai2018fine, title={Fine-grained concept linking using neural networks in healthcare}, author={Dai, Jian and Zhang, Meihui and Chen, Gang and Fan, Ju and Ngiam, Kee Yuan and Ooi, Beng Chin}, booktitle={Proceedings of the 2018 International Conference on Management of Data}, pages={51--66}, year={2018}, organization={ACM} }

Literature 2
@article{mullenbach2018explainable, title={Explainable prediction of medical codes from clinical text}, author={Mullenbach, James and Wiegreffe, Sarah and Duke, Jon and Sun, Jimeng and Eisenstein, Jacob}, journal={arXiv preprint arXiv:1802.05695}, year={2018} }

Literature 3
@article{li2019icd, title={ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network}, author={Li, Fei and Yu, Hong}, journal={arXiv preprint arXiv:1912.00862}, year={2019} }

alistairewj commented 4 years ago

In general I would avoid using the diagnosis field in the admissions table. It is written on admission to the hospital, and it does provide a bit of information, but as you have found there is very little consistency in what people enter into it. I have seen patients who had sepsis in the ICU who have an admission diagnosis of "SYNCOPE", which, while probably true, is not very useful as an admission diagnosis.

The DRGCODES table is very similar to the DIAGNOSES_ICD table, and is coded on discharge. It shouldn't be used as an admission diagnosis.

If you want to get some indication of their admission diagnosis, then the closest you will get is from the discharge summary. The "History of Present Illness" section probably won't get you an admission diagnosis. Instead I'd look for sections like "CHIEF COMPLAINT" or "ADMISSION DIAGNOSIS". You may have to look for other section titles as well!
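As a rough illustration of pulling named sections out of a discharge summary, here is a minimal Python sketch. It assumes that a section heading is a line beginning with letters and ending its label with a colon, which is only an approximation of how the notes are laid out. The function name, the title list, and the sample note are all made up for the example and are not part of this repository.

```python
import re

# Hypothetical list of section titles; real notes use many more variants.
ADMISSION_TITLES = ["chief complaint", "admission diagnosis", "admitting diagnosis"]

# Assumption: a heading starts a line, consists of letters, spaces, or slashes,
# and is followed by a colon. Body lines shaped like "Patient states: ..."
# would be mistaken for headings by this pattern.
HEADING_RE = re.compile(r"^[ \t]*([A-Za-z][A-Za-z /]*?)[ \t]*:", re.MULTILINE)


def extract_sections(note_text, titles):
    """Return {title: text} for each requested section found in the note."""
    wanted = {t.lower() for t in titles}
    headings = list(HEADING_RE.finditer(note_text))
    found = {}
    for i, match in enumerate(headings):
        title = match.group(1).strip().lower()
        if title not in wanted or title in found:
            continue
        # A section runs until the next heading, or to the end of the note.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(note_text)
        found[title] = " ".join(note_text[match.end():end].split())
    return found


# Invented sample text, not taken from MIMIC.
sample_note = """Chief Complaint:
shortness of breath

History of Present Illness:
72 yo M with CHF presenting with dyspnea.

Discharge Diagnosis:
Congestive heart failure
"""

admission_sections = extract_sections(sample_note, ADMISSION_TITLES)
```

Sections that are absent from a note are simply missing from the returned dictionary, so it is easy to count how often each title actually appears before settling on a title list.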

For your project - if you are using the ICD codes from DIAGNOSES_ICD as the label and the text as the data (as many NLP projects do), then to me it makes more sense to try and identify the discharge diagnosis, rather than the admission diagnosis.
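The labelling setup described above (ICD codes as labels, note text as data) could be sketched roughly as follows. This is only an illustration: it uses in-memory rows keyed by the HADM_ID, SEQ_NUM, ICD9_CODE, CATEGORY, and TEXT columns, assumes discharge summaries are marked with the category value "Discharge summary", and uses invented rows. The function name `build_examples` is hypothetical.

```python
from collections import defaultdict


def build_examples(diagnosis_rows, note_rows):
    """Pair each discharge summary with the ICD-9 codes of the same admission."""
    codes_by_admission = defaultdict(list)
    # Sorting by SEQ_NUM keeps codes in their recorded order within an admission.
    # Assumption: SEQ_NUM is never missing in the rows passed in.
    for row in sorted(diagnosis_rows, key=lambda r: r["SEQ_NUM"]):
        codes_by_admission[row["HADM_ID"]].append(row["ICD9_CODE"])

    examples = []
    for note in note_rows:
        # Assumed category value for discharge summaries.
        if note["CATEGORY"] != "Discharge summary":
            continue
        codes = codes_by_admission.get(note["HADM_ID"])
        if codes:  # skip notes whose admission has no coded diagnoses
            examples.append(
                {"hadm_id": note["HADM_ID"], "text": note["TEXT"], "labels": codes}
            )
    return examples


# Invented rows for illustration only.
diagnoses = [
    {"HADM_ID": 100, "SEQ_NUM": 2, "ICD9_CODE": "5849"},
    {"HADM_ID": 100, "SEQ_NUM": 1, "ICD9_CODE": "4280"},
    {"HADM_ID": 200, "SEQ_NUM": 1, "ICD9_CODE": "0389"},
]
notes = [
    {"HADM_ID": 100, "CATEGORY": "Discharge summary", "TEXT": "Discharge Diagnosis: CHF"},
    {"HADM_ID": 100, "CATEGORY": "Nursing", "TEXT": "Pt resting comfortably."},
    {"HADM_ID": 300, "CATEGORY": "Discharge summary", "TEXT": "Discharge Diagnosis: none coded"},
]

examples = build_examples(diagnoses, notes)
```

In practice the TEXT field could be narrowed to the discharge diagnosis section before pairing, so that the model sees only the text most closely tied to the codes.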