Prerequisites

[x] Put an X between the brackets on this line if you have done all of the following:
- Checked the online documentation: https://mimic.physionet.org/about/mimic/
- Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=

Description

Which attributes can be seen as the pre-admission diagnosis and the discharge diagnosis?

Hi, I am a user of the MIMIC-III database. I am currently working on medical concept linking, which matches unstructured text to concepts in a knowledge base. In MIMIC-III, the unstructured text would be the pre-admission diagnosis and the concepts would be ICD-9 codes, usually the icd9_code attribute (from the Diagnoses_icd or Procedures_icd table). However, Literature 1 uses the diagnosis attribute (from the Admissions table) as the pre-admission diagnosis text, while Literature 2 and Literature 3 use the History of Present Illness subsection of the text attribute (from the Noteevents table). So I am unsure which attributes can be treated as the pre-admission diagnosis text, by which I mean text written before the doctor has made a diagnosis. I have read issues #632 and #563, among others, but I am still not sure which text can serve as a pre-admission diagnosis.

I also found that the discharge diagnosis subsection of the text attribute (from the Noteevents table) is highly related to the ICD-9 codes. Taken literally, this subsection is the patient's discharge diagnosis as confirmed by a doctor, so it cannot be treated as part of the pre-admission diagnosis text.

In previous experiments, I found that the diagnosis text (from the Admissions table) is free, unstructured text containing many acronyms, abbreviations, and single characters, which makes it hard for an NLP model to link it to the ICD-9 codes of disease or procedure concepts. I also tried linking the History of Present Illness subsection of the text attribute (from the Noteevents table) to the ICD-9 codes. Unfortunately, that subsection contains too many tokens and many non-medical terms, so the model runs slowly and performs poorly. In addition, I found that the description attribute (from the Drgcodes table) is better text: it is shorter and has almost no abbreviations, acronyms, or single letters.

Thus, my questions are: 1) Is it possible to combine the diagnosis text and the description text as the pre-admission diagnosis? 2) Is it possible to combine the diagnosis text, the description text, and the History of Present Illness subsection text as the pre-admission diagnosis?

Literature 1
@inproceedings{dai2018fine,
  title={Fine-grained concept linking using neural networks in healthcare},
  author={Dai, Jian and Zhang, Meihui and Chen, Gang and Fan, Ju and Ngiam, Kee Yuan and Ooi, Beng Chin},
  booktitle={Proceedings of the 2018 International Conference on Management of Data},
  pages={51--66},
  year={2018},
  organization={ACM}
}

Literature 2
@article{mullenbach2018explainable,
  title={Explainable prediction of medical codes from clinical text},
  author={Mullenbach, James and Wiegreffe, Sarah and Duke, Jon and Sun, Jimeng and Eisenstein, Jacob},
  journal={arXiv preprint arXiv:1802.05695},
  year={2018}
}

Literature 3
@article{li2019icd,
  title={ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network},
  author={Li, Fei and Yu, Hong},
  journal={arXiv preprint arXiv:1912.00862},
  year={2019}
}

In general, I would avoid using the diagnosis field in the admissions table. It is written on admission to the hospital, and it does provide a bit of information, but as you have found, there is very little consistency in what people enter into it. I have seen patients who had sepsis in the ICU who have an admission diagnosis of "SYNCOPE" - which, while probably true, is not very useful from an admission diagnosis point of view.

The DRGCODES table is very similar to the DIAGNOSES_ICD table, and is coded on discharge. It shouldn't be used as an admission diagnosis.

If you want to get some indication of their admission diagnosis, then the closest you will get is from the discharge summary. The "History of Present Illness" section probably won't get you an admission diagnosis - instead I'd look for sections like "CHIEF COMPLAINT" or "ADMISSION DIAGNOSIS". You may have to look for other section titles as well!
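As a rough illustration of pulling such sections out of a discharge summary, here is a minimal Python sketch. It assumes that a section header is a short title at the start of a line followed by a colon, and the list of titles is only a guess; real notes vary in wording and layout, so both the pattern and the titles (and the function name `extract_sections`, which is hypothetical) would need adjusting against actual data.

```python
import re

# Candidate section titles. This list is an assumption; real notes use
# many variants, so extend it after inspecting your own data.
SECTION_TITLES = ("chief complaint", "admission diagnosis", "admitting diagnosis")

# Assumed header shape: start of line, a short title, then a colon.
HEADER_RE = re.compile(r"^[ \t]*([A-Za-z][A-Za-z /&-]{2,60}):", re.MULTILINE)


def extract_sections(note_text, wanted=SECTION_TITLES):
    """Return {lowercased title: body text} for the wanted sections."""
    headers = list(HEADER_RE.finditer(note_text))
    sections = {}
    for i, match in enumerate(headers):
        title = " ".join(match.group(1).lower().split())
        if title not in wanted:
            continue
        # A section body runs until the next header-like line or end of note.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(note_text)
        body = " ".join(note_text[match.end():end].split())
        sections.setdefault(title, body)  # keep the first occurrence only
    return sections


# Made-up note text for illustration, not real MIMIC data.
example_note = (
    "Chief Complaint:\nShortness of breath\n\n"
    "Major Surgical or Invasive Procedure:\nNone\n\n"
    "Discharge Diagnosis:\nPneumonia\n"
)
print(extract_sections(example_note))
```

One limitation of this sketch: any body line that happens to start with a short phrase and a colon is treated as a header and cuts the preceding section short.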

For your project - if you are using the ICD codes from DIAGNOSES_ICD as the label and the text as the data (as many NLP projects do), then to me it makes more sense to try and identify the discharge diagnosis, rather than the admission diagnosis.
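A minimal sketch of that labelling setup, assuming you have already exported (hadm_id, icd9_code) rows from DIAGNOSES_ICD and built a mapping from hadm_id to whatever text you chose (for example a discharge diagnosis section). The function name `build_examples` and the input shapes are hypothetical, chosen only for illustration:

```python
from collections import defaultdict


def build_examples(diagnosis_rows, text_by_hadm):
    """Pair each admission's text with its set of ICD-9 codes.

    diagnosis_rows: iterable of (hadm_id, icd9_code) tuples.
    text_by_hadm:   dict mapping hadm_id to the text used as model input.
    """
    codes = defaultdict(set)
    for hadm_id, icd9_code in diagnosis_rows:
        if icd9_code:  # skip rows with a missing code
            codes[hadm_id].add(icd9_code)
    # Keep only admissions that have both text and at least one code.
    return [
        {"hadm_id": h, "text": text_by_hadm[h], "labels": sorted(codes[h])}
        for h in sorted(text_by_hadm)
        if h in codes
    ]


# Made-up rows for illustration, not real MIMIC data.
rows = [(100, "486"), (100, "4280"), (101, None), (102, "0389")]
texts = {100: "Pneumonia; congestive heart failure", 101: "Syncope"}
print(build_examples(rows, texts))
```

Each admission becomes one multi-label example, since an admission usually carries several ICD-9 codes.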
