MIT-LCP / eicu-code

Code and website related to the eICU Collaborative Research Database
https://eicu-crd.mit.edu
MIT License

Issue with Misclassification of diagnosisstrings? #239

Open vanessailana opened 6 months ago

vanessailana commented 6 months ago

Hello,

I was working with this dataset and noticed that some codes appear to be misclassified.

For example, the diagnosis string "cardiovascular | chest pain / ASHD | coronary artery disease | of other biological bypass graft" is assigned to I25.810. This code represents "Atherosclerosis of coronary artery bypass graft(s) without angina pectoris."

However, I am wondering whether I25.73, which seems similar, is actually more appropriate, since its definition is "Atherosclerosis of nonautologous biological coronary artery bypass graft(s) with angina pectoris."

Could there be an issue with misclassification?
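
To inspect a case like this, it can help to split the hierarchical string into its levels and the code field into individual codes. This is only a minimal sketch: it assumes the levels are separated by `|` and that a code field may hold several comma-separated codes. The function names are hypothetical and are not part of the eicu-code repository.

```python
def split_diagnosis_path(diagnosisstring):
    """Split a pipe-delimited diagnosis string into its hierarchy levels."""
    return [level.strip() for level in diagnosisstring.split("|") if level.strip()]


def split_codes(code_field):
    """Split a code field into individual codes (assumed comma-separated)."""
    if not code_field:
        return []
    return [code.strip() for code in code_field.split(",") if code.strip()]


# The string and code below are the ones quoted in this issue.
example = ("cardiovascular | chest pain / ASHD | coronary artery disease "
           "| of other biological bypass graft")
levels = split_diagnosis_path(example)
codes = split_codes("I25.810")

print(levels)
print(codes)
```

Looking at the last level next to the assigned code makes it easier to see whether a more specific ICD code might have been expected.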

obadawi commented 6 months ago

One thing to keep in mind is that the diagnosis strings are not ICD 9 or 10 diagnoses. They were custom made and the codes you see were generated by a mapping of the custom strings to their ICD counterparts. It's possible there are errors or gray areas in the mappings but I believe they are fairly accurate.


vanessailana commented 6 months ago

How were the mappings from the custom strings to ICD codes generated? Is there a paper I can refer to in order to understand how they were derived? I was looking at https://github.com/MIT-LCP/eicu-code/blob/main/notebooks/diagnosis.ipynb, which says that diagnosisstring is the documented problem.

crista commented 6 months ago

> One thing to keep in mind is that the diagnosis strings are not ICD 9 or 10 diagnoses. They were custom made and the codes you see were generated by a mapping of the custom strings to their ICD counterparts. It's possible there are errors or gray areas in the mappings but I believe they are fairly accurate.

Hello. I'm @vanessailana's advisor. Where can we find information about this mapping? Even if there are no papers/documents, is there a Python script somewhere that did it? Or was it a manual mapping? Thanks.

obadawi commented 6 months ago

This would have been part of the original software system used in clinical care. So, to my knowledge, there are no scripts or mapping files available.


crista commented 6 months ago

Thanks! To make sure I understand:

  1. Did each hospital do it on their own?
  2. Were the ICD codes already part of the hospital data when you started this dataset and you just used them as they were, or did you do further processing of the codes in constructing this dataset?

obadawi commented 6 months ago

My understanding is that this was part of eCareManager, the Philips software used to manage patients. The same software was used by all the eICU systems across all hospitals. The research dataset was created after the fact and is a deidentified version of the clinical data. Philips would have had the mapping within the software, so the ICD codes were already present and appear as is in the research database.
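
Since no mapping file is available, one way to examine the effective mapping is to recover it from the data itself. The following is a minimal sketch, not an official tool: it assumes a CSV export of the diagnosis table with `diagnosisstring` and `icd9code` columns, and the function names and file path are hypothetical.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Yield rows from a diagnosis CSV export (path supplied by the caller)."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def observed_mapping(rows):
    """Map each diagnosisstring to the set of code values seen with it."""
    mapping = defaultdict(set)
    for row in rows:
        code_field = (row.get("icd9code") or "").strip()
        mapping[row["diagnosisstring"]].add(code_field)
    return dict(mapping)


def inconsistent_strings(mapping):
    """Return the strings that appear with more than one distinct code value."""
    return {s: codes for s, codes in mapping.items() if len(codes) > 1}


# Hypothetical usage, assuming a local export named diagnosis.csv:
# mapping = observed_mapping(load_rows("diagnosis.csv"))
# print(inconsistent_strings(mapping))
```

If every string appears with exactly one code value, that would be consistent with a fixed lookup inside the software. Strings with several values, or with an empty value, are the ones worth reviewing by hand.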


crista commented 6 months ago

Thank you for the clarifications!