MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Spurious lab values, etc #749

Closed Mauvila closed 4 years ago

Mauvila commented 4 years ago

Regarding MIMIC-III lab events, are these taken directly from the Laboratory Information System (LIS) software or from the EHR? The reason I ask is that many of the lab values I have run across recently are clearly spurious (probably due to IV fluid dilution). Many hospitals will not report values that are obviously the result of contamination (as detected by fairly simple algorithms), instead reporting them as "INVALID" or something similar in the EHR. Is it possible that some of the lab values in MIMIC-III were similarly flagged by the LIS, but that this fact didn't make it into the database?

For example, for pt ID 852, in the labs drawn at 11/02/60 18:31, the Na is 84 (not compatible with life) and the glucose is 1601. I feel most systems would have flagged all the labs from this draw and not reported the actual values. For MIMIC data analysis, it is not terribly difficult to implement the same algorithms that an LIS uses to flag these, but if this is needed, it would be useful to set up a shared GitHub project for it so that not everyone is reinventing the wheel.
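A minimal sketch of the kind of draw-level flagging described above, in Python. The limits in `PLAUSIBLE_LIMITS` are illustrative assumptions, not validated clinical cutoffs and not what any particular LIS uses; the function names are hypothetical as well.

```python
# Hypothetical plausibility limits, for illustration only. These are
# assumptions, not validated clinical cutoffs and not taken from any LIS.
PLAUSIBLE_LIMITS = {
    "sodium": (100.0, 180.0),    # mEq/L
    "potassium": (1.5, 10.0),    # mEq/L
    "glucose": (10.0, 1500.0),   # mg/dL
}


def implausible_analytes(panel):
    """Return the sorted names of analytes whose value is outside its limits.

    `panel` maps analyte name -> numeric value for a single lab draw.
    Analytes without a configured limit are ignored.
    """
    flagged = []
    for name, value in panel.items():
        limits = PLAUSIBLE_LIMITS.get(name)
        if limits is None:
            continue
        low, high = limits
        if value < low or value > high:
            flagged.append(name)
    return sorted(flagged)


def flag_draw(panel):
    """Flag the whole draw as suspect if any analyte in it is implausible."""
    return len(implausible_analytes(panel)) > 0
```

With the values quoted above, `implausible_analytes({"sodium": 84.0, "glucose": 1601.0})` flags both analytes under these assumed limits, so `flag_draw` would mark every result from that draw as suspect rather than only the out-of-range ones.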

Also, probably not related to the previously mentioned issue, I noticed that pt ID 8565 has a K value on 12/04/98 17:00 listed as "LESS THAN 10 mEq/L". I would hope so, as serum potassium values >= 10 are probably fatal. Was this in the original database, or was this an artifact of the conversion to MIMIC-III data?

alistairewj commented 4 years ago

I'm not entirely sure whether the lab database is a separate entity or not from our source.

The source hospital retains these labs in the patient record, but usually annotates them if something is suspected. For example, for the sodium value of 84, the comment is: "SUSPECT CONTAMINATION;REQUEST REDRAW;REPORTED TO ". We would really like to include these comments with the labs, but to date we haven't de-identified them. It's pretty tricky to de-identify really short snippets of text like this, but it is on our list of TODOs, and we've noted there are some easy wins (e.g. the "REPORTED TO" phrasing is fairly consistent).
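To illustrate why a consistent phrase like "REPORTED TO" counts as an easy win, here is a rough Python sketch. It is not the de-identification method used for MIMIC; it assumes comments are semicolon-delimited as in the example above, that everything following "REPORTED TO" within a segment may identify a person, and it uses `___` as a hypothetical placeholder. The name in the usage note is made up.

```python
import re

# Match "REPORTED TO" and everything after it up to the next semicolon
# (or the end of the comment). Assumption: segments are ';'-delimited.
_REPORTED_TO = re.compile(r"\b(REPORTED TO)\b[^;]*")


def scrub_reported_to(comment):
    """Replace whatever follows 'REPORTED TO' in a segment with a placeholder."""
    return _REPORTED_TO.sub(r"\1 ___", comment)
```

For instance, `scrub_reported_to("SUSPECT CONTAMINATION;REQUEST REDRAW;REPORTED TO J. SMITH RN")` keeps the first two segments intact and reduces the last one to `REPORTED TO ___`. A rule like this only covers one phrasing; other free-text patterns would need their own handling.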

> Also, probably not related to the previously mentioned issue, I noticed that pt ID 8565 has a K value on 12/04/98 17:00 listed as "LESS THAN 10 mEq/L". I would hope so, as serum potassium values >= 10 are probably fatal. Was this in the original database, or was this an artifact of the conversion to MIMIC-III data?

It appears that way in the raw data. In fact it appears that way for the simultaneously measured sodium value (!), so I imagine it's again just obvious errors in the lab test (and somebody wrote it for sodium as well when they shouldn't have). We have done very little post-processing to the labs (as you can tell!).

Mauvila commented 4 years ago

I think including annotations/comments would be great. Lab values like extremely (unrealistically) low sodiums are suspect based solely on their values, so the comments would only reflect an algorithm run by the lab. Annotations there would be great, but are not absolutely necessary for interpretation. For other values, such as high Ks in the setting of hemolysis, comments by the lab usually reflect an additional visual examination of the specimen that cannot be inferred from the lab values themselves. So in order to interpret a serum K, one must know whether hemolysis was present or not. In these situations, annotations are essentially required for accurate interpretation of the lab value.

For the sake of automated analysis/machine learning, though, I feel the comments should be standardized. Maybe from an enum (or whatever the informatics/ontology equivalent term is) consisting of "HEMOLYSIS", "SUSPECT CONTAMINATION", "PLATELET CLUMPING", some other things, and finally, "OTHER". My hospital reports hemolysis using the same comment every time: "HEMOLYSIS PRESENT. HEMOLYSIS AFFECTS ALC, ALB, AST, CK, CKMB, DBIL, TBIL, GGT, K, IRON, LACTIC ACID, LD, MG, PHOS, TP, TRIG, AND URIC ACID". So it wouldn't be hard to classify the hemolysis comments at my hospital appropriately. I'm not sure about other types of comments.
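As a rough sketch of what such an enum-based standardization could look like in Python: the category names come from the list above, while the keyword stems and the function name are assumptions chosen for illustration. A single comment may match more than one category, so the function returns a set.

```python
from enum import Enum


class LabCommentCategory(Enum):
    HEMOLYSIS = "HEMOLYSIS"
    SUSPECT_CONTAMINATION = "SUSPECT CONTAMINATION"
    PLATELET_CLUMPING = "PLATELET CLUMPING"
    OTHER = "OTHER"


# Hypothetical keyword stems for each category (assumptions, not a
# validated vocabulary). A stem matches anywhere in the upper-cased text.
_KEYWORDS = [
    (LabCommentCategory.HEMOLYSIS, ("HEMOLY",)),
    (LabCommentCategory.SUSPECT_CONTAMINATION, ("CONTAMINAT",)),
    (LabCommentCategory.PLATELET_CLUMPING, ("CLUMP",)),
]


def classify_comment(comment):
    """Return the set of categories whose keyword stems occur in the comment.

    Falls back to {OTHER} when no stem matches.
    """
    text = comment.upper()
    matches = {
        category
        for category, stems in _KEYWORDS
        if any(stem in text for stem in stems)
    }
    return matches or {LabCommentCategory.OTHER}
```

Under these assumed stems, a comment beginning "HEMOLYSIS PRESENT." maps to `HEMOLYSIS`, and a comment with no recognized stem falls through to `OTHER`. Plain substring matching is deliberately naive; it would misclassify negations such as "NO HEMOLYSIS".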

With regard to the "LESS THAN" values, the sodium "LESS THAN 10" is probably OK. It is totally acceptable to have "LESS THAN" values under certain conditions. For "LESS THAN", the value is almost ALWAYS to the "left" of the reference range. Likewise, for "GREATER THAN" values, the value is almost ALWAYS to the "right" of the reference range. What is bizarre about the potassium value is that the "LESS THAN" value is way to the right of the reference range (i.e., the value is greater than the upper limit of normal). Of course, with a human looking at the sodium value, there wouldn't be any confusion in recognizing the whole lab panel as invalid. It is just curious that they chose 10 as their "LESS THAN" value for the potassium part.
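The left/right rule above can be sketched as a small Python check. The parsing pattern and function names are hypothetical, and the reference ranges used in the usage note (sodium 135 to 145 mEq/L, potassium 3.5 to 5.0 mEq/L) are commonly quoted adult ranges assumed for illustration, not values from MIMIC.

```python
import re

# Matches strings such as "LESS THAN 10 mEq/L" or "GREATER THAN 500".
_CENSORED = re.compile(
    r"^\s*(LESS THAN|GREATER THAN)\s+(\d+(?:\.\d+)?)", re.IGNORECASE
)


def parse_censored(text):
    """Return ('<' or '>', bound) for a censored value, or None if not censored."""
    match = _CENSORED.match(text)
    if match is None:
        return None
    operator = "<" if match.group(1).upper() == "LESS THAN" else ">"
    return operator, float(match.group(2))


def bound_is_on_expected_side(text, ref_low, ref_high):
    """Check that a 'LESS THAN' bound sits at or below the reference range
    and a 'GREATER THAN' bound sits at or above it.

    Returns None when the text is not a censored value.
    """
    parsed = parse_censored(text)
    if parsed is None:
        return None
    operator, bound = parsed
    if operator == "<":
        return bound <= ref_low
    return bound >= ref_high
```

With the assumed ranges, `bound_is_on_expected_side("LESS THAN 10 mEq/L", 135.0, 145.0)` is `True` for sodium, while the same string checked against potassium's 3.5 to 5.0 returns `False`, which matches the oddity described above.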

alistairewj commented 4 years ago

I totally agree; in fact deidentifying this field has been on the internal issue tracker since October 1st, 2015 (!). From my experience, the lab comments mention one or more of the following:

It would be great if an existing ontology already covered this; the set of categories is pretty manageable. I can always ask the LOINC folks to take a look.

Ultimately, we'll include these lab comments in MIMIC-IV, and maybe the creation/standardization algorithms can be developed on that data, as it will be free-text.

alistairewj commented 4 years ago

The lab comments are de-identified in MIMIC-IV and made available. The de-identification filtered out ~5% of the comments, so there is still some room to improve it, though these comments were mostly "NOTIFIED DR. ___". Closing the issue; if something comes up, we can discuss it in a new issue at the MIMIC-IV repo: https://github.com/MIT-LCP/mimic-iv/