MIT-LCP / mimic-iii-paper

Repository for the paper describing MIMIC-III
http://www.nature.com/articles/sdata201635

Details of deidentification process #16

Closed tompollard closed 8 years ago

tompollard commented 8 years ago

The de-identification process referenced may have been rigorously evaluated (ref to 2008 study), but has this validation been repeated? Were additional efforts made to confirm de-identification of the current dataset? If there have been additional automated advances or manual effort in the de-identification process, they should be noted.

tompollard commented 8 years ago

From @li-lcp

We have fine-tuned our de-id algorithm, previously described in Neamatullah et al., to the current dataset through an iterative process of manual review and development. In each iteration, regular expression filters were calibrated and the look-up dictionaries were expanded until all known PHI identified in the review was removed. Throughout this process we gave scrupulous attention to locating and removing all PHI so that the remaining data can be considered de-identified. Nevertheless, because of the richness and detail of the database, the de-identified dataset is released only to legitimate researchers under a data use agreement.
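
For readers unfamiliar with this kind of pipeline, the loop described above (scrub, review, expand the dictionaries, repeat) might look roughly like the sketch below. This is not the actual de-id software: the two regular expressions, the `[**LABEL**]` placeholder format, and the function names `scrub`, `missed_phi`, and `refine` are all assumptions made for illustration, and the "manual review" step is stood in for by a list of PHI strings a reviewer has already flagged.

```python
import re

# Hypothetical filters; the real regular expressions are not given in this thread.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def scrub(text, names):
    """Replace regex matches and dictionary terms with placeholder tags."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[**{label}**]", text)
    # Longest terms first, so a full name is replaced before its parts.
    for name in sorted(names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", "[**NAME**]", text,
                      flags=re.IGNORECASE)
    return text


def missed_phi(text, known_phi):
    """Return the reviewer-flagged PHI strings still present in the text."""
    lowered = text.lower()
    return [p for p in known_phi if p.lower() in lowered]


def refine(notes, known_phi, name_dictionary, max_rounds=10):
    """Expand the look-up dictionary until no flagged PHI survives scrubbing."""
    names = set(name_dictionary)
    for _ in range(max_rounds):
        scrubbed = [scrub(note, names) for note in notes]
        leftovers = {p for s in scrubbed for p in missed_phi(s, known_phi)}
        if not leftovers:
            break
        names |= leftovers  # stand-in for a human expanding the dictionary
    return [scrub(note, names) for note in notes], names


if __name__ == "__main__":
    notes = ["Pt John Smith seen 3/14/2012, call 617-555-0100."]
    flagged = ["John", "Smith", "3/14/2012", "617-555-0100"]
    cleaned, dictionary = refine(notes, flagged, {"John"})
    print(cleaned[0])
```

Note that a loop like this can only show that the PHI already flagged by reviewers is gone; it says nothing about PHI no reviewer has seen, which is consistent with the data still being released only under an agreement.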