De-Identification

JohnMoehrke commented 5 years ago

A reference to the De-Identification section of the FHIR specification could also be relevant. De-Identification can be applied on a project-by-project basis to lower the patient's risk from data exposure, and it can also benefit the clinical research by eliminating unnecessary, enticing personal information. http://build.fhir.org/secpriv-module.html#deId

JohnMoehrke commented 5 years ago

Your Patient page could mention that a new Patient resource (a pseudo-Patient) is created inside the clinical-research project to represent the 'real' patient. This pseudo-Patient resource contains ONLY the elements mentioned on that page. The relationship of pseudo-Patient.identifier to the real Patient would then need to be described. There are a few alternatives, depending on whether there is a future need for data feeding and a future need for re-identification. Where neither of these two features is needed, there should be no relationship between the pseudo-Patient.identifier and the real Patient. Where either is needed, the usual approach is a translation table maintained by the Healthcare Provider organization, not by the research organization.
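A minimal Python sketch of the pseudo-Patient idea, treating FHIR resources as plain JSON dicts. The allowlist `ALLOWED_PATIENT_ELEMENTS`, the identifier system URI, and the function name are hypothetical; each project would define its own allowlist. The translation table is just a dict that, per the comment above, would be held by the Healthcare Provider organization. Passing `None` models the case where neither data feeding nor re-identification is needed, so no link is kept.

```python
import secrets
from typing import Dict, Optional

# Hypothetical per-project allowlist: the only Patient elements copied over.
ALLOWED_PATIENT_ELEMENTS = ("gender", "birthDate")

# Hypothetical identifier system for pseudo-Patient identifiers.
PSEUDO_ID_SYSTEM = "urn:example:research-project:pseudo-id"


def make_pseudo_patient(real_patient: dict,
                        translation_table: Optional[Dict[str, str]] = None) -> dict:
    """Build a pseudo-Patient carrying only the allowlisted elements.

    translation_table=None: no relationship to the real Patient is recorded.
    Otherwise the pseudo identifier -> real Patient id mapping is stored in the
    table, which the Healthcare Provider (not the research organization) keeps.
    """
    # Random value, not derived from any real identifier.
    pseudo_id = secrets.token_hex(16)
    pseudo = {
        "resourceType": "Patient",
        "identifier": [{"system": PSEUDO_ID_SYSTEM, "value": pseudo_id}],
    }
    for element in ALLOWED_PATIENT_ELEMENTS:
        if element in real_patient:
            pseudo[element] = real_patient[element]
    if translation_table is not None:
        translation_table[pseudo_id] = real_patient["id"]
    return pseudo


real = {"resourceType": "Patient", "id": "123",
        "name": [{"family": "Doe"}], "gender": "female", "birthDate": "1970-01-01"}
provider_table: Dict[str, str] = {}
pseudo = make_pseudo_patient(real, provider_table)
print(sorted(pseudo))  # ['birthDate', 'gender', 'identifier', 'resourceType']
```

Using a random token rather than, say, a hash of the real identifier means the pseudo identifier cannot be recomputed from the real one; re-identification is possible only through the provider's table.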

JohnMoehrke commented 5 years ago

One specialization is to implement the translation table with the Person resource. This leverages FHIR, but may be overly complex where a simple table is all that is needed.
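For comparison, a rough sketch of the Person-based variant: one Person resource per real patient, whose `link` entries point at both the real Patient and the pseudo-Patient. This assumes the R4 shape of `Person.link[].target`; the helper names are hypothetical, and these Person resources would live on the provider side only.

```python
from typing import Iterable, Optional


def make_linking_person(real_patient_id: str, pseudo_patient_id: str) -> dict:
    """Person resource that ties a real Patient to its pseudo-Patient."""
    return {
        "resourceType": "Person",
        "link": [
            {"target": {"reference": f"Patient/{real_patient_id}"}},
            {"target": {"reference": f"Patient/{pseudo_patient_id}"}},
        ],
    }


def find_linked_patient(persons: Iterable[dict], patient_ref: str) -> Optional[str]:
    """Return the reference of the other Patient linked to patient_ref, if any."""
    for person in persons:
        refs = [link["target"]["reference"] for link in person.get("link", [])]
        if patient_ref in refs:
            others = [r for r in refs if r != patient_ref]
            return others[0] if others else None
    return None


persons = [make_linking_person("123", "pseudo-9f2c")]
print(find_linked_patient(persons, "Patient/pseudo-9f2c"))  # Patient/123
```

Compared with a plain dictionary lookup, this needs a scan (or a server-side search) over Person resources, which illustrates the extra complexity mentioned above.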

JohnMoehrke commented 5 years ago

I just found some of this guidance, but not all of it, on your general page. It likely deserves to be highlighted as a standalone section.

mattkoch614 commented 5 years ago

> A reference to the De-Identification section of the FHIR specification could also be relevant. De-Identification can be applied on a project-by-project basis to lower the patient's risk from data exposure, and it can also benefit the clinical research by eliminating unnecessary, enticing personal information. http://build.fhir.org/secpriv-module.html#deId

@JohnMoehrke Thanks for the suggestion! I think a reference to the de-identification section of the FHIR spec is a good idea. With respect to doing this on a "project-by-project basis," can you expand on that a bit more? I'm not sure there are any cases where we wouldn't want de-identified data here.

> Your Patient page could mention that a new Patient resource (a pseudo-Patient) is created inside the clinical-research project to represent the 'real' patient. This pseudo-Patient resource contains ONLY the elements mentioned on that page. The relationship of pseudo-Patient.identifier to the real Patient would then need to be described. There are a few alternatives, depending on whether there is a future need for data feeding and a future need for re-identification. Where neither of these two features is needed, there should be no relationship between the pseudo-Patient.identifier and the real Patient. Where either is needed, the usual approach is a translation table maintained by the Healthcare Provider organization, not by the research organization.

I'm not sure I follow this 100%. You mention "project" here, and I need some clarification on what you mean by it. Do you simply mean a loosely defined relationship between a requestor and a provider (e.g. client/server), or something else?

In addition, in terms of an identifier, why do we need a "pseudo-Patient"? This is where we had thought ResearchSubject would come into play. There is an identifier attribute on this resource that would, for all intents and purposes, be used to identify a patient on a clinical trial in a de-identified way. I have heard this type of identification referred to as "study ID" in many cases.

https://www.hl7.org/fhir/researchsubject-definitions.html#ResearchSubject.identifier

JohnMoehrke commented 5 years ago

De-Identification is a 'process' that uses risk-based analysis to retain the information that is valuable to the project while eliminating as many identifying characteristics as possible. By 'project' I mean each individual clinical-research project. Each one will focus on different medical information, so not all clinical-research projects need the same data elements. Data elements that are not relevant to a given project would be removed, while critical elements would be allowed. Some critical elements for one project might be quasi-identifiers (e.g., ZIP code, gender, age). Including as few as three quasi-identifiers in a data set can make re-identification very easy. No single de-identification algorithm works for all projects; this is what HIPAA tried to do, and it has failed.
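The point about quasi-identifiers can be checked with k-anonymity, a standard measure not named in this thread: k is the size of the smallest group of records that share the same quasi-identifier values, and k = 1 means some record is unique and therefore easy to re-identify. A minimal Python sketch with made-up records (the data and the function name are hypothetical):

```python
from collections import Counter
from typing import Dict, List, Sequence


def k_anonymity(records: List[Dict], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Made-up example records; a real data set would be far larger.
records = [
    {"zip": "02139", "gender": "female", "age": 34},
    {"zip": "02139", "gender": "female", "age": 35},
    {"zip": "02139", "gender": "male", "age": 34},
    {"zip": "02139", "gender": "male", "age": 35},
]

print(k_anonymity(records, ["zip"]))                   # 4
print(k_anonymity(records, ["zip", "gender"]))         # 2
print(k_anonymity(records, ["zip", "gender", "age"]))  # 1: every record is unique
```

Each added quasi-identifier splits the groups further. Which ones a given project can afford to keep depends on that project's needs, which is why the assessment has to be made project by project.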

I also bring up project-by-project because I do expect some clinical-research projects will use fully identifiable data, without any form of de-identification. This is not unusual. It is, however, a major privacy trade-off, which is why I point out that it needs to be a project-by-project assessment. An Observation whose patient reference (Observation.subject) still points at the original Patient is 'identifiable'. To De-Identify that Observation, you must break the identifiability by replacing that reference with something else. This is the pseudo-Patient resource. The ResearchSubject resource is a new resource. It is at FMM 0, meaning the committee has not finished preparing it for public review. From what I can tell, this resource would hold linkage. But it does not modify the original data, and thus provides no privacy benefit. The ResearchSubject resource does not 'stand in for' a Patient resource.
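A minimal sketch of the step described above: copy an Observation and repoint its patient reference, which in FHIR lives in `Observation.subject`, at the pseudo-Patient. The `real_to_pseudo` mapping and the function name are hypothetical. The sketch handles only `subject`; a real de-identification would also have to review other elements (for example `encounter`, `performer`, or free-text notes), which it does not do.

```python
import copy
from typing import Dict


def deidentify_observation(observation: dict, real_to_pseudo: Dict[str, str]) -> dict:
    """Return a copy of the Observation whose subject references the pseudo-Patient.

    The original Observation is left untouched. The whole subject Reference is
    replaced, so an identifying 'display' (e.g. the patient's name) is dropped too.
    """
    deid = copy.deepcopy(observation)
    real_ref = observation["subject"]["reference"]  # e.g. "Patient/123"
    deid["subject"] = {"reference": f"Patient/{real_to_pseudo[real_ref]}"}
    return deid


obs = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Heart rate"},
    "subject": {"reference": "Patient/123", "display": "Jane Doe"},
    "valueQuantity": {"value": 72, "unit": "beats/minute"},
}
print(deidentify_observation(obs, {"Patient/123": "a1b2c3"})["subject"])
# {'reference': 'Patient/a1b2c3'}
```

By contrast, adding a ResearchSubject that points at `Patient/123` would leave `obs` exactly as it is, identifying reference included.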

mattkoch614 commented 5 years ago

OK, I think I see your point. I'm not sure this IG will cover every single flavor of de-identification. What we are aiming to define here is a guide for the scoped exchange between clinical trial sponsors and data providers. At the very least, it should be possible to agree on what should NEVER be included in any case (e.g. a patient name), as well as how someone is represented on a clinical trial (through the join between Patient and ResearchStudy using ResearchSubject).
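To make the proposed join concrete, a rough sketch of a ResearchSubject tying a Patient to a ResearchStudy, plus a check for elements that should NEVER be sent. The element names (`identifier`, `status`, `study`, `individual`) follow FHIR R4; later FHIR versions change some of them, so treat this as an assumption. The identifier system, the chosen status value, and the `NEVER_INCLUDE` set are hypothetical placeholders that the IG would need to define.

```python
from typing import List

# Hypothetical: Patient elements that should never be exchanged.
NEVER_INCLUDE = {"name", "telecom", "address", "photo", "contact"}


def make_research_subject(study_id: str, patient_id: str, study_subject_id: str) -> dict:
    """ResearchSubject joining a Patient to a ResearchStudy (R4 element names)."""
    return {
        "resourceType": "ResearchSubject",
        # The "study ID" that identifies the person on this trial.
        "identifier": [{"system": "urn:example:study-subject-id",
                        "value": study_subject_id}],
        "status": "on-study",
        "study": {"reference": f"ResearchStudy/{study_id}"},
        "individual": {"reference": f"Patient/{patient_id}"},
    }


def never_include_violations(patient: dict) -> List[str]:
    """Names of prohibited elements present on a Patient resource, sorted."""
    return sorted(NEVER_INCLUDE.intersection(patient))


rs = make_research_subject("study-1", "123", "S-0042")
print(rs["individual"])  # {'reference': 'Patient/123'}
print(never_include_violations({"resourceType": "Patient", "name": [], "gender": "male"}))
# ['name']
```

Note that `individual` still references a Patient; whether that is the real Patient or a pseudo-Patient is the open question in this thread.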

I want this issue to remain focused on de-identification, but I will say that I also believe we need to push forward the use of these lower maturity resources. Just because they are not fully "prepared" doesn't mean they don't hold value, and what better way to provide evidence of that than to show interest in them?

@lengfelj @daihugh @cjcuster Thoughts?

mattkoch614 commented 5 years ago

Closing due to inactivity.

esource-consortium / fhir-clinical-research

De-Identification #8