blairconrad / dicognito

A library and command line tool for anonymizing DICOM files
MIT License

Anonymizing according to DICOM Standard #143

Closed lmdulz closed 1 year ago

lmdulz commented 1 year ago

Hey guys, I am currently working on an anonymizing tool for DICOM files in TypeScript and really appreciate your great work. As I was thinking about what to anonymize and how, I found this table that defines a standardized way to anonymize DICOM files and was wondering: how did you decide which tags to alter with dicognito, and in which way? Did you follow the NEMA guidelines in some way?

blairconrad commented 1 year ago

Hi, @lmdulz. Thanks for your interest! I'm afraid my answer will not be super enlightening. I was not aware of that table. Thanks for bringing it to my attention. I can see I'll have to take a gander and see how it can offer ideas for improving dicognito.

I mostly did two things:

  1. sat and thought about what tags could leak information, looking at some of the Information Module Definitions.
  2. looked at an internal tool my company had written that had some deficiencies, to make sure I at least didn't lose functionality relative to it.

That's it! I'm sure dicognito is deficient in a few ways relative to the standard(s).

I'm happy to talk more about this if you like.

lmdulz commented 1 year ago

Hey, @blairconrad, thank you for the insights! So do you see dicognito as a standalone tool, or would you use it as an add-on to your company's internal tool? Without wanting to criticize your coverage, it appears to me that if one uses dicognito without caring too much about the tags contained in one's DICOM files, there still might be some that could leak information and wouldn't be altered. Have you thought about remodeling, so that you'd iterate over the table I added before and just check against the occurrence in the file? That way you could reuse all the classes you created but stick to standardized anonymization regulations.

blairconrad commented 1 year ago

Hi, @lmdulz.

I think of dicognito as a standalone tool (one could also use it as a library, but I don't know if anyone has). It isn't specifically integrated with anything in the place I work.

there still might be some [tags] that could leak information

You're probably right.

Have you thought about remodeling, so that you'd iterate over the table I added before and just check against the occurrence in the file?

I had put it on the back burner, to be honest. Not due to lack of interest as such, but due to conflicting demands on my time.

I've re-scanned the table since you wrote last week, and am getting a slightly better understanding of it. In theory it's achievable, of course, and seems desirable, but I have a couple of concerns.

I worry that the changes are too extreme to make them useful for things I want to do with the objects, or that other clients want to. If someone's running a big trial on how some drug works when people have a certain allergy, and we remove the allergies, then that's going to be a problem.

Also, it looks like a lot of work. One would need to figure out in which case some of the attributes can be removed, replaced with 0-length values, or replaced with non-zero-length values, depending on context. And for those that need dummy values, a scheme has to be devised to provide such a value. For example, Acquisition Field Of View Label (0018,11BB), which I've no idea what it even is!

So I'm intrigued, but wondering if it would end up being a large investment of effort that ends up creating a tool that I can't even use…

lmdulz commented 1 year ago

Hey @blairconrad,

which I've no idea what it even is

I have exactly the same problem. I also can't wrap my head around how one should replace these tags with a VR of Other Byte / Other Double / Other ....

One would need to figure out in which case some of the attributes can be removed, replaced with 0-length values, or replaced with non-zero-length values, depending on context

To my understanding, if you want to be strict, that's exactly defined in the table. "Table E.1-1a. De-identification Action Codes" gives the meaning of all the codes. In "Table E.1-1. Application Level Confidentiality Profile Attributes" there is the column "Basic Prof.", where the codes specify how to handle each tag, which I think is the way to go to achieve full, standardized anonymization.
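Roughly, I picture the table-driven approach like the sketch below. It is only an illustration under several assumptions: the dataset is a plain dict keyed by (group, element) tuples rather than a real DICOM object, only a subset of the action codes is modelled (Z is simplified to "empty value"), the four table rows are an excerpt that should be re-checked against Table E.1-1, and the names `Action`, `BASIC_PROFILE_EXCERPT`, and `apply_profile` are made up for this example.

```python
import uuid
from enum import Enum


class Action(Enum):
    """Simplified subset of the action codes in Table E.1-1a."""
    D = "replace with a non-zero-length dummy value"
    Z = "replace with a zero-length value (simplified)"
    X = "remove"
    K = "keep"
    U = "replace with a new, internally consistent UID"


# Illustrative excerpt only; verify every row against the standard.
BASIC_PROFILE_EXCERPT = {
    (0x0010, 0x0010): Action.Z,  # Patient's Name
    (0x0010, 0x0020): Action.Z,  # Patient ID
    (0x0010, 0x2110): Action.X,  # Allergies
    (0x0020, 0x000D): Action.U,  # Study Instance UID
}


def apply_profile(dataset, profile, uid_map):
    """Return a copy of dataset with each profile action applied.

    Iterates over the table (not the file) and only acts on tags that
    actually occur. Tags absent from the table are left alone, which is
    an assumption of this sketch. uid_map remembers old -> new UIDs so
    the same original UID always maps to the same replacement.
    """
    result = dict(dataset)
    for tag, action in profile.items():
        if tag not in result:
            continue
        if action is Action.X:
            del result[tag]
        elif action is Action.Z:
            result[tag] = ""
        elif action is Action.D:
            result[tag] = "ANONYMIZED"  # placeholder dummy value
        elif action is Action.U:
            new_uid = "2.25." + str(uuid.uuid4().int)
            result[tag] = uid_map.setdefault(result[tag], new_uid)
        # Action.K: nothing to do
    return result
```

Sharing one `uid_map` across all files of a study is what would keep the replaced UIDs consistent between them.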

I worry that the changes are too extreme to make them useful for things I want to do with the objects, or that other clients want to.

But I can definitely relate: when it comes to questions that don't depend only on image data, individual work has to be done. I also couldn't say which commercial tools follow the confidentiality profiles from the standard. It did occur to me, though, that if one is interested in the content of one or some specific tags, one could create a protector class that is the first inside the element_handlers and just skips/protects the tags that one defines in the anonymization procedure.

For my use case at the moment, I'll translate your dicognito into typescript as we need it to be faster in a browser environment. Of course I'll fully reference your work there. Maybe when I'm done with that I'll try to wrap my head around if it wouldn't be possible to implement the whole Basic Profile. Luckily I'd have that time at hand, bc we'll probably use it also internally for a tool we're building.

blairconrad commented 1 year ago

if one is interested in the content of one or some specific tags, one could create a protector class that is the first inside the element_handlers and just skips/protects the tags that one defines in the anonymization procedure.

100%. I thought of a similar thing. Which might work great. Until there are 3, 4, …, 12 tags to keep, which may become a bit of a pain. But who knows!
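For what it's worth, here is a minimal sketch of that protector idea. It assumes handlers are callables tried in order, with the first one that returns True claiming the element, and it uses a plain dict keyed by (group, element) tuples in place of a real dataset. `Protector`, `Blanker`, and `anonymize` are hypothetical names for illustration, not dicognito's actual API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Tag = Tuple[int, int]
Dataset = Dict[Tag, object]
Handler = Callable[[Dataset, Tag], bool]


class Protector:
    """Claims protected tags without changing them."""

    def __init__(self, protected_tags: Iterable[Tag]) -> None:
        self._protected = frozenset(protected_tags)

    def __call__(self, dataset: Dataset, tag: Tag) -> bool:
        # Returning True stops later handlers from touching the tag.
        return tag in self._protected


class Blanker:
    """Stand-in for a real anonymizing handler: empties the tags it owns."""

    def __init__(self, tags: Iterable[Tag]) -> None:
        self._tags = frozenset(tags)

    def __call__(self, dataset: Dataset, tag: Tag) -> bool:
        if tag in self._tags:
            dataset[tag] = ""
            return True
        return False


def anonymize(dataset: Dataset, handlers: List[Handler]) -> None:
    """Offer each element to the handlers in order; first match wins."""
    for tag in list(dataset):
        for handler in handlers:
            if handler(dataset, tag):
                break
```

Because the protected tags live in a set, keeping 12 tags costs no more code than keeping one; the pain would mostly be in asking the user to list them. Putting `Protector` first in the handler list is what lets, say, Allergies survive for the drug-trial case mentioned earlier.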

blairconrad commented 1 year ago

Hey, @lmdulz. I think the question's been answered (to the best of my current ability). Any objection to me closing the issue?

lmdulz commented 1 year ago

No objections, thank you for addressing it!