chop-dbhi / dicom-anon

Python DICOM Anonymizer
BSD 2-Clause "Simplified" License

Support Pixel Anonymizers #3

Open cancan101 opened 9 years ago

cancan101 commented 9 years ago

Allow plugging in a pixel anonymizer that blacks out the burned-in annotations. Ideally it would plug in here and look something like: https://github.com/johnperry/CTP/blob/master/source/files/scripts/DicomPixelAnonymizer.script

jeffmax commented 9 years ago

Hi, thanks for the suggestion. I don't have much experience with CTP, but I agree an option to plug in a preferred pixel anonymizer would be a nice feature. I think the option could go here, right before it cleans out the headers, so that we don't destroy data the pixel cleaner needs. Do you have any experience with scripts to do this?

Are you currently using the dicom-anon script?
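
As a rough illustration of the ordering described above, here is a minimal sketch of an optional pixel-anonymizer hook that runs before the headers are cleaned. Every name in it (`anonymize`, `example_pixel_anonymizer`, `example_clean_headers`) is hypothetical and not part of dicom-anon, and a plain dict stands in for a parsed DICOM file.

```python
def anonymize(dataset, clean_headers, pixel_anonymizer=None):
    """Run the optional pixel anonymizer first, then clean the headers.

    `dataset` is a plain dict standing in for a parsed DICOM file.
    All names here are hypothetical, not dicom-anon's actual API.
    """
    if pixel_anonymizer is not None:
        # The pixel step runs while the headers are still intact, so it can
        # still read attributes that the header cleaner is about to remove.
        dataset = pixel_anonymizer(dataset)
    return clean_headers(dataset)


def example_pixel_anonymizer(dataset):
    # Hypothetical plug-in: records which modality it saw, to show that the
    # header was still available when it ran.
    cleaned = dict(dataset)
    cleaned["pixel_step_saw_modality"] = dataset.get("Modality")
    return cleaned


def example_clean_headers(dataset):
    # Hypothetical header cleaner: drops two attributes for illustration.
    dropped = ("PatientName", "Modality")
    return {k: v for k, v in dataset.items() if k not in dropped}


result = anonymize(
    {"PatientName": "DOE^JOHN", "Modality": "US", "PixelData": b"\x00\x01"},
    example_clean_headers,
    pixel_anonymizer=example_pixel_anonymizer,
)
```

Passing the anonymizer as a callable keeps the pixel logic replaceable, which is the point of making it a plug-in.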

cancan101 commented 9 years ago

I am looking to use it. Currently I have a MATLAB script that does the anonymization (http://www.mathworks.com/help/images/ref/dicomanon.html), but I would prefer to move to Python. In my MATLAB script I blank out the burned-in annotations.

Another Python implementation I found is: https://github.com/darcymason/pydicom/blob/dev/pydicom/examples/anonymize.py

cancan101 commented 9 years ago

You also want to remove the burned-in annotation and then set the Burned In Annotation attribute to false ("NO") so that the file does not get quarantined.
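
A minimal sketch of those two steps, assuming the regions that hold burned-in text are already known as rectangles. The dict-based dataset, the `pixels` key, and both function names are hypothetical stand-ins; real code would work on the decoded pixel array of a DICOM file.

```python
def blank_regions(pixels, regions, fill=0):
    """Return a copy of `pixels` (a list of rows) with each region filled.

    Each region is (top, left, height, width) in pixel coordinates.
    """
    cleaned = [list(row) for row in pixels]
    for top, left, height, width in regions:
        # Clamp to the image bounds so oversized regions do not raise.
        for r in range(top, min(top + height, len(cleaned))):
            row = cleaned[r]
            for c in range(left, min(left + width, len(row))):
                row[c] = fill
    return cleaned


def scrub_burned_in(dataset, regions):
    """Blank the regions, then mark the image as free of burned-in text."""
    cleaned = dict(dataset)
    cleaned["pixels"] = blank_regions(dataset["pixels"], regions)
    # Burned In Annotation (0028,0301) takes the values "YES" or "NO".
    cleaned["BurnedInAnnotation"] = "NO"
    return cleaned


image = {
    "BurnedInAnnotation": "YES",
    "pixels": [[9, 9, 9, 9], [9, 9, 9, 9], [9, 9, 9, 9]],
}
# Blank a 1-row by 3-column rectangle in the top-left corner.
scrubbed = scrub_burned_in(image, regions=[(0, 0, 1, 3)])
```

The flag is only flipped after the pixels have been blanked, so a file is never marked clean while it still carries the annotation.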

jeffmax commented 9 years ago

I have used that. I basically wrote this script to be a more extensive version of that one.

jeffmax commented 9 years ago

Good point! I probably won't have time to properly dig into writing a pixel anonymizer in the near term, but if you have something in MATLAB that you would like to convert to Python and contribute to the project, we welcome any pull requests. I think the hard part is all the heuristics for identifying likely burned-in data (and making that extensible), which you might already have (and it looks like the CTP script has a good start as well).

It has always been on my wish list to try to use some simple machine learning or OCR to look for text, or at least alert above a certain confidence.

jeffmax commented 9 years ago

I'd certainly be interested in helping integrate something if you contributed.

cancan101 commented 9 years ago

It looks like OB and OW VRs are being removed here: https://github.com/chop-dbhi/dicom-anon/blob/4a6f06887459e72fb07ba17c28ad2fa4747c74e0/dicom_anon.py#L551 which is the VR set on pixel data. This means the entire pixel data seems to be removed when "anonymizing".

jeffmax commented 9 years ago

So you gave me a heart attack on this one, but have you tried it and seen it delete the pixel data? I think because of this line in pydicom

https://github.com/darcymason/pydicom/blob/master/source/dicom/_dicom_dict.py#L3706

it actually sets that VR string to "OB or OW", so the match fails. Assuming this is what prevents the problem for you, it is definitely not something the script should rely on.

cancan101 commented 9 years ago

I'm not sure I follow what you are saying.

It looks like the VR string as presented by pydicom may be: 'OB or OW', 'OB' or 'OW'.

I have dealt with the issue for now:

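def vr_handler(ds, e):
    # Delete elements whose VR is in the list below, but never the pixel data.
    # PIXEL_DATA is assumed to be defined elsewhere as the Pixel Data tag,
    # (0x7fe0, 0x0010).
    removable_vrs = ['PN', 'CS', 'UI', 'DA', 'DT', 'LT', 'UN', 'UT', 'ST',
                     'AE', 'LO', 'TM', 'SH', 'AS', 'OB', 'OW']
    if e.VR in removable_vrs and e.tag != PIXEL_DATA:
        del ds[e.tag]
        return True
    return False

jeffmax commented 9 years ago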

Have you seen a situation where pydicom actually sets e.VR for the pixel data element to the string "OW" or the string "OB"?

I ask because, from the file I linked to, it looks like pydicom sets that string to "OB or OW", so it won't match.

Here is an example from ipython examining a dicom file:

a[0x7fe0, 0x0010].VR
'OW or OB'

cancan101 commented 9 years ago

Definitely:

In [388]: ds = dicom.read_file("/Users/alex/Downloads/series (1).dcm")
ds[0x7fe0, 0x0010].VR

Out[388]: 'OB'

and after running the file through dcmdjpeg I see OW.
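
Since the pixel data VR can show up as 'OB', 'OW', or a combined string, one way to avoid depending on the exact string is to split combined VRs and to protect the pixel data by tag. The following is only a sketch: `vr_candidates` and `should_delete` are hypothetical helpers, tags are plain tuples here, and the VR set is copied from the handler above.

```python
PIXEL_DATA = (0x7FE0, 0x0010)  # Pixel Data tag, written as a plain tuple

REMOVABLE_VRS = {
    "PN", "CS", "UI", "DA", "DT", "LT", "UN", "UT",
    "ST", "AE", "LO", "TM", "SH", "AS", "OB", "OW",
}


def vr_candidates(vr):
    """Split a VR string such as 'OB or OW' into its individual VRs."""
    return {part.strip() for part in vr.split(" or ")}


def should_delete(tag, vr):
    """Decide whether an element should be removed, given its tag and VR."""
    if tag == PIXEL_DATA:
        # Keep the pixel data no matter how its VR is reported.
        return False
    # Remove the element if any of its possible VRs is in the removable set.
    return bool(vr_candidates(vr) & REMOVABLE_VRS)
```

Treating a combined VR as removable when any of its parts matches is the conservative choice for anonymization; the explicit tag check is what keeps the image itself intact.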

jeffmax commented 9 years ago

Look at that. Thanks for catching that.