Open sgithens opened 9 years ago
It seems that most fields can just be blanked out. i.e. I think this exception is thrown when reading the value for the purposes of writing the audit db.
This appears to be thrown from a function in pydicom that is supposed to remove all non-standard tags (https://www.medicalconnections.co.uk/kb/DICOM_Private_Elements). I'll take a closer look.
So, interestingly, the field it is throwing the exception on is not a private data element, but the "Exposure" which has a VR of IS (Integer String). It looks like it is just an incorrect DICOM file that has placed a decimal value in an IS field. It looks like remove_prive_tags() recurses over each attribute looking for private tags. It is a shame to quarantine the file completely for such a minor issue, but taking on the task of fixing up bad DICOM files so pydicom will operate on them sounds out of scope for this script so it is probably the best we can do.
This seems to be a similar issue: https://code.google.com/p/pydicom/issues/detail?id=152
On the bright-side, looking into this I learned about this: https://docs.python.org/2/library/contextlib.html#contextlib.contextmanager
@jeffmax Thanks for the diagnostic. Do you think the best thing then is to wrap the anonymize at line 760 in a try/except and then quarantine it the same as on lines 745-749?
What line do you think is actually throwing the exception? Because of the way the walk is called, the original call site of the exception is lost in the trace above.
The first site in dicom_anon or in the third party libraries? I'm pretty sure the trace above has the accurate point for that being dicom_anon.py:678. I was going to wrap it at at 760 though for that anonymize call, since at that point all the arguments necessary to call the quarantine_file method are available.
So the issue looks like pydicom tried to parse the element even though you are just checking the is_private
status:
pydicom/dataset.pyc in __getitem__(self, key)
306 character_set = default_encoding
307 # Not converted from raw form read from file yet; do so now
--> 308 self[tag] = DataElement_from_raw(data_elem, character_set)
309 return dict.__getitem__(self, tag)
One option would be to have DataElement_from_raw
inject a default value:
https://github.com/darcymason/pydicom/blob/9a97a7955a6bcb5015bb65e5c55e42597eb164b4/source/dicom/dataelem.py#L319-L324
I think for this particular issue ( a bad tag in my dicom file), I probably would tend to concur with @jeffmax above, I don't really want to patch the pydicom project for this issue. I didn't run in to this until about mammogram 1500 of my 5000 I'm churning through right now, so I don't expect it to have too often, and I'm Ok with just quarantining it for now to deal with manually later. (Although if it does become more of a problem I'd likely dig further into pydicom! :p ) I'm syncing my directories for the ones that have already been processed right now, but after that am going to go ahead and wrap that ValueError and run it again. ( If everything is awesome after that will submit a pull. )
@cancan101 started looking at pydicom tickets related to this issue and it seems you have been involved in the discussion
https://github.com/darcymason/pydicom/pull/193 and https://github.com/darcymason/pydicom/pull/197
I guess I will leave this ticket open for now in case there is an API change that allows us to handle this better.
Looks like there is a good solution for addressing this in https://github.com/darcymason/pydicom/pull/198 Will keep this open until this is pulled into a release. Thanks @kevlarkevin!
Sounds good thanks guys! If there is a fix in pydicom and we pull in a new version and you want me to test on any of my broken dcm's let me know!
While running a large batch of mammograms today, dicom-anon blew up with the following exception. It appears that the error is in the mammogram file, but I'm wondering if we shouldn't be blowing up when it occurs.
Would the appropriate action in this case be to quarantine the image? I can try to put together a patch if you think this is the appropriate behavior. (If not, what should we be doing in this case?)