cati-neuroimaging / deidentification

Tool to remove subject-identifying metadata from DICOM images used in neuroimaging research
MIT License

DICOMFILE modified by anonymizer #49

Open rbonicel opened 3 months ago

rbonicel commented 3 months ago

Hello,

I tried to anonymize a whole exam that uses a DICOMDIR structure. I used the anonymizer.anonymize function. It ran properly, without raising the DICOMDIR error in the _load_dataset method of the Anonymizer class. I then ran a BIDS app that calls pydicom and its dcmread function. It seems that the anonymizer alters the DICOMDIR file and makes it unreadable by dcmread.

The pydicom error is the following: [Screenshot from 2024-08-21 15-52-45]

I may need to run the anonymizer on the sorted dcm files only, but could you tell me whether the DICOMDIR structure, and the DICOMDIR file itself, was supposed to be kept as it was?
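
For reference, the fallback mentioned above (anonymizing only the image files and leaving the DICOMDIR index out) could be sketched as follows. This is a minimal sketch and not part of the deidentification tool: the helper name `collect_image_files` is hypothetical, and it assumes the index file is literally named `DICOMDIR` and that every other file in the exam folder is an image.

```python
from pathlib import Path


def collect_image_files(exam_dir):
    """Return every regular file under exam_dir except the DICOMDIR index.

    Hypothetical helper: assumes the index file is named 'DICOMDIR'
    (compared case-insensitively) and that all other files are images.
    """
    root = Path(exam_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.upper() != "DICOMDIR"
    )
```

Each returned path could then be handed to the anonymizer individually, so the DICOMDIR file is never rewritten.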

Thanks,

Robin

Hboni commented 3 months ago

Hello,

Thanks for reporting this issue. DICOMDIR was not tested with this tool. I tried with some data that uses a DICOMDIR structure and, as expected, I get the same error as you. Looking deeper into the DICOMDIR file, I found tags that are modified during the anonymization step but that must be kept for the DICOMDIR to remain readable with pydicom. However, some of these are tags that are commonly anonymized, such as "(0010, 0010) Patient's Name". In my first attempts I had to keep these tags intact in order to read the DICOMDIR with pydicom, which I am not happy with (keeping possibly identifying data in the DICOMDIR is not an option).
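
A comparison like the one described above can be sketched without any DICOM library by modeling each directory record as a plain dictionary. This is only an illustration: `changed_tags` is a hypothetical helper, the values are made up, and with pydicom the dictionaries would first have to be built from the dataset elements.

```python
def changed_tags(before, after):
    """Return {tag: (old, new)} for every tag whose value differs.

    Both arguments are plain dicts mapping a (group, element) tuple to a
    value (assumed non-None). A tag missing on one side is reported with
    None on that side.
    """
    tags = sorted(set(before) | set(after))
    return {
        tag: (before.get(tag), after.get(tag))
        for tag in tags
        if before.get(tag) != after.get(tag)
    }


# Illustrative values only, not taken from a real exam.
original = {(0x0004, 0x1430): "PATIENT", (0x0010, 0x0010): "DOE^JOHN"}
anonymized = {(0x0004, 0x1430): "PATIENT", (0x0010, 0x0010): "ANONYMOUS"}
print(changed_tags(original, anonymized))
```

Listing the differences this way makes it easier to see which modified tags are identifying and which are purely structural.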

I am looking for ways to keep the DICOMDIR readable by pydicom without keeping possibly identifying tags. If you have any information on that, I would appreciate it.

Hugo

rbonicel commented 2 months ago

Hi,

I displayed an unmodified DICOMDIR with pydicom, and all I get is this list of tags, repeated for every file in the exam:


   (0020, 0013) Instance Number                     IS: '16'
   (0020, 0032) Image Position (Patient)            DS: [-118.30336364216, -119.66196093497, -0.6143950843996]
   (0020, 0037) Image Orientation (Patient)         DS: [0.99905059784755, -0.0161853599217, 0.04044671883669, 0.01386473288175, 0.99827727444726, 0.05701098582076]
   (0020, 0052) Frame of Reference UID              UI: 1.3.46.670589.11.1979660313.1583384597.3697106892.2207342673
   (0028, 0010) Rows                                US: 256
   (0028, 0011) Columns                             US: 256
   (0028, 0030) Pixel Spacing                       DS: [0.9375, 0.9375]
   ---------
   (0004, 1400) Offset of the Next Directory Record UL: 731446
   (0004, 1410) Record In-use Flag                  US: 65535
   (0004, 1420) Offset of Referenced Lower-Level Di UL: 0
   (0004, 1430) Directory Record Type               CS: 'IMAGE'
   (0004, 1500) Referenced File ID                  CS: ['DICOM', 'IM_0559']
   (0004, 1510) Referenced SOP Class UID in File    UI: MR Image Storage
   (0004, 1511) Referenced SOP Instance UID in File UI: 1.3.46.670589.11.1975217904.63531841.3748483611.4286723151
   (0004, 1512) Referenced Transfer Syntax UID in F UI: Explicit VR Little Endian
   (0008, 0008) Image Type                          CS: ['ORIGINAL', 'PRIMARY', 'PROJECTION IMAGE', 'M', 'FFE']
   (0008, 0016) SOP Class UID                       UI: MR Image Storage
   (0008, 0018) SOP Instance UID                    UI: 1.3.46.670589.11.1975217904.63531841.3748483611.4286723151
   (0008, 1140)  Referenced Image Sequence  3 item(s) ---- 
      (0008, 1150) Referenced SOP Class UID            UI: MR Image Storage
      (0008, 1155) Referenced SOP Instance UID         UI: 1.3.46.670589.11.1779574416.3301209519.2117923261.436045183
      ---------
      (0008, 1150) Referenced SOP Class UID            UI: MR Image Storage
      (0008, 1155) Referenced SOP Instance UID         UI: 1.3.46.670589.11.2826904821.1883528105.113711869.756561734
      ---------
      (0008, 1150) Referenced SOP Class UID            UI: MR Image Storage
      (0008, 1155) Referenced SOP Instance UID         UI: 1.3.46.670589.11.2226451768.1432823731.1875597727.4031837222
      ---------

That would make sense to me since, from what I understand, DICOMDIR is only used to locate files inside the structure. But if the other useful tags are present as well, it is indeed a problem for the expected behaviour of the anonymizer (I agree about not keeping identifying data in DICOMDIR).
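
To illustrate the "locating files" role, here is a minimal sketch of how a (0004, 1500) Referenced File ID such as `['DICOM', 'IM_0559']` maps onto a path relative to the folder holding the DICOMDIR. The function name `referenced_path` and the example folder are hypothetical; only the standard library is used.

```python
from pathlib import Path


def referenced_path(dicomdir_path, file_id):
    """Map a (0004, 1500) Referenced File ID onto a filesystem path.

    file_id is either a single component (str) or a list of components,
    e.g. ['DICOM', 'IM_0559'], interpreted relative to the folder that
    contains the DICOMDIR file.
    """
    components = [file_id] if isinstance(file_id, str) else list(file_id)
    return Path(dicomdir_path).parent.joinpath(*components)


# Hypothetical location of the exam on disk.
print(referenced_path("exam/DICOMDIR", ["DICOM", "IM_0559"]))
```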

I tried this :

It's a nasty workaround, which I can get away with because I don't have to provide the DICOMDIR in the end. A cleaner solution would be to modify the tags in the DICOMDIR (such as (0010, 0010)) without corrupting the file when it is written, but that might be too troublesome.

Thanks, Robin