KitwareMedical / dicom-anonymizer

Tool to anonymize DICOM files according to the DICOM standard
BSD 3-Clause "New" or "Revised" License

Performance problems with newer MR software version (Siemens) #50

Closed Ede1994 closed 10 months ago

Ede1994 commented 1 year ago

I noticed that the anonymizer, when used via Python, is significantly slower on files from a newer MR software version, whose DICOM header has changed compared to older versions, than on files from older MR software versions. Here is an excerpt from the terminal:

100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 47.61it/s]
2023-08-23 09:00:00 dcm_anon     INFO     File 1.3.12.2.1107.5.2.36.40414.201712181.dcm with software version syngo MR B19.

100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.83it/s]
2023-08-23 09:01:51 dcm_anon     INFO     File 1.3.12.2.1107.5.2.50.176395.20230531.dcm with software version syngo MR XA50.

This is the same imaging sequence, just on a different scanner with a newer software version.

finetjul commented 1 year ago

Maybe those DICOMs are significantly different. If they have nested DICOM tags, that may make the algorithm slower. It would be good if we could have both DICOMs (ideally the originals, but anonymized versions could be a first step).

Ede1994 commented 1 year ago

I think I found the reason while putting together the headers. With the new software version, Siemens has started to bundle several files together. With the old versions there are 2561 individual DICOM files per QSM; with the new version there are only 54 files, which are correspondingly larger. As a result the header grows to over 6000 lines, and its tags are also nested.

Ede1994 commented 1 year ago

Here is a sample of the new software version: xa50_sample.txt

finetjul commented 1 year ago

Are you sure, then, that the problem comes from the anonymization and not simply from the fact that the files are bigger? Does it take more time to process 54 large files than 2561 small files?

Ede1994 commented 1 year ago

Yeah okay, I've checked this and it is almost equally fast. Sorry for the confusion. One more question: as you can see in the txt file, there are lots of nested tag groups, and I would like to delete, for example, all tags with arrays as their values, e.g. [0x5200, 0x9229][0][0x0021, 0x10fe][0][0x0021, 0x1019]. In pydicom I access this tag like this:

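import pydicom

# filename_new: path to a DICOM file from the new software version
ds = pydicom.dcmread(filename_new)
# Walk down two levels of nested sequences to reach the element
elem = ds[0x5200, 0x9229][0][0x0021, 0x10fe][0][0x0021, 0x1019]
print(elem)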

How can I delete this specific tag with your tools?

finetjul commented 1 year ago

This is a good question, I do not know if it is doable as of now.

@pchoisel do you have an idea?

pchoisel commented 1 year ago

Hi,

If you want to delete tags that are nested inside a sequence tag, you will have to delete the entire sequence tag.
To delete the sequence tag (0x5200, 0x9229), you can call dicom-anonymizer with the following arguments: -t '(0x5200, 0x9229)' delete

Otherwise, the code would need a bit of modification to delete a single tag inside a sequence.
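
As a possible workaround outside of dicom-anonymizer, the single nested element could be removed with pydicom directly, before or after running the anonymizer. The following is only a minimal sketch: the helper names `delete_nested_element` and `strip_element_from_file` are made up for illustration and are not part of dicom-anonymizer, and the path simply mirrors the indexing expression from the question above. The helper relies only on indexing, `in`, and `del`, which pydicom datasets support.

```python
from typing import Sequence, Tuple, Union

Tag = Tuple[int, int]
PathStep = Union[Tag, int]


def delete_nested_element(ds, path: Sequence[PathStep]) -> bool:
    """Delete the element at the end of `path`; return True if it existed.

    `path` alternates tags and sequence item indexes and ends with the
    tag to delete, e.g. [tag, 0, tag, 0, tag]. A missing parent sequence
    or item raises KeyError or IndexError.
    """
    *parents, target = path
    node = ds
    for step in parents:
        # A tag selects an element, an int selects an item of a sequence.
        node = node[step]
    if target in node:
        del node[target]
        return True
    return False


def strip_element_from_file(src: str, dst: str, path: Sequence[PathStep]) -> bool:
    """Hypothetical wrapper: read `src`, drop the element, write `dst`."""
    import pydicom  # third-party; only needed for file I/O

    ds = pydicom.dcmread(src)
    removed = delete_nested_element(ds, path)
    ds.save_as(dst)
    return removed


# Same element as in the question: ds[0x5200, 0x9229][0][0x0021, 0x10fe][0][0x0021, 0x1019]
NESTED_PATH = [(0x5200, 0x9229), 0, (0x0021, 0x10FE), 0, (0x0021, 0x1019)]
```

It could then be called as `strip_element_from_file("in.dcm", "out.dcm", NESTED_PATH)`, where both file names are placeholders. Unlike deleting the whole (0x5200, 0x9229) sequence, this leaves the sibling elements inside the sequence untouched.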

pchoisel commented 10 months ago

Closing this due to inactivity. @Ede1994, if you need further help with dicom-anonymizer, feel free to open another issue.