Emory-HITI / Niffler

Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
https://emory-hiti.github.io/Niffler/
BSD 3-Clause "New" or "Revised" License

Support extracting Private DICOM tags using the png-extraction #258

Closed: pradeeban closed this issue 2 years ago.

pradeeban commented 2 years ago

Currently, the png-extraction module supports extracting only public DICOM attributes. Extending it to support private tags could significantly help research that depends on them.
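For context, the DICOM standard marks private data elements by an odd group number (the upper 16 bits of a tag written as `0xGGGGEEEE`). The minimal sketch below shows that check on plain integers; the helper names are hypothetical. In pydicom, the library Niffler builds on, the same test is available as `Tag.is_private`, and `Dataset.remove_private_tags()` strips them from a whole dataset.

```python
def is_private_tag(tag: int) -> bool:
    """Return True if a DICOM tag, written as a 32-bit int 0xGGGGEEEE, is private.

    Private data elements are the ones whose group number (upper 16 bits) is odd.
    """
    group = tag >> 16
    return group % 2 == 1


def format_tag(tag: int) -> str:
    """Render a tag in the usual (GGGG,EEEE) notation."""
    return f"({tag >> 16:04X},{tag & 0xFFFF:04X})"


if __name__ == "__main__":
    # Made-up examples: one public tag and two private ones.
    examples = {
        0x00100020: "Patient ID (public)",
        0x00090010: "private creator element",
        0x00091001: "private data element",
    }
    for tag, label in examples.items():
        print(format_tag(tag), label, "private =", is_private_tag(tag))
```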

Pavan-Bellam commented 2 years ago

Hello, I am working on this issue.

pradeeban commented 2 years ago

ok, thanks for the update.

Pavan-Bellam commented 2 years ago

Hi @pradeeban, some DICOM files contain private tags that the user may not want to extract, such as the histogram tag shown in the picture below. I would like to either extract only the tags whose value length is below a certain threshold, or ask the user to list all the tags they do not want in config.json. Is there a better way to handle this?
[Image: screenshot of the private histogram tag]
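The second idea, a user-supplied list of unwanted tags in config.json, could look roughly like this. It is a minimal sketch, not Niffler code: the `ExcludedTags` key, the `GGGG,EEEE` notation, the helper names, and representing headers as a dict keyed by 32-bit tag integers are all hypothetical assumptions.

```python
import json
from typing import Dict


def parse_tag(text: str) -> int:
    """Parse a tag written as 'GGGG,EEEE' or '(GGGG,EEEE)' (hex) into 0xGGGGEEEE."""
    group, element = text.strip("() ").split(",")
    return (int(group, 16) << 16) | int(element, 16)


def drop_excluded_tags(headers: Dict[int, object], config_text: str) -> Dict[int, object]:
    """Remove every tag listed under the hypothetical 'ExcludedTags' config key."""
    config = json.loads(config_text)
    excluded = {parse_tag(t) for t in config.get("ExcludedTags", [])}
    return {tag: value for tag, value in headers.items() if tag not in excluded}


if __name__ == "__main__":
    config_text = '{"ExcludedTags": ["(0029,1010)"]}'
    # Made-up header values for illustration only.
    headers = {
        0x00100020: "ANON123",        # Patient ID (public)
        0x00291010: b"\x00" * 4096,   # large private blob
        0x00291008: "IMAGE NUM 4",    # small private value
    }
    print(sorted(hex(t) for t in drop_excluded_tags(headers, config_text)))
```

The drawback is that users must know the tag numbers in advance, which is why a size threshold is the easier first step.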

pradeeban commented 2 years ago

This is a good question, so let me answer in detail.

If you check https://github.com/Emory-HITI/Niffler/blob/dev/modules/png-extraction/ImageExtractor.py, you will see the following lines:

```python
# dicom images should not have more than 300 dicom tags
if len(kv)>300:
```

So, we are ignoring images with more than 300 attributes. Similarly, you could use the first/easy approach you mentioned (extract only the tags whose value length is less than a certain threshold).
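A value-length threshold can sit next to the existing attribute-count check. The sketch below is illustrative only: `MAX_VALUE_LENGTH` and its value are hypothetical, the helper names are invented, and headers are assumed to be a dict keyed by tag integers rather than the `kv` list used in ImageExtractor.py.

```python
from typing import Dict, Optional

MAX_ATTRIBUTES = 300     # same limit as the existing len(kv) > 300 check
MAX_VALUE_LENGTH = 1024  # hypothetical threshold; would need tuning or a config entry


def value_length(value: object) -> int:
    """Length of a header value: raw size for bytes/str, else length of its text form."""
    if isinstance(value, (bytes, bytearray, str)):
        return len(value)
    return len(str(value))


def filter_by_value_length(headers: Dict[int, object]) -> Optional[Dict[int, object]]:
    """Ignore images with too many attributes; otherwise keep only short values."""
    if len(headers) > MAX_ATTRIBUTES:
        return None  # the whole image is ignored, as with the 300-attribute rule
    return {tag: v for tag, v in headers.items() if value_length(v) < MAX_VALUE_LENGTH}


if __name__ == "__main__":
    headers = {0x00100020: "ANON123", 0x00291010: b"\x00" * 4096}
    print(filter_by_value_length(headers))  # {1048608: 'ANON123'}
```

Large binary blobs such as histograms are dropped automatically, without the user having to name them.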

You will likely need to add a property to https://github.com/Emory-HITI/Niffler/blob/dev/modules/png-extraction/config.json:

```json
"PublicHeadersOnly": true,
```

With the above default, only public headers will be extracted. Setting it to false will also extract the private headers. This ensures that we do not bombard users with private tags all the time (most users do not need them, which is why this was not implemented for so long).
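Honouring that flag could be as simple as the sketch below. Only the `PublicHeadersOnly` name comes from this thread; reading the config with `json.loads`, the helper names, and the dict-of-tags header representation are assumptions for illustration.

```python
import json
from typing import Dict


def is_private_tag(tag: int) -> bool:
    """Private DICOM tags have an odd group number (upper 16 bits of 0xGGGGEEEE)."""
    return (tag >> 16) % 2 == 1


def apply_public_headers_only(headers: Dict[int, object], config_text: str) -> Dict[int, object]:
    """Drop private tags unless the config sets "PublicHeadersOnly" to false."""
    config = json.loads(config_text)
    # A missing key falls back to the safe default: public headers only.
    if config.get("PublicHeadersOnly", True):
        return {tag: v for tag, v in headers.items() if not is_private_tag(tag)}
    return dict(headers)


if __name__ == "__main__":
    headers = {0x00100020: "ANON123", 0x00291008: "IMAGE NUM 4"}
    print(len(apply_public_headers_only(headers, '{"PublicHeadersOnly": true}')))   # 1
    print(len(apply_public_headers_only(headers, '{"PublicHeadersOnly": false}')))  # 2
```

Defaulting to `True` when the key is absent keeps existing config files behaving exactly as before.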

Your second option is the ideal one. It is how the meta-extraction module handles extraction: it uses a featureset. See https://github.com/Emory-HITI/Niffler/blob/dev/modules/meta-extraction/conf/featureset.txt

Ideally, png-extraction should support both options. When a featureset is present, extract only the fields it lists. Otherwise, extract everything (as it does now, but without the private tags for now).
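The combined behaviour described above might be sketched like this. It is a hypothetical illustration, not the merged implementation: the `Element` type, the helper names, and the featureset format (one keyword or `GGGG,EEEE` tag per line, so private tags without keywords can still be listed) are assumptions; the real meta-extraction featureset.txt format should be checked before reusing it.

```python
from typing import Iterable, List, NamedTuple, Optional


class Element(NamedTuple):
    tag: int       # 0xGGGGEEEE
    keyword: str   # DICOM keyword; empty for private tags
    value: object


def is_private_tag(tag: int) -> bool:
    return (tag >> 16) % 2 == 1


def tag_string(tag: int) -> str:
    """Uppercase 'GGGG,EEEE' form, so private tags can be listed in a featureset."""
    return f"{tag >> 16:04X},{tag & 0xFFFF:04X}"


def load_featureset(text: str) -> List[str]:
    """Assumed format: one keyword or 'GGGG,EEEE' tag per line; blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_elements(elements: Iterable[Element],
                    featureset: Optional[List[str]] = None,
                    public_headers_only: bool = True) -> List[Element]:
    if featureset:
        # Featureset present: keep exactly the listed fields, public or private.
        wanted = set(featureset)
        return [e for e in elements if e.keyword in wanted or tag_string(e.tag) in wanted]
    if public_headers_only:
        # No featureset: everything, minus private tags by default.
        return [e for e in elements if not is_private_tag(e.tag)]
    return list(elements)


if __name__ == "__main__":
    elements = [
        Element(0x00100020, "PatientID", "ANON123"),
        Element(0x00080060, "Modality", "CT"),
        Element(0x00291008, "", "IMAGE NUM 4"),
    ]
    featureset = load_featureset("PatientID\n0029,1008\n")
    print([e.value for e in select_elements(elements, featureset)])
    print([e.value for e in select_elements(elements)])
```

Treating an explicit featureset entry as an opt-in for that private tag is a design choice here; an implementation could instead still apply `PublicHeadersOnly` on top of the featureset.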

Pavan-Bellam commented 2 years ago

Thank you for the answer. I will modify the code so that both options are available to the user.

Nitesh639 commented 2 years ago

Hi @pradeeban, I have solved this issue. Please check it.

pradeeban commented 2 years ago

@Nitesh639 I have requested changes to your pull request.

Nitesh639 commented 2 years ago

@pradeeban I have made the changes. Please check now.

pradeeban commented 2 years ago

Fixed by @Nitesh639 in Niffler-0.8.5.