Emory-HITI / Niffler

Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
https://emory-hiti.github.io/Niffler/
BSD 3-Clause "New" or "Revised" License

Enable the feature to extract only certain DICOM headers in png-extraction #282

Closed pradeeban closed 2 years ago

pradeeban commented 2 years ago

As of Niffler-0.8.5, png-extraction extracts all the metadata. The only filtering options are getting all the headers, getting only the common headers, and getting only the public headers, as supported by the config.json options:

"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

An alternative is to allow extracting only a specific subset of DICOM headers (both private and public), as the meta-extraction module already supports through featureset.txt files such as https://github.com/Emory-HITI/Niffler/blob/dev/modules/meta-extraction/conf/featureset1.txt.

This option makes the png-extraction module more efficient when we just need certain headers stored in the output CSV file.

We add a new config option (false by default):

"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

You could copy-paste https://github.com/Emory-HITI/Niffler/blob/dev/modules/meta-extraction/conf/featureset1.txt as the default featureset.txt for this module, but also add PhotometricInterpretation into this new featureset.txt.

In this way, there is always a featureset.txt.

We can impose certain requirements on what counts as an accepted featureset.txt. A featureset.txt must always include the four fields below, as they are mandatory for png-extraction.

PhotometricInterpretation PatientID StudyInstanceUID SeriesInstanceUID

Just mention this in the README. That is sufficient.
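The issue only asks for a README note, but if an early check were also wanted, it could look roughly like the sketch below. It assumes featureset.txt lists one attribute keyword per line, which is an assumption about the file format. The names `MANDATORY_FIELDS`, `parse_featureset`, `missing_mandatory`, and `load_featureset` are hypothetical and not part of Niffler.

```python
from pathlib import Path

# The four fields this issue names as mandatory for png-extraction.
MANDATORY_FIELDS = (
    "PhotometricInterpretation",
    "PatientID",
    "StudyInstanceUID",
    "SeriesInstanceUID",
)


def parse_featureset(text):
    """Return the attribute names in a featureset, one per non-empty line.

    Assumes one attribute keyword per line. Duplicates are dropped while
    the original order is preserved.
    """
    names = []
    for line in text.splitlines():
        name = line.strip()
        if name and name not in names:
            names.append(name)
    return names


def missing_mandatory(names):
    """Return the mandatory fields that are absent from the featureset."""
    return [field for field in MANDATORY_FIELDS if field not in names]


def load_featureset(path):
    """Read a featureset file and fail early if a mandatory field is missing."""
    names = parse_featureset(Path(path).read_text())
    missing = missing_mandatory(names)
    if missing:
        raise ValueError("featureset.txt is missing: " + ", ".join(missing))
    return names
```

Failing at load time would surface a bad featureset.txt before any images are processed, rather than partway through an extraction run.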

When you implement the feature, make sure to test all three cases below:

1.

"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

This is the current default. This pulls all the public headers.

2.

"SpecificHeadersOnly": false,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,

This pulls all the public and private headers.

3.

"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,

This is currently not implemented. When SpecificHeadersOnly is set to true, png-extraction will ignore the CommonHeadersOnly and PublicHeadersOnly options and extract exactly the tags listed in featureset.txt, regardless of whether they are public, private, or uncommon.
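The precedence described above can be sketched as follows. This is only an illustration, not Niffler's actual code: the function names `effective_flags` and `select_headers` are hypothetical, and the headers are modeled as a plain dict of attribute name to value rather than a real DICOM dataset.

```python
def effective_flags(config):
    """Resolve the three config.json flags into the options that take effect.

    SpecificHeadersOnly wins: when it is true, the other two flags are
    ignored. The defaults mirror the ones stated in this issue.
    """
    if config.get("SpecificHeadersOnly", False):
        # featureset.txt alone decides which headers are kept.
        return {"specific": True, "common_only": False, "public_only": False}
    return {
        "specific": False,
        "common_only": bool(config.get("CommonHeadersOnly", False)),
        "public_only": bool(config.get("PublicHeadersOnly", True)),
    }


def select_headers(headers, featureset, config):
    """Keep only the featureset entries when SpecificHeadersOnly is true.

    For the other modes the headers are returned unchanged here; the
    existing common/public filtering is outside the scope of this sketch.
    """
    if effective_flags(config)["specific"]:
        wanted = set(featureset)
        return {name: value for name, value in headers.items() if name in wanted}
    return dict(headers)
```

Resolving the flags in one place means the rest of the extraction code only needs to check a single value to know whether featureset.txt applies.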

So, essentially, the configurations below are all equivalent:

"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": false,

and

"SpecificHeadersOnly": true,
"CommonHeadersOnly": false,
"PublicHeadersOnly": true,

and

"SpecificHeadersOnly": true,
"CommonHeadersOnly": true,
"PublicHeadersOnly": false,

and

"SpecificHeadersOnly": true,
"CommonHeadersOnly": true,
"PublicHeadersOnly": true,

Pavan-Bellam commented 2 years ago

Almost completed! Will submit a PR by evening

Nitesh639 commented 2 years ago

@pradeeban, is anyone working on this issue? I can solve it.

pradeeban commented 2 years ago

@Nitesh639 Sure, go ahead.

In a usual scenario, we would discourage multiple contributors from working on the same bug/RFE. But these are some good first issues for GSoC contributors to learn DICOM with Python. So, these are exceptions.

I will merge the pull request that better fits Niffler's architecture. Even if your pull request is not merged, please make sure to include it in your GSoC proposal.

Nitesh639 commented 2 years ago

@pradeeban, after this PR was merged, private tags are not working. May I work on this issue again?

pradeeban commented 2 years ago

Yes, please. Thanks.

Pavan-Bellam commented 2 years ago

Hello @Nitesh639, I have tested that code again and I am getting private tags too. Could you help me reproduce the issue so that I can find where I went wrong? Also, in your code, while extracting private tags, check whether the length of the attribute value is less than 300 and add only those values. For further information on this point, please refer to #258.
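A minimal sketch of that length check, assuming the length is measured on the string form of the value and that private headers are available as a plain dict. The exact rule in #258 may differ, and the names `MAX_PRIVATE_VALUE_LENGTH`, `keep_private_value`, and `filter_private_values` are hypothetical:

```python
# Threshold mentioned in the comment above; values at or over it are skipped.
MAX_PRIVATE_VALUE_LENGTH = 300


def keep_private_value(value, limit=MAX_PRIVATE_VALUE_LENGTH):
    """Return True when the string form of a private tag value is shorter than limit."""
    return len(str(value)) < limit


def filter_private_values(private_headers, limit=MAX_PRIVATE_VALUE_LENGTH):
    """Drop private header entries whose values are too long to store."""
    return {
        name: value
        for name, value in private_headers.items()
        if keep_private_value(value, limit)
    }
```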