Emory-HITI / Niffler

Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
https://emory-hiti.github.io/Niffler/
BSD 3-Clause "New" or "Revised" License

Running ImageExtractor.py is not finding .dcm files in target directory #363

Closed: ceiag closed this issue 1 year ago

ceiag commented 1 year ago

Describe the bug: Running ImageExtractor.py does not find the .dcm files in the target directory.

To Reproduce: Downloaded ImageExtractor.py and config.json, installed all dependencies, configured config.json, and ran ImageExtractor.py.

Expected behaviour: It should discover the DICOM files in the target directory and convert them to PNG.

Logs or screenshots:

$ ls /home/xxxxxxxx/Desktop/Antares/patient/opt
OP000000.dcm

$ file /home/xxxxxxxx/Desktop/Antares/patient/opt/OP000000.dcm 
/home/xxxxxxxx/Desktop/Antares/patient/opt/OP000000.dcm: DICOM medical imaging data

$ cat config.json 
{
    "DICOMHome": "/home/xxxxxxxx/Desktop/Antares/patient/opt",
    "OutputDirectory": "png",
    "Depth": 0,
    "SplitIntoChunks": 1,
    "PrintImages": true,
    "CommonHeadersOnly": false,
    "PublicHeadersOnly": true,
    "SpecificHeadersOnly": false,
    "UseProcesses": 0,
    "FlattenedToLevel": "patient",
    "is16Bit":false,
    "SendEmail": true,
    "YourEmail": "test@test.test"
}

$ python3 ImageExtractor.py 
$ cat png/ImageExtractor.out
INFO:root:------- Values Initialization DONE -------
INFO:root:Number of dicom files: 0
ERROR:root:There is no file present in the given folder in /home/xxxxxxxx/Desktop/Antares/patient/opt/*.dcm


pradeeban commented 1 year ago

The log means that the folder does not contain any valid DICOM files. I know you have a file with the .dcm extension in that folder, but that file must be invalid (i.e., not an actual DICOM file).

Can you please try to reproduce the issue with a publicly available DICOM file such as the ones in https://www.cancerimagingarchive.net/collections/?

Then if you can still reproduce it, please point me to one such DICOM file so that I can further debug this issue.

I tried to reproduce the issue with the configuration below, using the same Niffler version as yours.

{
    "DICOMHome": "/Users/Pradeeban/Desktop/op",
    "OutputDirectory": "png",
    "Depth": 0,
    "SplitIntoChunks": 1,
    "PrintImages": true,
    "CommonHeadersOnly": false,
    "PublicHeadersOnly": true,
    "SpecificHeadersOnly": false,
    "UseProcesses": 0,
    "FlattenedToLevel": "patient",
    "is16Bit":false,
    "SendEmail": true,
    "YourEmail": "test@test.test"
}

I cannot reproduce the issue. Even when I used a DICOM file that has some problems, this is the log that was produced:

$ cat ImageExtractor.out
INFO:root:------- Values Initialization DONE -------
INFO:root:Number of dicom files: 1
DEBUG:root:Loaded the first file successfully
WARNING:pydicom:Expected sequence item with tag (fffe, e000) at file position 0x14
WARNING:pydicom:Expected sequence item with tag (fffe, e000) at file position 0x14
WARNING:pydicom:Expected sequence item with tag (fffe, e000) at file position 0x14
WARNING:pydicom:Expected sequence item with tag (fffe, e000) at file position 0x14
INFO:root:Chunk 0 Number of fields per file : 48
INFO:root:Start processing Images
ERROR:root:Unable to convert the pixel data: one of Pixel Data, Float Pixel Data or Double Float Pixel Data must be present in the dataset
ERROR:root:1 out of 1 dicom images have failed extraction
INFO:root:Chunk run time: 0.11768984794616699 seconds!
INFO:root:Generating final metadata file
INFO:root:Generating final mapping file
INFO:root:Total run time: 0.15414905548095703 seconds!

Pradeebans-MacBook-Pro:png pradeeban$ tree
.
├── ImageExtractor.out
├── ImageExtractor.pickle
├── extracted-images
├── failed-dicom
│   ├── 1
│   │   └── 1.2.276.0.28.3.194080963160286.12.3612.2018102319423250000.dcm
│   ├── 2
│   ├── 3
│   ├── 4
│   └── 5
├── mapping.csv
├── maps
│   └── mapping_0.csv
├── meta
│   └── metadata_0.csv
└── metadata.csv

9 directories, 7 files

On the other hand, if I use a fake DICOM file instead (for example, a text file saved as test.dcm in the folder), I get the same message that you get (which simply states that there are no real DICOM files present).
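A quick way to tell a fake .dcm file from one in the standard DICOM file format is to look for the 4-byte "DICM" marker that follows the 128-byte preamble. The snippet below is only a minimal sketch of that check using the Python standard library; `looks_like_dicom` and `scan` are hypothetical helper names, and this is not necessarily how Niffler itself decides whether a file is valid.

```python
from pathlib import Path


def looks_like_dicom(path):
    """Return True if the file has the 'DICM' marker after the 128-byte preamble."""
    with open(path, "rb") as f:
        header = f.read(132)
    # A text file renamed to .dcm will normally fail this comparison.
    return len(header) == 132 and header[128:132] == b"DICM"


def scan(folder):
    """Map each *.dcm file name directly inside folder to the result of the marker check."""
    # Assumption: only the top level of the folder is inspected, as with "Depth": 0.
    return {p.name: looks_like_dicom(p) for p in sorted(Path(folder).glob("*.dcm"))}
```

Calling `scan` on the folder configured as DICOMHome returns a dictionary such as `{"test.dcm": False}` for a text file saved as test.dcm. Note that some older DICOM files omit the preamble and marker, so a `False` here is a hint rather than proof; pydicom can still attempt to read such files with `dcmread(path, force=True)`.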

Nitesh639 commented 1 year ago

ImageExtractor.py is working properly. @ceiag, make sure your files use the lowercase .dcm extension, not .DCM. An uppercase extension also causes this error, because we only detect .dcm files. https://github.com/Emory-HITI/Niffler/blob/master/modules/png-extraction/ImageExtractor.py#L334
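To see whether extension case or directory depth is the problem, you can list the candidate files yourself. This is a rough sketch, not Niffler's actual code: `find_dcm` is a hypothetical helper, and the way `depth` maps to one `*` per directory level is an assumption based on the `<DICOMHome>/*.dcm` pattern printed in the log for "Depth": 0.

```python
import glob
import os


def find_dcm(dicom_home, depth=0, ignore_case=False):
    """List files `depth` directory levels below dicom_home whose extension is .dcm."""
    # Assumption: each depth level adds one "*" directory component to the pattern.
    levels = ["*"] * depth
    pattern = os.path.join(dicom_home, *levels, "*")
    matches = []
    for path in glob.glob(pattern):
        ext = os.path.splitext(path)[1]
        if ignore_case:
            ext = ext.lower()  # treat .DCM and .Dcm like .dcm
        if os.path.isfile(path) and ext == ".dcm":
            matches.append(path)
    return sorted(matches)
```

If `find_dcm(home)` returns an empty list but `find_dcm(home, ignore_case=True)` does not, the files only need to be renamed to a lowercase extension. If both are empty but `find_dcm(home, depth=1)` finds files, the files sit one folder deeper than the configured depth.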

Nitesh639 commented 1 year ago

@pradeeban, I think we should add some example DICOM (.dcm) images.

pradeeban commented 1 year ago

@Nitesh639 I will include the links to TCIA to download sample files. We shouldn't duplicate DICOM files in the repository. Git repositories should avoid binary files.

Also, in practice, whoever uses the Niffler PNG Extraction should have access to DICOM files (as Niffler is a domain-specific radiology software). I understand sample DICOM files will be helpful for situations such as Google Summer of Code contributors.

Nitesh639 commented 1 year ago

OK.

ceiag commented 1 year ago

Thanks All for the replies.

Yep, it looks like something isn't right on my end: even after downloading a publicly available DICOM dataset from https://www.cancerimagingarchive.net/collections/, it's still not detecting the .dcm files in the target directory.

Nitesh639 commented 1 year ago

@ceiag Go with this link. I have a strong feeling you will get some output files. Don't change the config.json file.

pradeeban commented 1 year ago

Closing this issue since @Nitesh639 has provided sample files that run without any changes to the configuration. This should help you get started if you are not yet familiar with Niffler.