ImagingDataCommons / CloudSegmentator

Medical imaging segmentation workflows for FireCloud (Terra) and Seven Bridges Cancer Genomics Cloud
Apache License 2.0

dcm2niix errors #63

Closed vkt1414 closed 2 months ago

vkt1414 commented 8 months ago

35 out of 126088 series failed at the dcm2niix step. I re-ran these in a Colab notebook, and here is the breakdown of the warning messages from dcm2niix. However, I cannot work out why these series failed.

29 series had these warnings:

Warning: PatientOrient (0018,5100) not specified (issue 642).
Warning: Instance Number (0020,0013) order is not spatial.
Warning: Interslice distance varies in this volume (incompatible with NIfTI format).

and 4 series had gantry tilt warnings:

Warning: PatientOrient (0018,5100) not specified (issue 642).
Gantry Tilt based on 0018,1120 -0.5, estimated from slice vector -0.500068
Warning: Note these images have gantry tilt of -0.500068 degrees (manufacturer ID = 1)

and 2 series had only the PatientOrient warning, which is very common:

Warning: PatientOrient (0018,5100) not specified (issue 642).
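The "interslice distance varies" and gantry tilt warnings are both geometric, so they can be cross-checked outside dcm2niix. The sketch below is one way to do that, assuming ImageOrientationPatient (0020,0037) and ImagePositionPatient (0020,0032) have already been read for every slice (e.g. with pydicom). It sorts slices along the slice normal, reports the smallest and largest spacing, and estimates tilt as the angle between the normal and the first-to-last slice displacement. The function name `slice_geometry` is hypothetical, and this is not necessarily how dcm2niix or Slicer compute their warnings.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _cross(a: Vec, b: Vec) -> list[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def slice_geometry(orientation: Vec, positions: Sequence[Vec]) -> dict[str, float]:
    """Summarize slice spacing and tilt for one series.

    orientation: the 6 ImageOrientationPatient values (row cosines, then column cosines).
    positions: ImagePositionPatient (x, y, z) of each slice, in any order (at least 2).
    """
    row, col = orientation[:3], orientation[3:6]
    normal = _cross(row, col)
    # Sort by location along the slice normal, since Instance Number order may not be spatial.
    ordered = sorted(positions, key=lambda p: _dot(p, normal))
    locations = [_dot(p, normal) for p in ordered]
    spacings = [b - a for a, b in zip(locations, locations[1:])]
    # Without tilt, the first-to-last displacement is parallel to the slice normal.
    disp = [b - a for a, b in zip(ordered[0], ordered[-1])]
    cos_tilt = abs(_dot(disp, normal)) / (
        math.sqrt(_dot(disp, disp)) * math.sqrt(_dot(normal, normal)))
    return {
        "min_spacing": min(spacings),
        "max_spacing": max(spacings),
        # Clamp to 1.0 to guard against rounding before acos; the sign of the tilt is not kept.
        "tilt_deg": math.degrees(math.acos(min(1.0, cos_tilt))),
    }
```

A large gap between `min_spacing` and `max_spacing` corresponds to the non-uniform spacing that NIfTI cannot represent, while a non-zero `tilt_deg` corresponds to gantry tilt.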

Here's the list of the series:

'1.2.840.113654.2.55.332967611868866852816434039713239600967',
 '1.2.840.113654.2.55.40419853500958583892692984245133843269',
 '1.2.840.113654.2.55.216303520934357806535330740047505109900',
 '1.2.840.113654.2.55.309595956107582522635422789264391197357',
 '1.2.840.113654.2.55.127964691362900131512797215584482901886',
 '1.2.840.113654.2.55.68896109217605566555815797240366466824',
 '1.2.840.113654.2.55.264743906535832845397781037612854512227',
 '1.2.840.113654.2.55.312368641339138269093726402804454532443',
 '1.2.840.113654.2.55.149533813638057001283245538088540122030',
 '1.2.840.113654.2.55.249168959444126325502445739870060352091',
 '1.2.840.113654.2.55.164713758187784264566137953792196285056',
 '1.2.840.113654.2.55.208030947832624335141390517527583501667',
 '1.2.840.113654.2.55.181615646167494492707070609724375539907',
 '1.2.840.113654.2.55.324248630471768832493740021610792841179',
 '1.2.840.113654.2.55.77882626071437131469484968626106656768',
 '1.2.840.113654.2.55.194178931700890149219754040541632865102',
 '1.2.840.113654.2.55.269046607301966689031560437413200379735',
 '1.2.840.113654.2.55.227102672124998044190246607660430734530',
 '1.2.840.113654.2.55.33752670926966073968573735091919133063',
 '1.2.840.113654.2.55.52290598641570267822291195626463367880',
 '1.2.840.113654.2.55.95119650833179501525970172637900584756',
 '1.2.840.113654.2.55.306627769415917064875292696738000192635',
 '1.2.840.113654.2.55.45175379989755138381831944574545956627',
 '1.2.840.113654.2.55.106965363928591531874825582318448033751',
 '1.2.840.113654.2.55.55047539427804038709996541039810793667',
 '1.2.840.113654.2.55.95884331101546572273025786497746152778',
 '1.2.840.113654.2.55.146802245991187488697324518476638486675',
 '1.2.840.113654.2.55.117433073336788426159401130299700793222',
 '1.2.840.113654.2.55.190058889121024472573870679836063600278',
 '1.2.840.113654.2.55.211367098257029003513885800314277182774',
 '1.2.840.113654.2.55.224190222944980200130853470491917893476',
 '1.2.840.113654.2.55.2253375745299807311628979336682171677',
 '1.2.840.113654.2.55.235708855988303880549573988077504665842',
 '1.2.840.113654.2.55.291787608127493613448368626528674406153',
 '1.2.840.113654.2.55.88241883339960085295132785980997342102'

Only these 5 series triggered the "exceeded tolerance" warning in Slicer:

1.2.840.113654.2.55.181615646167494492707070609724375539907
1.2.840.113654.2.55.227102672124998044190246607660430734530
1.2.840.113654.2.55.33752670926966073968573735091919133063
1.2.840.113654.2.55.52290598641570267822291195626463367880
1.2.840.113654.2.55.95119650833179501525970172637900584756

The notebook and the list of series, along with Slicer IDCBrowser URLs, are attached: https://colab.research.google.com/drive/1oMbG_xImkcE5bsc5MIN_2B2yU6bZe2Sg?usp=sharing slicer_links.csv

fedorov commented 8 months ago

@vkt1414 based on the discussion in the issue above, can you run a check for the consistency of the first component of ImageType, and see how many of the series would be filtered out? Since we did not have any such failures for NLST, it means this issue is rare and did not affect the prior cohort analyzed.
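A minimal sketch of such a consistency check, assuming the SeriesInstanceUID and ImageType of each instance can be read with pydicom. ImageType is multi-valued (e.g. ORIGINAL\PRIMARY\AXIAL), so the position to compare is a parameter, with the first component as the default. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Sequence, Tuple


def inconsistent_series(
    records: Iterable[Tuple[str, Sequence[str]]], component: int = 0
) -> dict[str, set[str]]:
    """Return series whose ImageType[component] differs between instances.

    records: (SeriesInstanceUID, ImageType values) for each instance.
    """
    values_per_series = defaultdict(set)
    for series_uid, image_type in records:
        # A missing component is recorded as "" so it counts as a distinct value.
        value = image_type[component] if len(image_type) > component else ""
        values_per_series[series_uid].add(value)
    return {uid: vals for uid, vals in values_per_series.items() if len(vals) > 1}


def image_type_records(paths: Iterable[str]) -> Iterator[Tuple[str, list[str]]]:
    """Read (SeriesInstanceUID, ImageType) from DICOM files; assumes pydicom is installed."""
    import pydicom

    for path in paths:
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        image_type = ds.get("ImageType", [])
        # A single-valued ImageType is returned as a plain string, not a list.
        if isinstance(image_type, str):
            image_type = [image_type]
        yield str(ds.SeriesInstanceUID), list(image_type)
```

Usage would be along the lines of `inconsistent_series(image_type_records(paths))`, where `paths` lists the downloaded files of the cohort.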

vkt1414 commented 8 months ago

30 of the 35 series had inconsistent ImageType values within the series. Of the remaining 5, 4 have gantry tilt errors. That leaves only 1 unexplained series, with SeriesInstanceUID 1.2.840.113654.2.55.181615646167494492707070609724375539907.

I was curious whether there are any other series with inconsistent ImageType in the 126k cohort. In total there are 63 such series, so 33 series with inconsistent ImageType were somehow converted successfully. This makes me wonder whether some other issue was blocking the conversion of the 30 failed series. Since we are dealing with a lot of numbers, I summarized the 63 series in a CSV file: 63_inconsistent_imagetype_series.csv
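The counts above are a cross-tabulation of two sets of series. A small sketch of that arithmetic, using a hypothetical helper and placeholder IDs chosen only to reproduce the reported numbers (35 failed, 63 inconsistent, 30 in both); they are not real SeriesInstanceUIDs.

```python
def conversion_breakdown(failed: set[str], inconsistent: set[str]) -> dict[str, int]:
    """Cross-tabulate failed conversions against series with inconsistent ImageType."""
    return {
        "failed_and_inconsistent": len(failed & inconsistent),
        "failed_other_reason": len(failed - inconsistent),
        "converted_despite_inconsistent": len(inconsistent - failed),
    }


# Placeholder IDs, not real SeriesInstanceUIDs.
failed = {f"s{i}" for i in range(35)}
inconsistent = {f"s{i}" for i in range(5, 68)}
print(conversion_breakdown(failed, inconsistent))
# {'failed_and_inconsistent': 30, 'failed_other_reason': 5, 'converted_despite_inconsistent': 33}
```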

neurolabusc commented 8 months ago

@vkt1414 can you share with me a series that has a gantry tilt error, as well as the remaining problematic series? Ideally, you could send a link to the zipped DICOMs on Google Drive to my institutional email, or provide a URL I can download with curl.

neurolabusc commented 8 months ago

@vkt1414 two images in the series 1.2.840.113654.2.55.181615646167494492707070609724375539907 are corrupted. You can see this by running the following Python code:

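from idc_index import index
# Client for querying and downloading IDC data (pip install idc-index)
c = index.IDCClient()
# Download every instance of the suspect series into the current directory
c.download_from_selection(seriesInstanceUID="1.2.840.113654.2.55.181615646167494492707070609724375539907", downloadDir="./")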

Note that 126 of the 128 files include image data and have sizes of around 527 KB. However, two files are unusually small (2 KB): 2b6326b8-a790-43af-b764-26b6883e0516.dcm and ded44b5b-2281-4d44-969a-b1830b116c72.dcm. Viewing the DICOM header (e.g., using dcmdump) shows that these files are missing the image data.
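The file-size observation above generalizes to a quick screen that needs no DICOM parsing. A sketch, assuming all files of one series sit under a single directory with a .dcm extension; the 10%-of-median threshold is an arbitrary assumption, and the function names are hypothetical. Flagged files still need confirmation with dcmdump or pydicom.

```python
import statistics
from pathlib import Path


def size_outliers(sizes: dict[str, int], ratio: float = 0.1) -> list[str]:
    """Return names whose size is below `ratio` times the median size."""
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    return sorted(name for name, size in sizes.items() if size < ratio * median)


def small_dicom_files(directory: str, ratio: float = 0.1) -> list[str]:
    """Apply size_outliers to every .dcm file under `directory` (recursively)."""
    sizes = {str(p): p.stat().st_size for p in Path(directory).rglob("*.dcm")}
    return size_outliers(sizes, ratio)
```

For a series like this one, where most files are around 527 KB, a 2 KB file falls far below the threshold.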

vkt1414 commented 8 months ago

Hi @neurolabusc ,

Thank you very much for diagnosing the mysterious series. We will look into how the series made it to IDC.

Please find below links to GitHub release attachments, which should be downloadable with curl or wget.

4 series that showed gantry tilt errors: https://github.com/vkt1414/CloudSegmentator/releases/download/test/GantryTiltError.zip

30 series with inconsistent ImageType that could NOT be converted: https://github.com/vkt1414/CloudSegmentator/releases/download/test/Inconsistent.ImageType.zip

33 series that were converted despite inconsistent ImageType: https://github.com/vkt1414/CloudSegmentator/releases/download/test/dcmniix_processed_despite_inconsistent_imageType_part1.zip https://github.com/vkt1414/CloudSegmentator/releases/download/test/dcmniix_processed_despite_inconsistent_imageType_part2.zip

I included all series for completeness, but please feel free to pick selectively. I'm curious why some series with inconsistent ImageType converted to NIfTI while others did not.

Thank you very much for your help. We really appreciate it!

vkt1414 commented 7 months ago

@neurolabusc Just wondering if you had a chance to look into how dcm2niix was able to convert some of the series with inconsistent ImageType. Thank you!

neurolabusc commented 7 months ago

@vkt1414 if you want feedback, provide a minimal demo. A 1 GB download with 24 series acquired in three separate sessions does not make it clear what your issue is. You can always rename and reorganize your DICOMs with dcm2niix -r y /path/to/DICOMs to extract the specific series that are causing you confusion.
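Besides `dcm2niix -r y`, a minimal demo can also be packaged from Python. The sketch below assumes pydicom is installed and that the target SeriesInstanceUID is known. It groups files by series and zips only the files of one series. All function names are hypothetical helpers, not part of dcm2niix or idc-index.

```python
import zipfile
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Mapping


def group_by_series(path_to_uid: Mapping[str, str]) -> dict[str, list[str]]:
    """Invert {file path: SeriesInstanceUID} into {SeriesInstanceUID: [file paths]}."""
    groups = defaultdict(list)
    for path, uid in sorted(path_to_uid.items()):
        groups[uid].append(path)
    return dict(groups)


def read_series_uids(directory: str) -> dict[str, str]:
    """Map each readable DICOM file under `directory` to its SeriesInstanceUID (needs pydicom)."""
    import pydicom
    from pydicom.errors import InvalidDicomError

    uids = {}
    for path in Path(directory).rglob("*"):
        if not path.is_file():
            continue
        try:
            ds = pydicom.dcmread(path, stop_before_pixels=True,
                                 specific_tags=["SeriesInstanceUID"])
        except InvalidDicomError:
            continue  # skip files that are not DICOM
        if "SeriesInstanceUID" in ds:
            uids[str(path)] = str(ds.SeriesInstanceUID)
    return uids


def zip_series(paths: Iterable[str], zip_path: str) -> None:
    """Write the given files into a flat zip archive suitable for sharing."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=Path(path).name)
```

Example: `zip_series(group_by_series(read_series_uids("./DICOMs"))["<SeriesInstanceUID>"], "demo.zip")`, where the directory and UID are placeholders.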

bcli4d commented 7 months ago

@vkt1414, as it turns out, TCIA requires approval to access their NLST data, which is kind of ironic considering that it's available from IDC. I have access, so I downloaded a zip of the series in question, which you can get at gs://whc_etl_dev/1.2.840.113654.2.55.181615646167494492707070609724375539907.zip There are, indeed, two instances in the zip that are 2300 bytes.

vkt1414 commented 7 months ago

@bcli4d thank you very much! I was going to message you on Slack about them.

vkt1414 commented 7 months ago

@neurolabusc I apologize if the previously attached files were overwhelming. I have now included only two series, one for each kind of problem: one with a gantry tilt error, and one with inconsistent ImageType that dcm2niix nevertheless converted to NIfTI without any trouble.

https://github.com/vkt1414/CloudSegmentator/releases/download/test/dcm2niix_troubleshooting.zip

I'm more interested in the latter, as the claim that 'dcm2niix can't convert series with inconsistent ImageType' does not seem to hold every time.

Thank you very much!

vkt1414 commented 7 months ago

@fedorov I double checked with pydicom and confirmed that there is no pixel data in those two DICOM files, even from TCIA. Please advise what steps we should take next.
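A sketch of that pydicom check applied to a whole downloaded series, assuming the files have a .dcm extension. A pydicom Dataset supports membership tests by keyword, so the helper also works on a plain mapping keyed by keyword. Only the standard Pixel Data (7FE0,0010) element is checked. The function names are hypothetical.

```python
from pathlib import Path


def has_pixel_data(ds) -> bool:
    """True if the dataset (pydicom Dataset or keyword-keyed mapping) has Pixel Data."""
    return "PixelData" in ds


def files_without_pixel_data(directory: str) -> list[str]:
    """List .dcm files under `directory` that lack Pixel Data (needs pydicom)."""
    import pydicom

    missing = []
    for path in sorted(Path(directory).rglob("*.dcm")):
        ds = pydicom.dcmread(path)  # full read, so Pixel Data is parsed if present
        if not has_pixel_data(ds):
            missing.append(str(path))
    return missing
```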

fedorov commented 7 months ago

I think there are several action items:

  1. For our NLST analysis, we should just skip those series that have missing image data
  2. We should report this to TCIA. Can you please email the TCIA helpdesk reporting this finding, include "[IDC]" as a prefix in the support request subject line, and ask them to include me and @bcli4d on the ticket.
  3. We should discuss if/how we could scan the data we have in IDC to identify which images have missing image data.
  4. We should discuss adding a step to the ETL that would take note when image data is expected but missing, since that is where performing such a check would be easiest.
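For the ETL check in item 4, one possible rule is to decide whether image data is expected from the SOP Class UID and then test whether any pixel data element is present. The sketch below assumes a hypothetical per-instance record with `SOPInstanceUID`, `SOPClassUID`, and `present_keywords` (the set of top-level attribute keywords found in the file). The SOP Class list is deliberately incomplete and would need extending for the modalities in IDC.

```python
from typing import Iterable, Mapping

# Storage SOP Classes whose instances are expected to carry pixel data (incomplete list).
IMAGE_SOP_CLASSES = {
    "1.2.840.10008.5.1.4.1.1.2",    # CT Image Storage
    "1.2.840.10008.5.1.4.1.1.2.1",  # Enhanced CT Image Storage
    "1.2.840.10008.5.1.4.1.1.4",    # MR Image Storage
}

PIXEL_DATA_KEYWORDS = ("PixelData", "FloatPixelData", "DoubleFloatPixelData")


def missing_image_data(records: Iterable[Mapping]) -> list[str]:
    """Return SOPInstanceUIDs of instances that should have pixel data but do not.

    Each record is a hypothetical mapping with 'SOPInstanceUID', 'SOPClassUID'
    and 'present_keywords' (keywords of the attributes present in the file).
    """
    flagged = []
    for rec in records:
        expected = rec["SOPClassUID"] in IMAGE_SOP_CLASSES
        present = any(k in rec["present_keywords"] for k in PIXEL_DATA_KEYWORDS)
        if expected and not present:
            flagged.append(rec["SOPInstanceUID"])
    return flagged
```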
neurolabusc commented 7 months ago

@vkt1414

  • One series has some files saved with ORIGINAL\PRIMARY\AXIAL\CT_SOM5 SPI and others saved with ORIGINAL\SECONDARY\AXIAL\CT_SOM5 SPI. It would be wise not to combine these. However, dcm2niix only avoids combining derived and non-derived images from the same series, and since neither is labeled as DERIVED, dcm2niix concatenates them.
  • I do not get a gantry tilt warning with the current stable release of dcm2niix. I suspect that the recent commit to increase the gantry tilt tolerance explains this.

vkt1414 commented 7 months ago

@neurolabusc Thank you very much for your help! We really appreciate it.
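neurolabusc noted that dcm2niix concatenates ORIGINAL\PRIMARY and ORIGINAL\SECONDARY files when neither is labeled DERIVED. One possible workaround is to split a series by ImageType before conversion and convert each group separately. The sketch below groups files by their full ImageType value. It assumes the (path, ImageType) pairs were read beforehand (e.g. with pydicom), and it may split more finely than needed if other ImageType components vary legitimately. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Sequence, Tuple


def split_by_image_type(
    instances: Iterable[Tuple[str, Sequence[str]]]
) -> dict[tuple[str, ...], list[str]]:
    """Group file paths so that every group shares one ImageType value.

    instances: (file path, ImageType values) for each file of a series.
    """
    groups = defaultdict(list)
    for path, image_type in instances:
        # Normalize whitespace and case so trivially different encodings are grouped together.
        key = tuple(v.strip().upper() for v in image_type)
        groups[key].append(path)
    return dict(groups)
```

Each resulting group could then be copied to its own folder and passed to dcm2niix on its own.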