DIAGNijmegen / bodyct-dsb2017-grt123

Repository which contains the code of the grt123 solution from the Kaggle DSB 2017 challenge on lung cancer detection

Output all nodule candidates with their respective nodulePred to separate file #7

Closed cjacobs1 closed 5 years ago

cjacobs1 commented 5 years ago

Extracting all the bounding boxes for the nodules as Findings in XML format similar to Arnaud's nodule classification processor. We will extract:

Proposed output finding XML format:

<LungCADReport>
<LungCAD>...</LungCAD>
<ImageInfo>...</ImageInfo>
<CancerInfo>
  <CaseCancerProbability>0.89</CaseCancerProbability>
  <ReferenceNoduleIDs>1,2,3,4,5</ReferenceNoduleIDs>
</CancerInfo>
<Finding>
  <ID>0</ID>
  <X>-29.6</X>
  <Y>81.62</Y>
  <Z>1800.24</Z>
  <Extent>
    <ExtentX>-29.6</ExtentX>
    <ExtentY>81.62</ExtentY>
    <ExtentZ>1800.24</ExtentZ>
  </Extent>
  <Probability>0.9</Probability>
  <CancerProbability>1.91551e-10</CancerProbability>
  <Diameter_mm>-1.0</Diameter_mm>
  <Volume_mm3>-1.0</Volume_mm3>
</Finding>
</LungCADReport>
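A minimal sketch of how a report in this shape could be written from Python, using only the standard library. The function name `build_report` and the dictionary keys are hypothetical and not taken from the repository. The elided `LungCAD` and `ImageInfo` groups are left out, and `-1.0` is copied from the example above as a placeholder for values that are not computed.

```python
import xml.etree.ElementTree as ET


def build_report(case_probability, reference_ids, findings):
    """Build a LungCADReport element (hypothetical helper, not repo code)."""
    report = ET.Element("LungCADReport")

    # Case-level information goes into its own CancerInfo group.
    cancer_info = ET.SubElement(report, "CancerInfo")
    ET.SubElement(cancer_info, "CaseCancerProbability").text = str(case_probability)
    ET.SubElement(cancer_info, "ReferenceNoduleIDs").text = ",".join(
        str(i) for i in reference_ids
    )

    for f in findings:
        finding = ET.SubElement(report, "Finding")
        ET.SubElement(finding, "ID").text = str(f["id"])
        # Center of the bounding box in world coordinates.
        for axis, value in zip("XYZ", f["center"]):
            ET.SubElement(finding, axis).text = str(value)
        # Extent of the bounding box along each axis.
        extent = ET.SubElement(finding, "Extent")
        for axis, value in zip("XYZ", f["extent"]):
            ET.SubElement(extent, "Extent" + axis).text = str(value)
        ET.SubElement(finding, "Probability").text = str(f["probability"])
        ET.SubElement(finding, "CancerProbability").text = str(f["cancer_probability"])
        # Placeholder values, as in the example above (assumed: not computed).
        ET.SubElement(finding, "Diameter_mm").text = "-1.0"
        ET.SubElement(finding, "Volume_mm3").text = "-1.0"
    return report
```

Calling `ET.tostring(build_report(...), encoding="unicode")` would then give the serialized XML string.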

silvandeleemput commented 5 years ago

This issue seems closely related to #3 which mentioned the following concern:

We need to find out where the filtering of the nodules happens if fewer than 5 candidate boxes are output, because then the probabilities and nodule outputs do not match

Originally posted by @cjacobs1 in https://github.com/DIAGNijmegen/bodyct-kaggle-grt123/issues/3#issuecomment-492223009

@cjacobs1 I think I found the location where the filtering (selecting the top5 candidate boxes) happens. This implies that I should be able to output at least all nodule candidates. However, the ranking is based on a confidence value which is not a probability (can be a value > 1 and maybe even < 0). There is of course also the Jaccard/IoU associated with each nodule bounding box, which is probably used for thresholding the nodules into Y/N/ignore categories.

I would like to discuss what to extract besides the center and extent of the bounding box. Are you interested in the Jaccard/IoU values only or also the confidence values?

Furthermore, what output format do we want: an XML file similar to Arnaud's nodule classification processor, or do we want the format already there and extend that with the currently missing information?

silvandeleemput commented 5 years ago

After discussion we agreed on several things. The decisions have been moved to the top of the ticket.

cjacobs1 commented 5 years ago
  • [ ] An image-level cancer probability score based on the top5 entries. (@cjacobs1 where do we put this in the output XML report? A separate finding or in the ImageInfo group? )

Let's make a separate group for that, called CancerInfo. Also, please change Confidence into Probability. So, it will look like:

<ImageInfo>
  <all image info>
</ImageInfo>
<CancerInfo>
  <CaseCancerProbability>0.89</CaseCancerProbability>
</CancerInfo>
<Findings>
  <Finding>
    <ID>0</ID>
    <X>-29.6</X>
    <Y>81.62</Y>
    <Z>1800.24</Z>
    <Extent>
      <ExtentX>-29.6</ExtentX>
      <ExtentY>81.62</ExtentY>
      <ExtentZ>1800.24</ExtentZ>
    </Extent>
    <Probability>12.4</Probability>
    <CancerProbability>1.91551e-10</CancerProbability>
    <Diameter_mm>-1.0</Diameter_mm>
    <Volume_mm3>-1.0</Volume_mm3>
  </Finding>
</Findings>
  • [ ] Test if it is possible to generate a cancer probability score per nodule beyond the top-5
  • [ ] @cjacobs1 Sufficient information should be available to compute Diameter_mm and Volume_mm3 as well, will we compute and include those?

Not needed at this point, the segmentation performance is unclear so not sure how useful it is.

silvandeleemput commented 5 years ago

@cjacobs1 You were right, the nodule probability scores are indeed computed by applying the sigmoid function, so I propose to use those instead of the confidence scores directly.
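For illustration, a minimal sketch of that mapping, assuming the raw confidence is treated as a logit. The function name is hypothetical and this is not the repository's code.

```python
import math


def sigmoid(confidence):
    """Map an unbounded confidence value (logit) into the range 0..1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if confidence >= 0:
        return 1.0 / (1.0 + math.exp(-confidence))
    e = math.exp(confidence)
    return e / (1.0 + e)
```

With this, a confidence such as 12.4 becomes a value just below 1, and negative confidences map to values below 0.5.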

silvandeleemput commented 5 years ago

@cjacobs1 I have updated the proposed output format based on your feedback. The CancerInfo group is now included and I have changed Confidence to Probability (which can now actually be a proper probability through the sigmoid). This should be sufficient to get going.

Do we want to reference the relevant nodule findings linked to the CaseCancerProbability?

cjacobs1 commented 5 years ago

@cjacobs1 You were right, the nodule probability scores are indeed computed by applying the sigmoid function, so I propose to use those instead of the confidence scores directly.

Perfect, agreed!

cjacobs1 commented 5 years ago

@cjacobs1 I have updated the proposed output format based on your feedback. The CancerInfo group is now included and I have changed Confidence to Probability (which can now actually be a proper probability through the sigmoid). This should be sufficient to get going.

Do we want to reference the relevant nodule findings linked to the CaseCancerProbability?

Yes, we do. Perhaps best to include the IDs of those nodules in the CancerInfo part. So, it would look like:

<CancerInfo>
  <CaseCancerProbability>0.89</CaseCancerProbability>
  <ReferenceNoduleIDs>1,2,3,4,5</ReferenceNoduleIDs>
</CancerInfo>

But if we sort them, it will always be 1,2,3,4,5, right? But still good to add, I think. For someone who is not so familiar with the algorithm.
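A small sketch of why sorting makes the reference IDs predictable. It assumes findings are ranked by descending probability and numbered from 1. Both the sort key and the numbering are assumptions, as are the names `rank_findings` and `top_k`.

```python
def rank_findings(findings, top_k=5):
    """Sort findings by descending probability and assign 1-based IDs.

    Returns the re-numbered findings and the ReferenceNoduleIDs string
    for the top_k entries. Hypothetical helper, not repository code.
    """
    ordered = sorted(findings, key=lambda f: f["probability"], reverse=True)
    # Copy each finding so the caller's dictionaries are left untouched.
    ranked = [dict(f, id=i) for i, f in enumerate(ordered, start=1)]
    reference_ids = ",".join(str(f["id"]) for f in ranked[:top_k])
    return ranked, reference_ids
```

With at least five findings the string is always `1,2,3,4,5`. With fewer candidates it is shorter, which makes that case visible in the report.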

silvandeleemput commented 5 years ago

But if we sort them, it will always be 1,2,3,4,5, right? But still good to add, I think. For someone who is not so familiar with the algorithm.

My thought exactly. I'll add it to the decision.

silvandeleemput commented 5 years ago

I am making good progress on this. There are just a few things left to do before this can be put into a PR:

Moved TODO to the top of the ticket.

silvandeleemput commented 5 years ago

General update

Unresolved/new issues:

Reporting

Performance issues:

Untested

silvandeleemput commented 5 years ago

Today, I was finally able to identify and resolve the issue with the conversion from voxel to world coordinates. The test MHD and MHA files had incorrect (mirrored) spacing and offset w.r.t. the DICOM files, and the ITK file loader also loaded the spacing and offset mirrored.

As a result, the tests passed on the test files but failed on new DICOM files. I have fixed the ITK loader and the MHD and MHA test files. In addition, I have added rigorous testing for the voxel-to-world coordinate conversions.
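To illustrate the kind of error involved, here is a minimal sketch of a voxel-to-world conversion in pure Python. It assumes the usual convention world = origin + direction · (spacing ⊙ voxel) with all vectors in (x, y, z) order. The function name and the example values are hypothetical and are not taken from convert_voxel_to_world_coordinates.py.

```python
def voxel_to_world(voxel_xyz, origin_xyz, spacing_xyz, direction=None):
    """Convert a voxel index to world coordinates (all in x, y, z order)."""
    if direction is None:
        # Identity direction matrix: image axes aligned with world axes.
        direction = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Scale the voxel index by the spacing along each axis.
    scaled = [s * v for s, v in zip(spacing_xyz, voxel_xyz)]
    # Rotate by the direction matrix, then shift by the origin.
    return tuple(
        o + sum(d * c for d, c in zip(row, scaled))
        for o, row in zip(origin_xyz, direction)
    )


# Made-up example values, not taken from the test files.
origin = (-200.0, -180.0, 1500.0)
spacing = (0.5, 0.5, 2.0)
voxel = (10, 20, 30)

correct = voxel_to_world(voxel, origin, spacing)
# Reversed (z, y, x) spacing and offset, mimicking a mirrored header.
mirrored = voxel_to_world(voxel, origin[::-1], spacing[::-1])
```

Here `correct` is `(-195.0, -170.0, 1560.0)` while `mirrored` is `(1520.0, -170.0, -185.0)`. A loader and test files that share the same reversal would agree with each other, which would be consistent with the tests passing on those files while new DICOM data failed.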

Some things remaining before the PR:

haimasree commented 5 years ago

Does this mean it was not an issue with convert_voxel_to_world_coordinates.py, but the input test files? If indeed the above code is faulty, can I see line numbers which may be contributing to the failure? @silvandeleemput

silvandeleemput commented 5 years ago

Does this mean it was not an issue with convert_voxel_to_world_coordinates.py, but the input test files?

@haimasree Yes, there was no fault in your WorldToVoxelConvert class. So far your module holds up fine against my tests, but I'll still need to test transform matrices. The fault was indeed in the input test files, which probably also led to an incorrect implementation of the ITK image file loader. My apologies for the unwarranted callout earlier.

haimasree commented 5 years ago

Oh, no worries at all. Like I said before, I never tested with MHD/MHA files, and I fixed that fairly quickly, which didn't allow me to do as much testing as I would have liked. Hence, I was curious. Good work!