InsightSoftwareConsortium / ITK-Wasm

High performance spatial analysis in a web browser and across programming languages and hardware architectures
https://wasm.itk.org
Apache License 2.0

Consider adding dicom package tests that would utilize selected samples from IDC #1208

Open fedorov opened 2 months ago

fedorov commented 2 months ago

Since all of the IDC data is available in public buckets, with the content available via S3 API or HTTPS, without authentication, it might be good to add regression tests that utilize hand-picked DICOM samples that stress specific aspects of the functionality.

Specific examples that we already ran into, with SeriesInstanceUID of a corresponding sample from the current IDC v18 data release:

Given the UID above, the corresponding file(s) can be retrieved in just 2 steps:

  1. $ pip install --upgrade idc-index
  2. $ idc download <SeriesInstanceUID>
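For use from a test harness, the two steps above could be wrapped in a small caching helper. This is only a sketch: `build_idc_download_command` and `fetch_series` are hypothetical names, and it assumes that `idc download` writes its output under the current working directory when no other destination is given.

```python
import shutil
import subprocess
from pathlib import Path


def build_idc_download_command(series_instance_uid: str) -> list[str]:
    """Return the argv for the `idc download <SeriesInstanceUID>` step."""
    uid = series_instance_uid.strip()
    # DICOM UIDs are dot-separated groups of digits, at most 64 characters.
    if not uid or len(uid) > 64 or not all(p.isdigit() for p in uid.split(".")):
        raise ValueError(f"not a valid DICOM UID: {series_instance_uid!r}")
    return ["idc", "download", uid]


def fetch_series(series_instance_uid: str, cache_root: Path) -> Path:
    """Download one series into cache_root/<uid>, reusing an earlier download."""
    command = build_idc_download_command(series_instance_uid)
    target = cache_root / command[-1]
    if target.is_dir() and any(target.iterdir()):
        return target  # already cached, skip the network round trip
    if shutil.which("idc") is None:
        raise RuntimeError("`idc` not found; run `pip install --upgrade idc-index`")
    target.mkdir(parents=True, exist_ok=True)
    # Assumption: with no explicit destination, files land under the cwd.
    subprocess.run(command, cwd=target, check=True)
    return target
```

A test would then call `fetch_series("<SeriesInstanceUID>", Path("test-data"))` once and read the DICOM files from the returned directory, so repeated runs do not download the same sample again.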

Other dimensions we may want to test include various transfer syntaxes, diffusion images from different manufacturers, series with missing slices, series with inconsistent PixelSpacing or ImageOrientationPatient, gantry tilt, presentation states, and samples containing attributes that are invalid per the standard but may still be encountered "in the wild". I think we should be able to find samples for many of the situations that need to be regression-tested.
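As one illustration of what such a regression test might assert, here is a minimal sketch of a missing-slice check. It is not how ITK-Wasm or GDCM detect gaps; `slice_positions` and `find_gaps` are hypothetical helpers, and the sketch assumes a single consistent ImageOrientationPatient for the whole series and uses the median spacing as the nominal one.

```python
from statistics import median


def slice_positions(orientation, positions):
    """Project each ImagePositionPatient onto the slice normal and sort."""
    rx, ry, rz, cx, cy, cz = orientation
    # The slice normal is the cross product of the row and column direction cosines.
    normal = (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)
    return sorted(sum(n * p for n, p in zip(normal, pos)) for pos in positions)


def find_gaps(sorted_positions, tolerance=0.01):
    """Return (start, end) pairs whose spacing exceeds the median spacing."""
    pairs = list(zip(sorted_positions, sorted_positions[1:]))
    if not pairs:
        return []
    nominal = median(b - a for a, b in pairs)
    # A gap is any spacing more than `tolerance` (relative) above nominal.
    return [(a, b) for a, b in pairs if (b - a) > nominal * (1 + tolerance)]


# Axial series with the slice at z = 3.0 missing.
axial = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
origins = [(0.0, 0.0, z) for z in (0.0, 1.0, 2.0, 4.0, 5.0)]
gaps = find_gaps(slice_positions(axial, origins))
```

A test on a series known to have missing slices would assert that `gaps` is non-empty, while a test on a complete series would assert that it is empty.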

I have not done this myself, but it looks like CMake supports such external data sources: https://cmake.org/cmake/help/book/mastering-cmake/chapter/Testing%20With%20CMake%20and%20CTest.html#managing-test-data.

I am happy to help with selection of the relevant samples for the tasks we agree should be tested and answer any questions related to IDC.

I think something like the above has been a dream of @pieper for many years now. I believe we finally can make it come true!

fedorov commented 2 months ago

This is where tests are right now and it seems they are propagated from dcmqi: https://github.com/InsightSoftwareConsortium/ITK-Wasm/blob/main/packages/dicom/dcmtk/CMakeLists.txt#L88

jadh4v commented 1 month ago

> This is where tests are right now and it seems they are propagated from dcmqi: https://github.com/InsightSoftwareConsortium/ITK-Wasm/blob/main/packages/dicom/dcmtk/CMakeLists.txt#L88

@fedorov

The tests in the CMake file are run mostly as a sanity check on native binaries. The more comprehensive TypeScript and Python tests are here: https://github.com/InsightSoftwareConsortium/ITK-Wasm/tree/main/packages/dicom/typescript/test and https://github.com/InsightSoftwareConsortium/ITK-Wasm/tree/main/packages/dicom/python/itkwasm-dicom-wasi/tests

Also, GDCM is available through both image-io and the dicom subpackage for reading image series. I don't believe DCMTK is currently used for reading the imaging modalities of DICOM series (@thewtex correct me if I'm wrong).

fedorov commented 1 month ago

Yes, I understand. The idea is to augment the existing dcmqi tests (which are basically small toy examples) with tests on real data from IDC, and also to add tests of the image-io package using data from IDC. There is no need to add DCMTK to image-io for this purpose; the goal is just to improve testing of the existing GDCM-based functionality.