ENH: Improve 4D cardiac cine-MRI import

lassoan commented 3 years ago

examineFilesIPPInstanceNumber strategy is now attempted for each series, not just the "allFiles" merged data set, because this allows loading of some Philips 4D cardiac cine-MRI.

Also made a few small fixes:

harden any acquisition transform (e.g., irregular spacing)
added a bit more geometry check (check that image orientation is constant and slice positions are not repeated)
generate loadable name and tooltip using a common helper function (results in less code and more consistent naming)

@fedorov Two questions:

Do you have example data sets that I could use to test the examineFilesMultiseries, examineFilesIPPAcqTime, and examineFilesIPPInstanceNumber strategies with "allFiles" (all series merged)?
It is not clear for me why the MultiVolumeImporter and MultiVolumeExporter modules are hosted in separate repositories. What do you think about moving them into the Slicer core?

fedorov commented 3 years ago

@lassoan thank you for the contribution! Is this urgent? If yes, I don't have objections to merge it right away - I most definitely trust you with any code modifications related to Slicer, but if not - I would need a bit more time to review.

Do you have example data sets that I could use to test the examineFilesMultiseries, examineFilesIPPAcqTime, and examineFilesIPPInstanceNumber strategies with "allFiles" (all series merged)?

I will see if I can search and if I can find any such data in IDC. All the data that I can share has been shared on TCIA and is now in IDC, and we might be able to query specific series that have the properties we need.

It is not clear for me why the MultiVolumeImporter and MultiVolumeExporter modules are hosted in separate repositories. What do you think about moving them into the Slicer core?

I don't recall the reasons why they are in different repositories. Maybe because one was initiated before the other, and I didn't think about renaming, or maybe Slicer infrastructure back in the days could not accommodate DICOM plugins and scripted modules in the same repo? The history of this repo dates back to 2012!

I do not have any objection merging this into Slicer core and deprecating these separate repos. If this happens, I would prefer for the commit history to be preserved, but I am not sure this would be possible.

lassoan commented 3 years ago

It is not urgent, so it is no problem to wait for your review. However, I've asked for test data, as I think as we add more and more nuanced logic to support new data sets, it is becoming increasingly difficult to ensure that there are no regressions just via code reviews.

Probably your time would be better spent on identifying a few typical data sets and the known tricky data sets and I could set up some automatic tests to ensure that they are all still loaded correctly.

I'll look into porting these modules into Slicer with preserving the history. I've added this ticket to track this: https://github.com/Slicer/Slicer/issues/5935

fedorov commented 3 years ago

Here's the first pass on a query that identifies all series that have more than one instance per ImagePositionPatient (converted to string). There are 272929 series that match this query! So need to add more restrictions to select those that match specific strategies, and test how loading works in Slicer. WIP...

WITH
  series_counts AS (
  SELECT
    STRING_AGG(DISTINCT(collection_id),",") AS collection_id,
    STRING_AGG(DISTINCT(Modality),",") AS Modality,
    COUNT(DISTINCT(SOPInstanceUID)) AS instance_count,
    COUNT(DISTINCT(ARRAY_TO_STRING(ImagePositionPatient,"/"))) AS position_count,
    STRING_AGG(DISTINCT(SeriesDescription),",") AS SeriesDescription,
  FROM
    `bigquery-public-data.idc_current.dicom_all`
  GROUP BY
    SeriesInstanceUID)
SELECT
  *,
  instance_count/position_count AS instances_per_position
FROM
  series_counts
where position_count > 1
ORDER BY
  instances_per_position DESC

lassoan commented 1 year ago

Another case has come up that improvements in this PR fix. So, if you don't have any objections, it would be great if it could be integrated.

fedorov commented 1 year ago

@lassoan I merged the PR. I am sorry, I completely forgot about this issue. So I revisited the query, and now I select one series that has more than one instance per ImagePositionPatient, and take a sample of one series for each combination of collection, modality and body part examined.

WITH
  series_counts AS (
  SELECT
    SeriesInstanceUID,
    ANY_VALUE(collection_id) AS collection_id,
    ANY_VALUE(Modality) AS Modality,
    ANY_VALUE(BodyPartExamined) AS BodyPartExamined,
    ANY_VALUE(CONCAT(Manufacturer," ",ManufacturerModelName)) AS manufacturerWithModelName,
    COUNT(DISTINCT(SOPInstanceUID)) AS instance_count,
    COUNT(DISTINCT(ARRAY_TO_STRING(ImagePositionPatient,"/"))) AS position_count,
    STRING_AGG(DISTINCT(SeriesDescription),",") AS SeriesDescription,
  FROM
    `bigquery-public-data.idc_current.dicom_all`
  GROUP BY
    SeriesInstanceUID
  HAVING
    position_count>1
    AND instance_count/position_count>1),
  combinations_numbered AS (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY collection_id, Modality, BodyPartExamined) AS rn
  FROM
    series_counts)
SELECT
  *
FROM
  combinations_numbered
WHERE
  rn = 1

Here's the initial rows of the result (complete list is attached):

bquxjob_4d53151_189b7d571dc.csv

If you are interested to use those samples for testing, let me know and I can give pointers how to download the underlying files. I can also refine the selection, if you tell me what criteria you would like to use, and I can give instructions how you can play with those queries to define selection yourself.

fedorov / MultiVolumeImporter

ENH: Improve 4D cardiac cine-MRI import #49