Localizers cause confusion with dicomtocsv output

dgobbi commented 1 year ago

Some DICOM series, particularly reformats, have a localizer as the first image in the series. This can cause problems for DICOM tools that report the metadata from the first image as being representative of the series as a whole. dicomtocsv in particular is affected.

Not all tools are afflicted. The dicomtonifti tool, for example, automatically discards the localizers.

For dicomtocsv, there are already some options to scan multiple files in the series:

  --first-nonzero   Search series for first nonzero value of each key.
  --all-unique      Report all unique values within each series.
  --min-value       Report the minimum value within each series.
  --max-value       Report the maximum value within each series.

However, we don't want a new option to deal with localizers. We want dicomtocsv to ignore them by default when doing series-level reporting (unless every image in the series is a localizer).

The tricky part is how to identify the localizers automatically. The "ImageType" attribute cannot do this reliably. In dicomtonifti, the localizer is identified as having a different orientation than the rest of the series.

Instead of focusing specifically on localizers, dicomtocsv could check for consensus among the files in the series when reporting attributes, though this might be too expensive to be the default.

dgobbi commented 1 year ago

More thoughts: To allow users to revert to the old behavior, this option should be present:

  --first           Always report from the first file in the series.

I can even annotate this with "(faster)" if it really is a faster way to get the information.

For "consensus", one approach is to build a map (value : frequency) where "value" is the attribute value as a string. The most frequent value wins. But what about a tie? The value from the earlier (lower instance number) image should be preferred. This suggests a vector might be better than a map. A vector might be more efficient, too, since "runs" are common so comparing against the previous value is a good strategy.
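A minimal sketch of the map-based consensus described above, with ties going to the value seen earliest in the series. The function name `MostFrequentValue` and the `Tally` struct are hypothetical and are not part of vtk-dicom. The sketch assumes the attribute values have already been read as strings, one per image, in instance-number order.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of vtk-dicom.
// 'values' holds one attribute value per image, ordered by instance number.
// Returns the most frequent value; a tie goes to the value that appears
// first in the series. Returns an empty string for an empty series.
std::string MostFrequentValue(const std::vector<std::string>& values)
{
  struct Tally
  {
    std::size_t count = 0; // how many images carry this value
    std::size_t first = 0; // index of the first image with this value
  };

  std::map<std::string, Tally> tally;
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    Tally& t = tally[values[i]];
    if (t.count == 0)
    {
      t.first = i;
    }
    ++t.count;
  }

  const std::string* best = nullptr;
  std::size_t bestCount = 0;
  std::size_t bestFirst = 0;
  for (const auto& entry : tally)
  {
    const Tally& t = entry.second;
    if (t.count > bestCount || (t.count == bestCount && t.first < bestFirst))
    {
      best = &entry.first;
      bestCount = t.count;
      bestFirst = t.first;
    }
  }
  return best ? *best : std::string();
}
```

The extra `first` field is what the tie-break costs with a map, which is part of why a vector, or no container at all, looks more attractive.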

In fact, "consensus" could be based on the value with the longest run, rather than the value with the highest frequency. The implementation can be very efficient (neither a map nor a vector is needed) and it provides the right semantics for rejecting localizers, since the non-localizer images always exist as a contiguous run.
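The longest-run idea can be sketched as a small streaming accumulator that keeps only the current run and the best run so far. The class name `LongestRun` and its methods are hypothetical. This is an illustration of the idea, not the code in any vtk-dicom branch, and it assumes values are fed in instance-number order.

```cpp
#include <cstddef>
#include <string>

// Hypothetical accumulator, not part of vtk-dicom.
// Feed it one attribute value per image, in series order, and it
// remembers the value that formed the longest run of consecutive repeats.
class LongestRun
{
public:
  void Add(const std::string& value)
  {
    if (this->CurrentLength > 0 && value == this->Current)
    {
      ++this->CurrentLength;
    }
    else
    {
      this->Current = value;
      this->CurrentLength = 1;
    }
    // Strict '>' keeps the earlier run when two runs have equal length.
    if (this->CurrentLength > this->BestLength)
    {
      this->Best = this->Current;
      this->BestLength = this->CurrentLength;
    }
  }

  // Value of the longest run seen so far (empty if nothing was added).
  const std::string& Value() const { return this->Best; }

private:
  std::string Current;
  std::string Best;
  std::size_t CurrentLength = 0;
  std::size_t BestLength = 0;
};
```

Because only the previous value is compared, the cost per image is one string comparison, and a single leading localizer can never win against a run of two or more regular images.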

dgobbi commented 1 year ago

I tried what I mentioned in the previous message (branch dicomtocsv-consensus), and now it's apparent that choosing the longest run doesn't always give the desired value. For example, when reporting image-specific times like ContentTime and SeriesTime, we'd rather have the first (or earliest) time than one of the times that happened to be the same for multiple images. The takeaway is that, when reporting an image-specific value for a series-level query, the optimal strategy varies from attribute to attribute (and from situation to situation, as well).
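To make the failure concrete, here is a small sketch comparing the two strategies on a list of time strings. Both function names are hypothetical. `EarliestValue` assumes the times share one fixed-width format (for example HHMMSS), so that plain string comparison orders them chronologically.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the value with the longest run of consecutive repeats.
std::string LongestRunValue(const std::vector<std::string>& values)
{
  std::string best;
  std::size_t bestLength = 0;
  std::size_t runLength = 0;
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    runLength = (i > 0 && values[i] == values[i - 1]) ? runLength + 1 : 1;
    if (runLength > bestLength)
    {
      best = values[i];
      bestLength = runLength;
    }
  }
  return best;
}

// Hypothetical helper: the smallest value by string comparison.
// Assumption: all times use the same fixed-width format.
std::string EarliestValue(const std::vector<std::string>& values)
{
  auto it = std::min_element(values.begin(), values.end());
  return it == values.end() ? std::string() : *it;
}
```

For the times 101500, 101501, 101501, 101502, the longest run reports 101501 because two images share it, while the earliest time, 101500, is the one we would want for the series.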

So, back to the beginning: how to ignore the localizer when reporting attribute values.

dgobbi commented 1 year ago

Branch dicomtocsv-ignore-locators (baff1f8c4) uses changes in ImageOrientationPatient to identify localizers, and (so far) it seems to work well. In order to expand this idea to other DICOM tools (for example, dicompull), the best way forward is to implement this fix using vtkDICOMSliceSorter, which is used by dicomtonifti and by the vtkDICOMReader itself.
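As a rough illustration of orientation-based detection, the sketch below takes the orientation of the longest contiguous run of images as the reference and flags every image that differs from it. This is not the code from the branch or from vtkDICOMSliceSorter. The names `Orientation`, `SameOrientation`, and `FlagLocalizers`, the tolerance value, and the longest-run rule are all assumptions made for the example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// The six direction cosines of ImageOrientationPatient:
// row direction (x, y, z) followed by column direction (x, y, z).
using Orientation = std::array<double, 6>;

// Hypothetical helper: true if every cosine agrees within 'tol'.
bool SameOrientation(const Orientation& a, const Orientation& b, double tol)
{
  for (std::size_t k = 0; k < 6; ++k)
  {
    if (std::fabs(a[k] - b[k]) > tol)
    {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: returns one flag per image, true for images whose
// orientation differs from that of the longest contiguous run.
std::vector<bool> FlagLocalizers(const std::vector<Orientation>& series,
                                 double tol = 1e-3)
{
  std::size_t bestStart = 0;
  std::size_t bestLength = 0;
  std::size_t runStart = 0;
  for (std::size_t i = 0; i < series.size(); ++i)
  {
    if (i > 0 && !SameOrientation(series[i], series[i - 1], tol))
    {
      runStart = i; // orientation changed, so a new run begins here
    }
    std::size_t length = i - runStart + 1;
    if (length > bestLength)
    {
      bestStart = runStart;
      bestLength = length;
    }
  }

  std::vector<bool> flags(series.size(), false);
  for (std::size_t i = 0; i < series.size(); ++i)
  {
    flags[i] = !SameOrientation(series[i], series[bestStart], tol);
  }
  return flags;
}
```

This sketch does not treat a series made up entirely of localizers specially. A caller would still need to fall back to the unfiltered series in that case.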

dgobbi commented 11 months ago

Closed, using the fix mentioned above (final commit: eafa799).

dgobbi / vtk-dicom

Localizers cause confusion with dicomtocsv output #221