ImagingDataCommons / highdicom

High-level DICOM abstractions for the Python programming language
https://highdicom.readthedocs.io
MIT License
172 stars 37 forks source link

Slow performance reading SEG with 98 segments #202

Closed pieper closed 1 year ago

pieper commented 1 year ago

I have this notebook testing the time to run the seg example on a file with ~100 segments. It takes over 8 minutes to load this file and print the simple metadata.

Is this expected? Can we expect any optimization as the library matures? Working with the same data in a research format such as nrrd is orders of magnitude faster.

@hackermd @fedorov

pieper commented 1 year ago

Here is the notebook:

https://colab.research.google.com/drive/1ZLqJwDIO1XKnnjOzClkSq8RIawm3sp9M?usp=sharing

CPBridge commented 1 year ago

Hi @pieper

The iter_segments was an early solution for accessing segmentation pixels, but has largely been replaced by a set of methods on the Segmentation class itself. Unfortunately this has not yet made it into the User Guide yet, though I do have a branch on it. It is fairly well documented in the API docs for the segmentation class however. Iterating through segments is not efficient nor convenient in many situations. Would something like this be more suited to your use case? This runs considerably faster for me (though there may still be room for performance gains):

import pydicom
import highdicom as hd

seg = hd.seg.segread("/tmp/ABD_LYMPH_008_SEG.dcm")
print('read file')

source_uids = seg.get_source_image_uids()
source_sops = [uids[2] for uids in source_uids]

pix = seg.get_pixels_by_source_instance(
    source_sops,
    ignore_spatial_locations=True,
    #segment_numbers=[1,2,3]  # you could choose specific segments if you like
)

print(pix.shape)

There are also various methods to identify which segments you need ahead of calling this method (if this is useful). See the get_segment_numbers method

Not saying we couldn't make iter_segments more efficient, but wondering whether there may now be a better fit for what you are actually trying to achieve.

pieper commented 1 year ago

Thanks for the insights @CPBridge! For context, we are looking at how to optimize the import of SEG in Slicer. We have been using dcmqi, which works very well, but takes about 4 minutes for this dataset and since we wish to support many more segments in the future we wanted to explore highdicom's functionality.

There would be some Slicer programming required, so before digging into that I wanted to see how fast highdicom might be for this use case.

I updated my colab notebook with both approaches and some timings. When I try highdicom on native Slicer your new approach is down to about 20 seconds to get all 98 segments, so it should be a big improvement over what we are currently doing.

Will come back to you as we get more practical experience.