Closed jcupitt closed 1 year ago
Thanks for the deep dive and detailed analysis @jcupitt. Unfortunately, large multi-frame images with TILED_SPARSE dimension organization are a pain to work with and we have problems with them on multiple fronts. See for example https://github.com/ImagingDataCommons/highdicom/issues/202.
Having a highly performant C implementation would thus be extremely useful.
cc @CPBridge @dclunie @pieper
Yes, I've been using fo-dicom (a nice C# DICOM loader), but it also really struggles with TILED_SPARSE slide images. It can take a minute or so to load a large one.
Leica assure me they are planning to support TILED_FULL, so ... fingers crossed. I think their main issue is with row-major vs column-major tile layout. Writing a BOT would help too, of course.
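To make the row-major vs column-major point concrete, here is a minimal sketch of how a tile position maps to a frame index under each layout. It assumes a single focal plane and a single optical path, and that TILED_FULL expects tiles ordered left to right, then top to bottom (row-major). The function names are made up for illustration and are not part of libdicom or any vendor SDK.

```c
#include <stdint.h>

/* Number of tiles needed to cover `total` pixels with tiles of `tile`
 * pixels, rounding up so a partial tile at the edge is counted. */
uint32_t tiles_needed(uint32_t total, uint32_t tile)
{
    return (total + tile - 1) / tile;
}

/* Row-major: walk along a row of tiles first, then move down.
 * `across` is the number of tiles per row. */
uint32_t frame_index_row_major(uint32_t col, uint32_t row, uint32_t across)
{
    return row * across + col;
}

/* Column-major: walk down a column of tiles first, then move right.
 * `down` is the number of tiles per column. */
uint32_t frame_index_column_major(uint32_t col, uint32_t row, uint32_t down)
{
    return col * down + row;
}
```

A writer that emits tiles column-major would have to reorder frames (or fall back to TILED_SPARSE with explicit per-frame positions) before the implicit TILED_FULL ordering could be used.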
I've started an experimental performance branch to see if I can get libdicom running a bit quicker.
On a 720 MB Leica DICOM, git master libdicom takes about a second to parse the file:
Optimisation ideas, in no particular order:

- strcmp() shows up in the profile, according to callgrind. We could swap out the string VRs for an enum.
- dcm_offset() (get current file pointer) flushes the input buffer unnecessarily; we could add a special path for this. It doesn't seem to be called in the main parse, though.
- int_malloc and int_free show up too, suggesting there are many small mallocs. Some of these could probably be moved to the stack.
- eheader_check_vr(), which should not really exist.

So I suppose there might be a 20% speedup possible.
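As a rough illustration of the "swap out the string VRs for an enum" idea, here is a sketch that converts the two-byte VR code to an enum once, at parse time, so later checks become integer comparisons rather than strcmp() calls. The enum, macro and function names are hypothetical and only a handful of VRs are listed; this is not libdicom's actual API.

```c
#include <stdint.h>

/* Hypothetical VR enum; a real one would list every VR in the standard. */
typedef enum {
    VR_ERROR = -1,
    VR_AE,
    VR_CS,
    VR_OB,
    VR_SQ,
    VR_UI,
    VR_US
} vr_t;

/* Pack the two VR characters into one 16-bit key. The result is an
 * integer constant expression, so it can be used as a case label. */
#define VR_KEY(a, b) ((uint16_t) (((uint8_t) (a) << 8) | (uint8_t) (b)))

/* Convert the first two bytes at `s` to a vr_t, or VR_ERROR if the
 * code is not one of the VRs this sketch knows about. */
vr_t vr_from_bytes(const char *s)
{
    switch (VR_KEY(s[0], s[1])) {
    case VR_KEY('A', 'E'): return VR_AE;
    case VR_KEY('C', 'S'): return VR_CS;
    case VR_KEY('O', 'B'): return VR_OB;
    case VR_KEY('S', 'Q'): return VR_SQ;
    case VR_KEY('U', 'I'): return VR_UI;
    case VR_KEY('U', 'S'): return VR_US;
    default: return VR_ERROR;
    }
}
```

After the conversion, a test such as "is this element a sequence?" is just `vr == VR_SQ`, and the element struct only needs to store an int rather than a string. Whether this recovers the time callgrind attributes to strcmp() would need measuring.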