ImagingDataCommons / libdicom

C library for reading DICOM files
https://libdicom.readthedocs.io
MIT License

Parser performance #40

Closed jcupitt closed 1 year ago

jcupitt commented 1 year ago

On a 720 MB Leica DICOM, git master libdicom takes about a second to parse the file:

$ time dcm-getframe -v -o x.jpg 1.3.6.1.4.1.36533.992031277215326518414019311377114211255162.dcm 1
INFO     [Thu Feb 16 10:33:49 2023] - Read file '1.3.6.1.4.1.36533.992031277215326518414019311377114211255162.dcm'
INFO     [Thu Feb 16 10:33:49 2023] - Read metadata
INFO     [Thu Feb 16 10:33:51 2023] - Read BOT
INFO     [Thu Feb 16 10:33:51 2023] - Basic Offset Table is empty.

real    0m1.355s
user    0m0.805s
sys     0m0.549s
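
The log reports an empty Basic Offset Table, and the gap in the timestamps appears to fall between "Read metadata" and "Read BOT". When the table is empty, a reader has to find the frames itself by walking every item header in the encapsulated pixel data. The sketch below shows that walk over an in-memory buffer. It is not libdicom's code: `build_offset_table` and its parameters are hypothetical names, and it assumes little-endian encoding and one fragment per frame.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: walk the items that follow the (empty) Basic Offset
 * Table item and record the byte offset of each item header.
 * Assumes one fragment per frame. Returns the number of frames found,
 * or -1 if the buffer does not look like a sequence of items. */
long build_offset_table(const uint8_t *items, size_t size,
                        uint32_t *offsets, size_t max_frames)
{
    size_t pos = 0;
    size_t n = 0;

    while (pos + 8 <= size) {
        uint16_t group = read_u16_le(items + pos);
        uint16_t element = read_u16_le(items + pos + 2);
        uint32_t length = read_u32_le(items + pos + 4);

        if (group != 0xFFFE)
            return -1;
        if (element == 0xE0DD)      /* sequence delimiter: end of pixel data */
            break;
        if (element != 0xE000)      /* anything else is not an item */
            return -1;
        if (length > size - pos - 8)
            return -1;              /* item runs past the end of the buffer */

        if (n < max_frames)
            offsets[n] = (uint32_t)pos;
        n++;

        /* Skip the 8-byte header and the fragment payload. */
        pos += 8 + (size_t)length;
    }

    return (long)n;
}
```

On a real file each step is a seek plus a small read rather than a pointer bump, so an image with many frames pays for one header read per frame before any frame can be fetched.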

Optimisation ideas in no order:

So I suppose there might be a 20% speedup possible.

hackermd commented 1 year ago

Thanks for the deep dive and detailed analysis @jcupitt. Unfortunately, large multi-frame images with TILED_SPARSE dimension organization are a pain to work with, and we have problems with them on multiple fronts. See for example https://github.com/ImagingDataCommons/highdicom/issues/202.

Having a highly performant C implementation would thus be extremely useful.

cc @CPBridge @dclunie @pieper

jcupitt commented 1 year ago

Yes, I've been using fo-dicom (a nice C# DICOM loader), but it also really struggles with TILED_SPARSE slide images. It can take a minute or so to load a large one.

Leica assure me they are planning to support TILED_FULL, so ... fingers crossed. I think their main issue is with row-major vs column-major tile layout. Writing a BOT would help too, of course.
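
To make the row-major vs column-major point concrete, here is a minimal sketch of how a frame index would be derived from a tile position under each layout. With an implicit layout such as TILED_FULL, which as I understand it orders tiles left to right and then top to bottom, no per-frame position metadata is needed. The function names and parameters are hypothetical and are not part of libdicom.

```c
#include <assert.h>
#include <stdint.h>

/* Tiles stored row by row: the column varies fastest. */
uint32_t frame_index_row_major(uint32_t col, uint32_t row,
                               uint32_t tiles_across, uint32_t tiles_down)
{
    assert(col < tiles_across && row < tiles_down);
    return row * tiles_across + col;
}

/* Tiles stored column by column: the row varies fastest. */
uint32_t frame_index_column_major(uint32_t col, uint32_t row,
                                  uint32_t tiles_across, uint32_t tiles_down)
{
    assert(col < tiles_across && row < tiles_down);
    return col * tiles_down + row;
}
```

The same tile lands on a different frame number under the two schemes, so a writer that produces tiles in column-major order cannot simply relabel its output as an implicit row-major layout; it has to reorder the frames or keep per-frame positions.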

I've started an experimental performance branch to see if I can get libdicom running a bit quicker.

jcupitt commented 1 year ago

OK, implemented with:

https://github.com/ImagingDataCommons/libdicom/pull/43

It's about 20% faster.