DIAGNijmegen / rse-panimg

Conversion of medical images to MHA and TIFF.
Apache License 2.0

MHA re-writing takes double the memory of the original image #110

Closed by jmsmkn 1 month ago

jmsmkn commented 1 month ago

Is there a way to stream the re-writing of the image data during MHA conversion?

chrisvanrun commented 1 month ago

We could do something similar to what we do with CIRRUS for very large MHA files:

https://github.com/DIAGNijmegen/rse-cirrus-core/blob/ec25e5738260f7edf5eb63457acac4ddb7baacc7/src/cirrus/DIAG/CIRRUSCore/Modules/Scripts/python/cirrus/data/image/mevislab/metaio_utils.py#L66

Depending on the builder and its function we might just copy over the raw data directly, after cleaning up the headers. It depends on whether we do any data manipulation.
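
A minimal sketch of that copy-over approach, assuming the standard MetaIO layout where the plain-text header ends with `ElementDataFile = LOCAL` and the raw voxel bytes follow immediately; `stream_copy_mha` and the chunk size are illustrative, not existing panimg code:

```python
import shutil
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # copy the voxel bytes in 1 MiB chunks


def stream_copy_mha(src: Path, dst: Path) -> None:
    """Hypothetical helper: copy an MHA without loading the pixel data.

    Reads the plain-text MetaIO header line by line (this is where any
    header cleaning/validation would go) and then streams the remaining
    raw voxel bytes instead of buffering them in memory.
    """
    with src.open("rb") as fin, dst.open("wb") as fout:
        while True:
            line = fin.readline()
            if not line:
                raise ValueError("Header has no ElementDataFile key")
            fout.write(line)
            # The raw data starts right after the ElementDataFile line.
            if line.strip().startswith(b"ElementDataFile"):
                break
        shutil.copyfileobj(fin, fout, length=CHUNK_SIZE)
```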

jmsmkn commented 1 month ago

The only change (IIRC) is that any uncompressed data is compressed. I think Paul looked at this before and it turned out SimpleITK was not doing this efficiently, but I cannot find the details.

jmsmkn commented 1 month ago

Actually, doing the reverse operation of that CIRRUS method does not seem too bad to maintain.
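
A rough sketch of what that write-side (reverse) operation might look like: stream-compressing the voxel bytes with zlib chunk by chunk instead of holding the whole buffer in memory. `CompressedData` and `CompressedDataSize` are standard MetaIO header keys, but the helper below is hypothetical and glosses over most header handling:

```python
import zlib
from pathlib import Path
from tempfile import TemporaryFile

CHUNK_SIZE = 1024 * 1024  # compress and copy in 1 MiB chunks


def write_compressed_mha(header_lines: list[bytes], raw_data: Path, dst: Path) -> None:
    """Hypothetical helper: write an MHA with zlib-compressed voxel data.

    The compressed bytes go to a temporary file first, because MetaIO puts
    CompressedDataSize in the header, which is written before the data.
    """
    compressor = zlib.compressobj()
    compressed_size = 0
    with raw_data.open("rb") as fin, TemporaryFile() as tmp:
        while chunk := fin.read(CHUNK_SIZE):
            out = compressor.compress(chunk)
            compressed_size += len(out)
            tmp.write(out)
        out = compressor.flush()
        compressed_size += len(out)
        tmp.write(out)

        with dst.open("wb") as fout:
            for line in header_lines:
                # Drop any existing compression keys; they are re-emitted below.
                if line.startswith((b"CompressedData", b"ElementDataFile")):
                    continue
                fout.write(line)
            fout.write(b"CompressedData = True\n")
            fout.write(b"CompressedDataSize = %d\n" % compressed_size)
            fout.write(b"ElementDataFile = LOCAL\n")
            tmp.seek(0)
            while chunk := tmp.read(CHUNK_SIZE):
                fout.write(chunk)
```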

chrisvanrun commented 1 month ago

A quick experiment, using /usr/bin/time --verbose to keep track of RSS spikes, tells me that just using SimpleITK to read a NIfTI file uses roughly double the memory you'd expect.

Re-reading the created .mha with the same script shows a memory spike which is around the same size as the uncompressed image.

So the ITK ImageIO being used is likely not efficient, or maybe it cannot be.
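
For reference, the measurement above boils down to wrapping a trivial read-only script in /usr/bin/time --verbose; the script and file name below are placeholders:

```python
import sys

import SimpleITK as sitk

# Run as e.g.: /usr/bin/time --verbose python read_image.py image.nii.gz
# and compare "Maximum resident set size" against the uncompressed image size.
image = sitk.ReadImage(sys.argv[1])
print(image.GetSize(), image.GetPixelIDTypeAsString())
```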

jmsmkn commented 1 month ago

NIfTI we shouldn't care about here. Users can be told to go away and do the conversion themselves offline (using this).

This is only about our MHA/MHD -> validated compressed MHA route.

jmsmkn commented 1 month ago

Having a spike of the size of the uncompressed image is okay - we need to check if it can be successfully loaded, but at the moment we max out at only being able to import 4GB MHA files on 8GB instances.

jmsmkn commented 1 month ago

Relevant support request for Fargate swap support https://github.com/aws/containers-roadmap/issues/2061

chrisvanrun commented 1 month ago

> Having a spike of the size of the uncompressed image is okay - we need to check if it can be successfully loaded, but at the moment we max out at only being able to import 4GB MHA files on 8GB instances.

That makes sense. We need to load it at least once! Is the source of the double memory peak clear?

Doing MHA -> MHA only seems to use the uncompressed-image size as max RSS. Using an uncompressed 5.8 GiB image in an MHA -> MHA conversion, I only see ~5.8 GiB max RSS locally without compression and ~6.9 GiB when compression is turned on.

Extrapolating this down to 4GB MHA files, we would use ~6GB for writing. Would that sufficiently explain a failure on an 8GB instance, taking (OS) system RAM usage into account?
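
For context, a minimal version of the MHA -> MHA experiment above might look like the following (paths are placeholders); in the numbers above, enabling useCompression is what appears to account for the extra ~1 GiB of peak RSS:

```python
import SimpleITK as sitk

# Placeholder paths; wrap with /usr/bin/time --verbose to see the peak RSS.
image = sitk.ReadImage("input.mha")

# Without compression the max RSS stayed around the uncompressed image size;
# with useCompression=True it rose by roughly another 1 GiB in the run above.
sitk.WriteImage(image, "output.mha", useCompression=True)
```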

jmsmkn commented 1 month ago

I made this issue because the worker choked on a 4.8GB uncompressed MHA on an 8GB instance. OS overhead we do not need to worry about, as this is running on Fargate. The Celery worker isn't heavy enough to explain the difference, I think: it idles at 5% of memory (400MB).

chrisvanrun commented 1 month ago

Strange. Taking the ~19% compression overhead observed above (6.9 GiB / 5.8 GiB ≈ 1.19) on a 4.8GB image, plus 400MB, would still only push things to roughly 6.1GB, well short of the 8GB. Maybe it is a difficult file to compress and the overhead is somehow different.

Then again, why else would it choke? It is likely the compression that is the culprit here; chunking it could help.

jmsmkn commented 1 month ago

The problem is within GC; continued in https://github.com/DIAGNijmegen/rse-grand-challenge-admin/issues/310