PreibischLab / BigStitcher

ImgLib2/BDV implementation of Stitching for large datasets
GNU General Public License v2.0

Fast_Translation_Fusion runs out of memory #102

Open MichaelSNelson opened 3 years ago

MichaelSNelson commented 3 years ago

Our group is trying to stitch whole slide images (larger than the Java array size limit) using BigStitcher, but we are having problems with time and/or memory scalability:

  1. Time performance. BigStitcher takes a very long time when running on many tiles, due to an N^2 issue. (Details at ctrueden/tasks#11)
  2. Memory performance. We tried using the experimental Fast_Translation_Fusion plugin (adding it to plugins.config locally), but we run out of memory because CellImg is used for all input and output images.

This issue here is about the latter: memory performance of Fast_Translation_Fusion. Here is what we have learned so far in our investigation:

Do you think any of these problems could be solved on the BigStitcher side? We are now running up against the limits of our knowledge and understanding of the BigStitcher code. With @ctrueden we could potentially dig further, but it is time-consuming, so we thought we would first ask here whether you know how to address this.

Please let me know if you could use any further information or explanations. I am working on sharing our test data set (the ~9GB of tif files), but wanted to first absolutely verify that it is ok to share :)

Cheers, Mike

MichaelSNelson commented 3 years ago

Adding links to two data sets, converted to composite from RGB:

  1. A small 600MB linear string of 24 images with 10% overlap. I used it to determine that the unedited Fast_Translation_Fusion requires 8GB of memory to succeed with this data set, while restricting the JVM to 7GB results in an OOM failure.
  2. A larger data set of 360 3000x3000 images with 10% overlap, representing the target image size.
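To put the larger data set in context with the Java array size limit mentioned at the top, here is a rough back-of-the-envelope sketch. It assumes a single channel and a uniform 10% overlap in both dimensions, and it ignores the actual grid layout, so it only gives an order of magnitude. The class and method names (`FusedSizeEstimate`, `pixelCount`, `fitsInJavaArray`) are made up for illustration and are not part of BigStitcher.

```java
// Hypothetical helper, not BigStitcher code: a crude estimate of how many
// pixels the fused image holds compared with the maximum Java array length.
final class FusedSizeEstimate {

    // Assumption: each tile contributes its non-overlapping area only, with
    // the same overlap percentage applied to both width and height.
    static long pixelCount(long tiles, long tileWidth, long tileHeight, int overlapPercent) {
        long effectiveWidth = tileWidth * (100 - overlapPercent) / 100;
        long effectiveHeight = tileHeight * (100 - overlapPercent) / 100;
        return tiles * effectiveWidth * effectiveHeight;
    }

    // A Java array is indexed by int, so it cannot hold more than
    // Integer.MAX_VALUE (2^31 - 1) elements.
    static boolean fitsInJavaArray(long pixels) {
        return pixels <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long withoutOverlap = pixelCount(360, 3000, 3000, 0);
        long withOverlap = pixelCount(360, 3000, 3000, 10);
        System.out.println("pixels ignoring overlap:  " + withoutOverlap);
        System.out.println("pixels with 10% overlap:  " + withOverlap);
        System.out.println("fits in one Java array:   " + fitsInJavaArray(withOverlap));
    }
}
```

Under these assumptions, the estimate is about 2.6 billion pixels per channel even after discounting the overlap. That is above 2^31 - 1, so the fused result cannot be backed by a single primitive array.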

In case it helps, here are links to the test code where the above-mentioned changes were made (thanks @hinerm): https://github.com/MichaelSNelson/BigStitcher/tree/fast-fuse-in-menu and https://github.com/MichaelSNelson/multiview-reconstruction/tree/disk-cached

hinerm commented 3 years ago

@tpietzsch we were looking at a heap dump and noticed this cyclical relationship in SoftRefLoaderRemoverCache's PhantomReference. Is there a chance this could cause a memory leak, with phantom refs being unable to be GC'd?

image

tpietzsch commented 3 years ago

@hinerm No, that should all be fine. (If it weren't, this would have exploded a long time ago...)

First, cyclic dependencies are not a problem for GC: if the whole cycle is unreachable, it will be collected.

Second, the PhantomRef "watches" the value (i.e. the Cell in CachedCellImg), not the Entry. The cycle doesn't matter for that, because the value is not strongly referenced anywhere in the cycle. When the value is GCed, the PhantomRef is enqueued, and the entry is removed here: https://github.com/imglib/imglib2-cache/blob/6c2ec9e37acb8d4f89cc9a34349141bd228381d9/src/main/java/net/imglib2/cache/ref/SoftRefLoaderRemoverCache.java#L284-L300. This makes the whole cycle unreachable.
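To illustrate the pattern, here is a minimal toy sketch. It is not the imglib2-cache code: `PhantomCycleSketch`, `Entry`, `WatchingRef` and `cleanUp` are names invented for this example. The entry and its phantom reference point at each other, but the phantom reference's referent is the value, so the cycle does not keep the value alive.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy cache, NOT the SoftRefLoaderRemoverCache implementation.
final class PhantomCycleSketch {

    // Map entry that holds its phantom reference (one half of the cycle).
    static final class Entry {
        final int key;
        WatchingRef phantom;
        Entry(int key) { this.key = key; }
    }

    // Phantom reference that points back at its entry (the other half).
    // Its referent is the value, so the cycle never strongly holds the value.
    static final class WatchingRef extends PhantomReference<Object> {
        final Entry entry;
        WatchingRef(Object value, ReferenceQueue<Object> queue, Entry entry) {
            super(value, queue);
            this.entry = entry;
        }
    }

    private final Map<Integer, Entry> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    void put(int key, Object value) {
        Entry entry = new Entry(key);
        entry.phantom = new WatchingRef(value, queue, entry);
        map.put(key, entry);
    }

    int size() { return map.size(); }

    // Drain the queue: each enqueued reference means its value was collected,
    // so the matching entry is removed and the entry/reference cycle becomes
    // unreachable. Returns the number of entries removed.
    int cleanUp() {
        int removed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Entry entry = ((WatchingRef) ref).entry;
            if (map.remove(entry.key, entry)) removed++;
        }
        return removed;
    }

    public static void main(String[] args) throws InterruptedException {
        PhantomCycleSketch cache = new PhantomCycleSketch();
        Object value = new byte[1 << 20];
        cache.put(1, value);
        System.out.println("entries while value is reachable: " + cache.size());

        value = null; // drop the only strong reference to the value
        // System.gc() is only a hint, so retry a bounded number of times.
        for (int i = 0; i < 50 && cache.size() > 0; i++) {
            System.gc();
            Thread.sleep(20);
            cache.cleanUp();
        }
        System.out.println("entries after GC attempts: " + cache.size());
    }
}
```

On a typical JVM the entry count should drop to zero once the value has been collected. Because `System.gc()` is only a request, the sketch retries rather than assuming a single call is enough.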