Open gustavo-alberto opened 1 month ago
Memory management is also critical for my use case. My hope is to build a mosaic whose file size exceeds what normally fits in RAM. I've done some preliminary testing saving BigTIFF files outside of Stitch2d using np.memmap. From what I understand, np.memmap should allow all of the feature matching and other calculations to remain the same, as long as the images are loaded into this structure.
I'm currently making a new Tile class that stores data from a TIFF in an np.memmap instead of an np.ndarray. I imagine saving the stitched mosaic might have some tricky elements, but I will report back. Please let me know if you make any parallel progress.
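To make the idea concrete, here is a minimal sketch of a tile whose pixels stay on disk. The class name MemmapTile, the attribute pixels, and the helper write_raw_tile are hypothetical and are not part of Stitch2d. The sketch also assumes a raw uncompressed file rather than a real TIFF, to stay dependency-free.

```python
import os
import tempfile

import numpy as np


def write_raw_tile(path, pixels):
    """Dump an array to a raw file so it can be memory-mapped later."""
    out = np.memmap(path, dtype=pixels.dtype, mode="w+", shape=pixels.shape)
    out[:] = pixels
    out.flush()
    del out  # drop the writable mapping


class MemmapTile:
    """Hypothetical tile whose pixel data lives in a file, not in RAM."""

    def __init__(self, path, shape, dtype=np.uint8):
        self.path = path
        # mode="r" maps the existing file read-only; the OS pages data in
        # only when a slice is actually accessed.
        self.pixels = np.memmap(path, dtype=dtype, mode="r", shape=shape)


# Demo with synthetic data standing in for one image
source = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
tile_path = os.path.join(tempfile.mkdtemp(), "tile_0.raw")
write_raw_tile(tile_path, source)

tile = MemmapTile(tile_path, shape=(64, 64, 3))
corner = np.asarray(tile.pixels[:16, :16])  # only this crop is read into memory
```

Because np.memmap is a subclass of np.ndarray, code that slices or computes on the array should mostly work unchanged, which is the property the comment above relies on.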
@afong3
I didn't know about memmaps. I've been thinking about the possibility of working with a non-relational database like MongoDB. MongoDB supports concurrent read/write operations from multiple processes, which avoids the file concurrency issues that come with multiprocessing. The idea is to store tile-related information (matching points, descriptors, etc.) in MongoDB rather than holding everything in memory during runtime.
Proposed Process Structure:
Database Structuring:
Image Loading and Unloading:
Gradual Processing:
State Maintenance and Checkpoints:
My hardware setup includes an i5 14th generation CPU, 32GB DDR5 RAM, and an RTX 3050 GPU. Despite having sufficient resources, memory usage hits 100% due to attempting to load all images into memory. My test dataset comprises approximately 8000 images, each 1600x1600 pixels.
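A quick back-of-the-envelope check shows why 32GB is not enough. This assumes 8-bit RGB pixels, which the post does not state, and it counts only raw pixel data, not descriptors or intermediate copies.

```python
n_images = 8000
height = width = 1600
channels = 3           # assumption: RGB
bytes_per_sample = 1   # assumption: uint8

total_bytes = n_images * height * width * channels * bytes_per_sample
total_gb = total_bytes / 1e9
print(f"Raw pixel data: {total_gb:.2f} GB")  # 61.44 GB under these assumptions
```

Even as single-channel grayscale the raw data would come to about 20.5 GB before any feature descriptors are stored, so holding every image in memory at once is unlikely to fit.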
Cluster Processing:
To optimize memory usage, I propose processing a fixed number of images per cluster (e.g., 9 images per cluster). Each cluster can be independently processed by the processor. Limiting the number of clusters processed concurrently will help manage memory usage. Additionally, utilizing the GPU for parallel processing could significantly speed up the process.
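As a rough illustration of the grouping step, the sketch below buckets tiles into 3x3 clusters from their grid positions. The function group_into_clusters is hypothetical, and it assumes every tile has a known (row, col) position in the scan grid.

```python
from collections import defaultdict


def group_into_clusters(positions, block=3):
    """Bucket (row, col) grid positions into block x block clusters."""
    clusters = defaultdict(list)
    for row, col in positions:
        clusters[(row // block, col // block)].append((row, col))
    return dict(clusters)


# Demo: a 6 x 7 grid of tiles
positions = [(r, c) for r in range(6) for c in range(7)]
clusters = group_into_clusters(positions)

for key in sorted(clusters):
    print(key, len(clusters[key]), "tiles")
```

Clusters on the right and bottom edges may hold fewer than 9 tiles. Note that clusters built this way do not share tiles, so merging neighboring clusters would still need matches between their border tiles.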
Challenges:
Blank Images: Currently, blank images without matching points cause errors in processing. I suggest a brief preprocessing step to identify blank images and approximate their placement in the mosaic based on known x and y positions for filling the area.
Manual Cluster Handling: When manually handling clusters (assembling two sub-mosaics and then merging them), the script fails, possibly due to differing sub-mosaic sizes. This needs to be addressed.
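If differing sub-mosaic sizes are indeed the cause, one possible workaround is to paste both sub-mosaics onto a shared canvas that is large enough for both. This is only a sketch: merge_on_canvas is a hypothetical helper, the offset between the two sub-mosaics is assumed to be known and non-negative, and overlapping pixels are simply overwritten rather than blended.

```python
import numpy as np


def merge_on_canvas(mosaic_a, mosaic_b, offset_b):
    """Paste two sub-mosaics of different sizes onto one canvas.

    offset_b is the (y, x) position of mosaic_b's top-left corner
    relative to mosaic_a's top-left corner (assumed non-negative).
    """
    oy, ox = offset_b
    height = max(mosaic_a.shape[0], oy + mosaic_b.shape[0])
    width = max(mosaic_a.shape[1], ox + mosaic_b.shape[1])
    canvas = np.zeros((height, width) + mosaic_a.shape[2:], dtype=mosaic_a.dtype)
    canvas[: mosaic_a.shape[0], : mosaic_a.shape[1]] = mosaic_a
    # mosaic_b is written last, so it wins wherever the two overlap
    canvas[oy : oy + mosaic_b.shape[0], ox : ox + mosaic_b.shape[1]] = mosaic_b
    return canvas


# Demo with two small sub-mosaics of different shapes
a = np.full((4, 6, 3), 10, dtype=np.uint8)
b = np.full((5, 3, 3), 20, dtype=np.uint8)
merged = merge_on_canvas(a, b, offset_b=(2, 5))
print(merged.shape)  # (7, 8, 3)
```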
The main challenge lies in structuring everything and designing classes and methods for each part of the process, following certain design patterns.
@gustavo-alberto
Interesting idea with MongoDB - I'm curious how it'll turn out.
I went ahead with np.memmap and was able to successfully stitch 29GB of overlapping images (2,500 TIFF images) and write a 15GB file. The resulting image has a shape of (62328, 79244, 3). My machine has a 13th Gen i9 processor and 16GB RAM. Alignment took approximately 18 minutes in a Jupyter notebook.
I edited the base Tile class, which I would need to make more elegant before sending a PR. I also changed some of the logic where needed to use np.memmap, such as saving the file in chunks. I won't have time in the next few days to clean up the code for a PR, but I likely will eventually.
I'm not sure how this would behave with as many images as you're hoping for, but it does confirm your assumption that you can stitch and save a mosaic that adds up to more data than you have RAM.
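For anyone following along, saving by chunks can look roughly like the sketch below. It is not the actual code from my changes: paste_in_chunks is a hypothetical helper, the output is a raw np.memmap file rather than a BigTIFF, and the sizes are tiny so it runs quickly.

```python
import os
import tempfile

import numpy as np


def paste_in_chunks(out, tile, top, left, chunk_rows=256):
    """Copy a tile into a disk-backed mosaic a strip of rows at a time."""
    for start in range(0, tile.shape[0], chunk_rows):
        stop = min(start + chunk_rows, tile.shape[0])
        out[top + start : top + stop, left : left + tile.shape[1]] = tile[start:stop]
    out.flush()  # push dirty pages to disk


# mode="w+" creates a zero-filled file of the full mosaic size on disk
mosaic_path = os.path.join(tempfile.mkdtemp(), "mosaic.raw")
mosaic = np.memmap(mosaic_path, dtype=np.uint8, mode="w+", shape=(1000, 1200, 3))

tile = np.full((400, 500, 3), 128, dtype=np.uint8)
paste_in_chunks(mosaic, tile, top=100, left=200, chunk_rows=64)
```

Only one strip of rows is copied at a time, so peak memory depends on chunk_rows and the tile width rather than on the size of the whole mosaic.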
I am working on constructing a mosaic from photographs taken of a microscope slide. The total number of images is approximately 10,000, some of which are completely white, indicating areas with no tissue.
Running the script with either of the two options (Mosaic and StructuredMosaic) results in a memory management issue, causing the memory usage to reach 100% and freezing the computer.
I am looking for a way to work with the script using clusters to manage memory and recursively assemble the clusters until the entire mosaic is constructed. Additionally, I need to incorporate the white images (which do not have matching points) into the clusters based on their known positions.
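To illustrate what I mean for the white images, here is a minimal sketch. The helpers is_blank and place_by_grid are hypothetical, the standard-deviation threshold is a guess that would need tuning on real slides, and the placement assumes a regular grid with a fixed overlap fraction.

```python
import numpy as np


def is_blank(pixels, std_threshold=2.0):
    """Treat a tile as blank when its intensity barely varies."""
    return float(np.std(pixels)) < std_threshold


def place_by_grid(row, col, tile_h, tile_w, overlap=0.0):
    """Approximate the top-left pixel of a tile from its grid position."""
    step_y = tile_h * (1.0 - overlap)
    step_x = tile_w * (1.0 - overlap)
    return int(round(row * step_y)), int(round(col * step_x))


# Demo: a pure white tile versus a noisy tile standing in for tissue
white = np.full((160, 160, 3), 255, dtype=np.uint8)
tissue = np.random.default_rng(0).integers(0, 256, size=(160, 160, 3), dtype=np.uint8)

print(is_blank(white), is_blank(tissue))
print(place_by_grid(2, 3, 1600, 1600, overlap=0.1))
```

Tiles flagged as blank would skip feature matching and be pasted at the coordinates returned by place_by_grid.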
Here is the detailed scenario:
Requirements
Example
The script should be able to:
Is there a way to achieve this functionality with the current script or through modifications?