gerlichlab / looptrace

Fork (from EMBL Gitlab) of the looptrace project: https://git.embl.de/grp-ellenberg/looptrace
MIT License

ImageHandler unnecessarily reads extra images? #9

Open vreuter opened 1 year ago

vreuter commented 1 year ago

From some initial use and from the logging output, it appears that constructing an ImageHandler and then using it for a processing step (e.g., nucleus detection, drift correction, spot detection) can cost unnecessary IO time and memory because of extra image file parsing. "Extra" in the sense that image files in some subfolders of the main data folder are parsed even when they're not going to be used. That is, ImageHandler is very eager and general purpose, parsing every image it can "see" from the image_path value in use. It could be made lazier, or we could modify the program and data type architecture so that we do minimal IO work and reduce memory usage.
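To make "lazier" concrete, here is a minimal sketch, not looptrace's actual code. The class name `LazyImageFolders` and the `read_folder` callback are hypothetical; the sketch assumes only that images live in subfolders of `image_path`. It indexes subfolder names up front and parses a folder only when a step first asks for it.

```python
from pathlib import Path
from typing import Callable, Dict, List


class LazyImageFolders:
    """Hypothetical lazy handler: index subfolders up front, parse on demand."""

    def __init__(self, image_path: Path, read_folder: Callable[[Path], object]):
        # Listing subfolder names is cheap; no image file is opened here.
        self._folders: Dict[str, Path] = {
            p.name: p for p in sorted(Path(image_path).iterdir()) if p.is_dir()
        }
        self._read_folder = read_folder  # assumed parser supplied by the caller
        self._cache: Dict[str, object] = {}

    def names(self) -> List[str]:
        """Names of the subfolders that could be read."""
        return list(self._folders)

    def __getitem__(self, name: str) -> object:
        # Parse a folder only the first time it is requested, then reuse it.
        if name not in self._cache:
            self._cache[name] = self._read_folder(self._folders[name])
        return self._cache[name]
```

With something like this, a step that only needs one subfolder would never trigger parsing of the others.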

vreuter commented 1 year ago

Currently, we can rename the raw image folders after deconvolution and nucleus detection, prefixing them with an underscore so that they're skipped by the image handler (discarded by its filter function). This works, but we could make it less manual and have fewer side effects on the file system.
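For illustration, the underscore convention amounts to a filter along these lines. This is a sketch under assumptions, not the handler's real filter function; `visible_image_folders` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def visible_image_folders(image_path: Path) -> List[Path]:
    """List the subfolders a handler would read, skipping names that start with '_'."""
    return sorted(
        p for p in Path(image_path).iterdir()
        if p.is_dir() and not p.name.startswith("_")
    )
```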

Here's a rough outline of improvements we could make, in increasing order of difficulty and of design soundness.

  1. Facilitate automatic renaming with an underscore prefix after nucleus detection. This could be an optional pipeline step, enabled by default with an opt-out.
  2. Ignore these locations (we know them from the config).
  3. Redesign the architecture so that ImageHandler isn't constantly looking in a centralised image location for each step.
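A rough sketch of how the first two options could look, using only the standard library. The names `hide_folder`, `folders_to_read`, and `ignored_names` are hypothetical, and how the ignored folder names would be pulled from the config is left open here.

```python
from pathlib import Path
from typing import Iterable, List


def hide_folder(folder: Path) -> Path:
    """Option 1: rename a folder with a leading underscore; no-op if already hidden."""
    folder = Path(folder)
    if folder.name.startswith("_"):
        return folder
    target = folder.with_name("_" + folder.name)
    folder.rename(target)
    return target


def folders_to_read(image_path: Path, ignored_names: Iterable[str]) -> List[Path]:
    """Option 2: skip folders named in the config, leaving the file system untouched."""
    ignored = set(ignored_names)
    return sorted(
        p for p in Path(image_path).iterdir()
        if p.is_dir() and p.name not in ignored
    )
```

Option 2 has the advantage that nothing on disk changes, so an interrupted run can't leave folders half-renamed.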