dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Scale postprocessing #322

Open dchaley opened 2 months ago

dchaley commented 2 months ago

This one is hard!

The algorithm doesn’t divide into neat tiles. Pixel values can “spread” from tile to tile.

Naive / inefficient algorithm:

We could do something similar, processing each tile until it converges. If a tile pushes to its neighbors, those would need to be reprocessed as well.

Something like:
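As a hedged illustration of the idea (a sketch, not the project's implementation): assume the step behaves like 4-connected grayscale reconstruction by dilation, where marker values spread outward but are capped by a mask. The name reconstruct_tiled, the tile size, and the single shared in-memory array are all assumptions made for this example. A distributed version would have to exchange tile borders between workers instead of reading a shared array.

```python
from collections import deque


def reconstruct_tiled(marker, mask, tile=2):
    """4-connected grayscale reconstruction by dilation, one tile at a time.

    Hypothetical helper: marker and mask are equal-sized lists of lists.
    """
    h, w = len(mask), len(mask[0])
    # Clamp the marker under the mask; work on a copy so inputs stay untouched.
    out = [[min(marker[y][x], mask[y][x]) for x in range(w)] for y in range(h)]
    n_ty, n_tx = -(-h // tile), -(-w // tile)  # tile counts (ceiling division)

    def process(ty, tx):
        """Iterate one tile to convergence; return the neighbor tiles it pushed to."""
        y0, y1 = ty * tile, min((ty + 1) * tile, h)
        x0, x1 = tx * tile, min((tx + 1) * tile, w)
        pushed = set()
        changed = True
        while changed:
            changed = False
            for y in range(y0, y1):
                for x in range(x0, x1):
                    best = out[y][x]
                    # Reads may cross the tile edge: this is the one-pixel halo.
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w:
                            best = max(best, out[ny][nx])
                    new = min(best, mask[y][x])
                    if new > out[y][x]:
                        out[y][x] = new
                        changed = True
                        # A changed border pixel may spread into the adjacent tile.
                        if y == y0:
                            pushed.add((ty - 1, tx))
                        if y == y1 - 1:
                            pushed.add((ty + 1, tx))
                        if x == x0:
                            pushed.add((ty, tx - 1))
                        if x == x1 - 1:
                            pushed.add((ty, tx + 1))
        return pushed

    # Start with every tile queued; re-queue a tile whenever a neighbor pushes to it.
    queue = deque((ty, tx) for ty in range(n_ty) for tx in range(n_tx))
    queued = set(queue)
    while queue:
        ty, tx = queue.popleft()
        queued.discard((ty, tx))
        for nty, ntx in process(ty, tx):
            in_grid = 0 <= nty < n_ty and 0 <= ntx < n_tx
            if in_grid and (nty, ntx) not in queued:
                queue.append((nty, ntx))
                queued.add((nty, ntx))
    return out


# Toy input: a seed of 2 in the top-left corner spreads through the 3-valued
# ring, is capped at 1 inside it, and is blocked by the 0-valued column.
mask = [
    [3, 3, 3, 3, 0, 9],
    [3, 1, 1, 3, 0, 9],
    [3, 3, 3, 3, 0, 9],
    [0, 0, 0, 0, 0, 9],
]
marker = [[0] * 6 for _ in range(4)]
marker[0][0] = 2
result = reconstruct_tiled(marker, mask, tile=2)
for row in result:
    print(row)
```

The queue is what makes this naive: a value that crosses many tiles causes each tile along the way to be reprocessed, possibly several times. The result should not depend on the tile size, which gives a cheap correctness check against a single-tile run.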

dchaley commented 3 weeks ago

After improving h_maxima (called by deep_watershed) in #350, a lot of postprocessing time is still spent in fill_holes: approximately 14% for 140M pixels,

[Image: snakeviz profile for postprocessing a 140M-pixel image]

and approximately 40% for 260k pixels:

[Image: speedscope profile for postprocessing a 260k-pixel image]

Note that the larger image was profiled with cProfile, which has some accuracy limitations, particularly for very short function calls (< 0.001s) and for very large numbers of calls. We have both of these situations … shown very clearly for the 260k-pixel image.
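To see why many very short calls are hard to measure with cProfile, here is a minimal standard-library sketch. The functions tiny and workload and the call count N are made up for the illustration; they are not taken from the postprocessing code. It runs the same loop with and without the profiler and reads the recorded call count from pstats.

```python
import cProfile
import pstats
import time


def tiny(x):
    # Stand-in for a very short function that is called very many times.
    return x + 1


def workload(n):
    total = 0
    for _ in range(n):
        total = tiny(total)
    return total


def timed(fn, *args):
    # Wall-clock time of a single call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


N = 200_000
_, plain_seconds = timed(workload, N)

profiler = cProfile.Profile()
profiler.enable()
_, profiled_seconds = timed(workload, N)
profiler.disable()

# pstats keys are (filename, line number, function name); each value starts
# with the primitive call count followed by the total call count.
stats = pstats.Stats(profiler)
tiny_calls = sum(
    entry[1] for key, entry in stats.stats.items() if key[2] == "tiny"
)

print(f"calls to tiny recorded: {tiny_calls}")
print(f"without profiler: {plain_seconds:.4f}s, with profiler: {profiled_seconds:.4f}s")
```

The profiled run is usually noticeably slower, because the per-call bookkeeping is large relative to the work done inside tiny. That overhead is attributed to the profiled functions, so their reported share can be inflated.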

Speedscope couldn't handle the 140M-pixel image: it generated 4 GB of profiling JSON 🤯

We did spot a potential improvement, see #357