arttumiettinen / pi2

C++ library and command-line software for processing and analysis of terabyte-scale volume images locally or on a computing cluster.
GNU General Public License v3.0

Watershed segmentation of particles #18

Open ivonindima opened 1 month ago

ivonindima commented 1 month ago

Hello! Do you have any plans to implement a distributed computing mode for the following functions: localmaxima(), cleanmaxima(), labelmaxima(), grow()? This would significantly speed up the process of watershed segmentation of particles. Thanks a lot!

arttumiettinen commented 2 weeks ago

Greetings from the summer holidays and sorry for the delay in replying,

Thanks for the suggestion. In principle, I'm interested in making distributed versions of all the functionalities. In this case, the localmaxima function is non-trivial to convert to distributed processing mode: it defines a local maximum as a connected region of pixels with the same pixel value, where all the neighbouring pixels have smaller pixel values. Because of this, a local maximum might span almost the entire image. The usual "process in smaller blocks + handle block edges separately" distribution strategies don't work directly in this case, and the distributed processing would end up quite similar to what analyzeparticles does. I need to think about ways to work around the problem.
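To illustrate the plateau-maximum definition above, here is a small NumPy/SciPy sketch (an illustrative reimplementation, not pi2's actual code; pi2's connectivity and edge rules may differ, and the function name `plateau_maxima` is made up for this example):

```python
import numpy as np
from scipy import ndimage

def plateau_maxima(a):
    """Label connected plateaus whose surrounding pixels are all strictly smaller.

    Mirrors the localmaxima definition described above (assumption: pi2 may
    use a different connectivity or boundary convention).
    """
    # Candidate pixels: equal to the maximum of their 3-pixel neighbourhood,
    # so flat plateaus are included, not just strict peaks.
    cand = a == ndimage.maximum_filter(a, size=3)
    labels, n = ndimage.label(cand)
    keep = []
    for lab in range(1, n + 1):
        region = labels == lab
        # One-pixel ring around the plateau; the plateau is a maximum only
        # if every ring pixel is strictly smaller.
        ring = ndimage.binary_dilation(region) & ~region
        if ring.sum() == 0 or a[ring].max() < a[region].min():
            keep.append(lab)
    return labels, keep

# A flat plateau of 5s spanning indices 3..6 is one single maximum.
sig = np.array([1, 2, 3, 5, 5, 5, 5, 4, 2, 1])
labels, keep = plateau_maxima(sig)
# -> keep == [1], labels nonzero exactly at indices 3..6
```

The loop over the plateau's surrounding ring is exactly what breaks under naive block-wise processing: if the plateau crosses a block boundary, no single block contains the whole ring, so the maximum/not-maximum decision can depend on pixels arbitrarily far away.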

How large are your images? Would you like to have a distributed computing possibility solely for processing speed, or do you need it to process larger images? Do you have a size limit for the maxima (number of pixels in maximum region, diameter, etc.)?

Best, Arttu

ivonindima commented 2 weeks ago

Hello and thanks for the reply!

I work with 3D images obtained with a microtomograph. Their typical size is 2000x2000x4000 voxels. For desktop processing, I select a region of interest of 1000x1000x1000 voxels. On a workstation, the full image can also be processed with pi2. First of all, I would like to speed up the watershed segmentation process as much as possible at these sizes.
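For reference, a back-of-envelope memory estimate for those sizes (assuming 16-bit voxels; the actual bit depth is not stated in this thread):

```python
# Voxel counts for the sizes mentioned above.
full_image = 2000 * 2000 * 4000   # full microtomograph scan
desktop_roi = 1000 * 1000 * 1000  # region of interest used on a desktop

bytes_per_voxel = 2               # assumption: 16-bit data
full_gb = full_image * bytes_per_voxel / 1e9
roi_gb = desktop_roi * bytes_per_voxel / 1e9
print(full_gb)  # -> 32.0 (GB for the full image)
print(roi_gb)   # -> 2.0  (GB for the ROI)
```

So the full image comfortably fits in workstation RAM (and watershed needs additional label/working images of similar size), which supports the point below that parallelization, rather than out-of-core distributed processing, is the main win at this scale.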

I did some profiling, which showed that most of the computing time is spent in the localmaxima and grow functions, in roughly a 20/80 ratio. Both of these functions run single-threaded (the CPU load does not exceed 10%), while the dmap calculation takes only a few seconds at 100% CPU load.

I attach an example image (one slice from a 3D array). The quality of the separation is excellent, and I am looking for a way to speed up the computation. In previous research, we used Dask to solve the same problem (da.overlap.overlap and da.overlap.trim_internal). However, other libraries cannot match pi2's performance, and pi2 is not compatible with Dask.
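The overlap/trim pattern that da.overlap.overlap and da.overlap.trim_internal automate can be sketched with plain NumPy/SciPy (a generic 1-D illustration, not pi2 or Dask code; the block split and filter choice are arbitrary):

```python
import numpy as np
from scipy import ndimage

# "Overlap + trim": a filter with footprint radius r gives identical results
# block-wise if each block carries an r-pixel halo that is trimmed afterwards.
rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=64).astype(np.uint8)

r = 1                                    # radius of the 3-pixel maximum filter
full = ndimage.maximum_filter(img, size=3)

parts = []
for start, stop in [(0, 32), (32, 64)]:  # two blocks of the image
    lo, hi = max(start - r, 0), min(stop + r, img.size)   # add halo
    block = ndimage.maximum_filter(img[lo:hi], size=3)
    parts.append(block[start - lo : block.size - (hi - stop)])  # trim halo
blockwise = np.concatenate(parts)
assert np.array_equal(full, blockwise)   # matches the one-shot result
```

This works only because the filter's footprint is bounded by r, so the halo width is known in advance. A plateau-style local maximum has no bounded footprint, which is exactly why, as arttumiettinen notes above, this strategy does not apply directly to localmaxima.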

Hope this information is useful.

screenshot

arttumiettinen commented 1 week ago

Hi,

Thanks for the additional information. This changes things a lot! For the image sizes you are dealing with, it is probably not worth bothering with distributed processing. At least not before the bottleneck you have identified, i.e. the grow command, is properly optimized and parallelized.

Indeed, there has been some work towards a faster grow command in the past. Going through the old code and finishing it should not be a big task. I'll check what we have already, and let you know in the coming days when there is something to test.

Best, Arttu