fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Cellpose task: Output ROI table creation performance scaling #764

Open jluethi opened 2 weeks ago

jluethi commented 2 weeks ago

In a user experiment at FMI, I'm observing that creating the output ROI tables seems to slow the task down progressively as more ROIs of an image are processed.

The user has a large well with 138 organoid objects and runs Cellpose once per organoid object. The early objects were processed quickly (under a minute each, judging from the timestamps below):

2024-06-13 15:57:24,290; INFO; Now processing ROI 1/138
...
2024-06-13 15:57:30,901; INFO; ROI [0, 18, 68, 124, 5272, 5332], num_labels_roi=19, num_labels_tot=19
2024-06-13 15:57:30,923; WARNING; 65 bounding-box pairs overlap
2024-06-13 15:57:31,017; INFO; Now processing ROI 2/138
...
2024-06-13 15:57:47,218; INFO; ROI [0, 18, 1244, 1524, 3844, 4128], num_labels_roi=986, num_labels_tot=1005
2024-06-13 15:58:27,429; WARNING; 12221 bounding-box pairs overlap
2024-06-13 15:58:27,527; INFO; Now processing ROI 3/138

Later objects were much slower to process (~20 min each):

2024-06-14 10:34:50,335; INFO; Now processing ROI 123/138
...
2024-06-14 10:35:03,566; INFO; ROI [0, 18, 16996, 17272, 1636, 1912], num_labels_roi=325, num_labels_tot=71118
2024-06-14 10:54:45,299; WARNING; 410926 bounding-box pairs overlap
2024-06-14 10:54:45,719; INFO; Now processing ROI 124/138
...
2024-06-14 10:54:58,671; INFO; ROI [0, 18, 17052, 17288, 13524, 13752], num_labels_roi=515, num_labels_tot=71633
2024-06-14 11:14:45,451; WARNING; 415582 bounding-box pairs overlap
2024-06-14 11:14:45,770; INFO; Now processing ROI 125/138

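The ROI sizes appear to be of a similar order of magnitude. For later organoids, however, num_labels_tot is much higher (71118, versus 1005 at ROI 2), and the warnings report a very large number of overlapping bounding-box pairs in 3D (e.g. WARNING; 410926 bounding-box pairs overlap). Almost all of the ~20 minutes per late ROI elapse between the "ROI [...]" line and that warning.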

=> Does ROI table creation rerun for all labels every time an organoid finishes processing? Is there anything else that could explain this? I'll need to look into it more closely; for now I just wanted to report the logs here.
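
To see where the time goes, one can compute, for each ROI quoted above, the gap between the "ROI [...]" INFO line and the following overlap WARNING. A minimal sketch that uses only the timestamps copied from these logs; it assumes (based on the log order alone) that this window covers the bounding-box table creation and the overlap check:

```python
from datetime import datetime

# Timestamps copied from the logs above: (ROI INFO line, following overlap WARNING).
# Assumption: the work between these two lines is the bounding-box table
# creation plus the overlap check, as suggested by the log order.
LOG_WINDOWS = {
    "ROI 1/138": ("2024-06-13 15:57:30,901", "2024-06-13 15:57:30,923"),
    "ROI 2/138": ("2024-06-13 15:57:47,218", "2024-06-13 15:58:27,429"),
    "ROI 123/138": ("2024-06-14 10:35:03,566", "2024-06-14 10:54:45,299"),
    "ROI 124/138": ("2024-06-14 10:54:58,671", "2024-06-14 11:14:45,451"),
}

# Log timestamp format: "YYYY-MM-DD HH:MM:SS,mmm"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two timestamps in the log format above."""
    t0 = datetime.strptime(start, TIMESTAMP_FORMAT)
    t1 = datetime.strptime(end, TIMESTAMP_FORMAT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    for roi, (start, end) in LOG_WINDOWS.items():
        print(f"{roi}: {seconds_between(start, end):.3f} s")
```

For these four ROIs this gives about 0.02 s, 40 s, 1182 s and 1187 s, i.e. nearly the entire ~20 minutes of the later ROIs falls in that window.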

tcompa commented 2 weeks ago

I don't think this has to do with table creation, but with overlap checks.

The get_overlapping_pairs_3D function is potentially costly, since it scales quadratically with the number of elements. I also think we are calling it the wrong way:


    for i_ROI, indices in enumerate(list_indices):
        # ...
        if output_ROI_table:
            bbox_df = array_to_bounding_box_table(
                new_label_img,
                actual_res_pxl_sizes_zyx,
                origin_zyx=(s_z, s_y, s_x),
            )

            bbox_dataframe_list.append(bbox_df)

            overlap_list = []
            for df in bbox_dataframe_list:   # <--------- see here
                overlap_list.extend(
                    get_overlapping_pairs_3D(df, full_res_pxl_sizes_zyx)
                )
            if len(overlap_list) > 0:
                logger.warning(
                    f"{len(overlap_list)} bounding-box pairs overlap"
                )

I think we have two issues in the code above:

  1. We create the whole list of overlapping pairs, but then only log its length. get_overlapping_pairs_3D could instead return just an integer, and we would then sum those counts.
  2. For each i_ROI, we rebuild the whole list of overlaps, including the ones belonging to earlier values of i_ROI (see the "see here" comment in the code). At first glance this is just wrong, and it also makes each iteration more expensive as more ROIs are processed.
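
Both points can be illustrated with a toy stand-in. The sketch below is not the library's get_overlapping_pairs_3D: the hypothetical count_overlapping_pairs_3d uses a naive all-pairs check on axis-aligned boxes given as (z_min, y_min, x_min, z_max, y_max, x_max) and returns only an integer, as suggested in point 1. The helper functions count pairwise comparisons, to show how re-checking every earlier ROI on each iteration (point 2) multiplies the quadratic per-ROI cost.

```python
import itertools

import numpy as np


def boxes_overlap(a: np.ndarray, b: np.ndarray) -> bool:
    """True if two axis-aligned 3D boxes (z0, y0, x0, z1, y1, x1) intersect.

    Boxes that only touch on a face do not count as overlapping here
    (an assumption of this sketch).
    """
    return bool(np.all(a[:3] < b[3:]) and np.all(b[:3] < a[3:]))


def count_overlapping_pairs_3d(boxes: np.ndarray) -> int:
    """Hypothetical count-only variant: naive all-pairs check, O(n^2) in len(boxes)."""
    return sum(
        boxes_overlap(boxes[i], boxes[j])
        for i, j in itertools.combinations(range(len(boxes)), 2)
    )


def pair_checks(n_labels: int) -> int:
    """Number of box pairs compared for one ROI with n_labels boxes."""
    return n_labels * (n_labels - 1) // 2


def checks_recomputed_each_iteration(labels_per_roi: list[int]) -> int:
    """Comparisons when every iteration re-checks all ROIs seen so far (point 2)."""
    return sum(
        sum(pair_checks(n) for n in labels_per_roi[: i + 1])
        for i in range(len(labels_per_roi))
    )


def checks_once_per_roi(labels_per_roi: list[int]) -> int:
    """Comparisons when each ROI's boxes are checked exactly once."""
    return sum(pair_checks(n) for n in labels_per_roi)
```

For example, with 10 hypothetical ROIs of 100 labels each, checking each ROI once costs 49,500 comparisons, while re-checking all previous ROIs on every iteration costs 272,250; the ratio, (N + 1) / 2 for N equal-sized ROIs, grows linearly with the number of ROIs.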

I see at least two easy solutions, even without addressing point 1 above:

(A) We move the for df in bbox_dataframe_list block outside the i_ROI loop and run it only once, after the loop has finished.
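
A minimal sketch of option (A), as a toy model rather than the task's actual code: each per-ROI "table" is stood in for by an integer (its precomputed overlap count), and count_pairs is a placeholder for the per-table overlap check. Both functions return the total reported by the final warning and the number of overlap-check calls.

```python
from typing import Callable, Sequence


def current_loop(
    tables: Sequence[int], count_pairs: Callable[[int], int]
) -> tuple[int, int]:
    """Current structure: after each ROI, re-check every table collected so far.

    Returns (total reported by the last warning, number of count_pairs calls).
    """
    collected: list[int] = []
    calls = 0
    total = 0
    for table in tables:
        collected.append(table)
        total = 0
        for t in collected:  # re-runs the check for all previous ROIs
            total += count_pairs(t)
            calls += 1
    return total, calls


def option_a(
    tables: Sequence[int], count_pairs: Callable[[int], int]
) -> tuple[int, int]:
    """Option (A): collect all per-ROI tables, then check each one once after the loop."""
    collected: list[int] = []
    for table in tables:
        collected.append(table)  # segmentation and bbox-table creation would happen here
    total = 0
    calls = 0
    for t in collected:
        total += count_pairs(t)
        calls += 1
    return total, calls
```

With three toy tables whose overlap counts are 3, 0 and 5, both functions report a final total of 8, but the current structure makes 6 calls versus 3; for 138 ROIs that would be 9,591 calls versus 138.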

(B) We apply a patch like the following, so that each iteration only checks the bounding boxes of the current ROI:

--- a/fractal_tasks_core/tasks/cellpose_segmentation.py
+++ b/fractal_tasks_core/tasks/cellpose_segmentation.py
@@ -640,11 +640,7 @@ def cellpose_segmentation(

             bbox_dataframe_list.append(bbox_df)

-            overlap_list = []
-            for df in bbox_dataframe_list:
-                overlap_list.extend(
-                    get_overlapping_pairs_3D(df, full_res_pxl_sizes_zyx)
-                )
+            overlap_list = get_overlapping_pairs_3D(bbox_df, full_res_pxl_sizes_zyx)
             if len(overlap_list) > 0:
                 logger.warning(
                     f"{len(overlap_list)} bounding-box pairs overlap"
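
With the patch in (B), each iteration checks only the current ROI's bounding boxes, so the warning reports the overlaps within that ROI rather than the running total logged by the current loop. A toy sketch of that behavior, again using integers as stand-ins for per-ROI tables and a placeholder count_pairs callable (neither is part of fractal-tasks-core):

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("option_b_sketch")


def option_b(
    tables: Sequence[int], count_pairs: Callable[[int], int]
) -> tuple[list[int], int]:
    """Option (B): check only the table of the current ROI in each iteration.

    Returns (counts that triggered a warning, number of count_pairs calls).
    """
    warned: list[int] = []
    calls = 0
    for table in tables:
        n_overlaps = count_pairs(table)  # one check per ROI, never revisited
        calls += 1
        if n_overlaps > 0:
            logger.warning(f"{n_overlaps} bounding-box pairs overlap")
            warned.append(n_overlaps)
    return warned, calls
```

For toy per-ROI counts 3, 0 and 5, (B) warns with 3 and then 5 and makes 3 calls, whereas the current loop would log running totals of 3, 3 and 8. If a global total is still wanted, the per-ROI counts can be summed at the end, as suggested in point 1.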

A couple more details: