TIO-IKIM / CellViT

CellViT: Vision Transformers for Precise Cell Segmentation and Classification
https://doi.org/10.1016/j.media.2024.103143

There is a problem with cell segmentation when predicting large images #7

Closed Transformer-man closed 1 year ago

Transformer-man commented 1 year ago

When running prediction, I found that cell detection in the oversampled area, i.e., the border region where patches are sampled repeatedly, seems to be problematic.

FabianHoerst commented 1 year ago

We are aware of this bug. Unfortunately, there was an error merging our branches. We will fix it soon.

FabianHoerst commented 1 year ago

Please pull the latest update, we fixed the merging problems.

Could you please give me an update if everything is fixed now? Otherwise, I need to investigate further.

FabianHoerst commented 1 year ago

The problem is not totally gone. We are currently working on a solution, but cannot offer a fix until next week. An intermediate workaround that limits the problem (though it does not resolve all cases) would be to lower the threshold in

if (
    # flag as duplicate if the overlap covers more than half of either polygon
    query_poly.intersection(inter_poly).area
    / query_poly.area
    > 0.5
    or query_poly.intersection(inter_poly).area
    / inter_poly.area
    > 0.5
):

for both conditions from 0.5 to 0.01.

We are working on a fix next week.
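The same overlap condition can be illustrated without shapely, using axis-aligned bounding boxes (a dependency-free sketch; `is_duplicate` and the rectangle helpers are illustrative names, not from the repository):

```python
def rect_area(r):
    """Area of a rectangle given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = r
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def intersection_area(a, b):
    """Area of the overlap of two axis-aligned rectangles (0 if disjoint)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    return rect_area((ix0, iy0, ix1, iy1))

def is_duplicate(query, inter, threshold=0.5):
    """Mirror of the snippet above: the pair counts as a duplicate when the
    intersection covers more than `threshold` of either shape's area."""
    ia = intersection_area(query, inter)
    return ia / rect_area(query) > threshold or ia / rect_area(inter) > threshold
```

Lowering `threshold` from 0.5 to 0.01, as suggested above, makes the check far more aggressive: two boxes sharing only a sliver of area are then already merged as one detection.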

FabianHoerst commented 1 year ago

We are aware of this problem. Our first fix did not remove all duplicated cells. We have now added a quick fix, which we still need to improve next week to recover performance. Could you please pull the latest version and check if it is working now?

Transformer-man commented 1 year ago

Thanks for your help. I will try the latest code.

FabianHoerst commented 1 year ago

@Transformer-man Has it been fixed?

Transformer-man commented 1 year ago

Sorry for the delayed reply. Great work, the problem has been solved. Thank you very much.

Transformer-man commented 1 year ago

In addition, may I ask how you stitch the 1024 × 1024 patches with 64 px overlapping areas?

FabianHoerst commented 1 year ago

Could you please be more specific? Your question is hard to understand.

Transformer-man commented 1 year ago

Sorry, maybe I didn't express the problem clearly. A large image is cut into 1024 × 1024 patches, with a 64 px overlap between neighboring patches. For example, suppose a large image is divided into 4 patches: how do you merge the predicted results of these four patches into a prediction for the entire image, so that the edge regions look seamless rather than visibly cut?
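The tiling described above (1024 px patches, 64 px overlap, i.e. a 960 px stride) can be sketched as follows; `tile_origins` is an illustrative helper, not the repository's actual tiling code, and it assumes the image is at least one tile in each dimension:

```python
def tile_origins(width, height, tile=1024, overlap=64):
    """Top-left corners of tiles covering an image, with adjacent
    tiles sharing `overlap` pixels along each axis."""
    stride = tile - overlap  # 960 px step between tile origins
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # add a final, edge-aligned tile if the regular grid falls short
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]
```

For a 1984 × 1024 px image this yields tiles at x = 0 and x = 960, whose 64 px of shared pixels form exactly the overlap region the merging step has to resolve.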

FabianHoerst commented 1 year ago

We used a heuristic approach that divides the cells into mid-cells (outside the edge regions) and edge cells. For each cell in the edge region, we check whether it touches the patch border. If an edge cell touches the border, we select the matching cell of the neighboring patch and remove the border cell; if there is no matching cell from a neighboring patch, we keep the cell touching the border. For cells in the edge area (the 64 px wide overlap region) that do not touch the border, we build an R-tree to find all overlapping cells and merge those predictions. There is still room for improvement, but this currently seems to be the fastest option for us and gives a good trade-off between run-time and prediction accuracy.
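The R-tree step above can be sketched with shapely's `STRtree` (assuming shapely >= 2.0, where `query` returns integer indices). This is a simplified single-pass illustration, not CellViT's actual code; `merge_overlapping` is a hypothetical name, and the repository repeats the search over several iterations until no overlaps remain:

```python
from shapely.geometry import box
from shapely.strtree import STRtree
from shapely.ops import unary_union

def merge_overlapping(polys, threshold=0.01):
    """Merge cell polygons whose mutual overlap exceeds `threshold`
    of either polygon's area. Single pass over an STRtree (R-tree)."""
    tree = STRtree(polys)          # spatial index over all candidate cells
    merged, consumed = [], set()
    for i, poly in enumerate(polys):
        if i in consumed:
            continue
        group = [poly]
        for j in tree.query(poly):  # candidates with intersecting bounding boxes
            j = int(j)
            if j == i or j in consumed:
                continue
            inter = poly.intersection(polys[j]).area
            if inter / poly.area > threshold or inter / polys[j].area > threshold:
                group.append(polys[j])
                consumed.add(j)
        consumed.add(i)
        merged.append(unary_union(group))  # fuse the group into one prediction
    return merged
```

The index makes each lookup roughly logarithmic in the number of cells, which is what keeps the post-processing tractable for slides with hundreds of thousands of detections.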

Transformer-man commented 1 year ago

Thank you very much for your answer. I will read your code to gain a deeper understanding, since I don't think there is currently a good solution for edge handling in nuclei segmentation. Thank you for providing this way of thinking; I will study it.

FabianHoerst commented 1 year ago

There are more thorough solutions, such as merging the prediction maps themselves. The problem is the huge number of merging operations needed, resulting in high run-times and a large memory footprint. For example, we had a slide with 811,690 detected cells and a slide dimension of 180,000 × 156,000 px. For such huge files, our merging strategy is very efficient, as the logs suggest:

2023-07-12 09:27:50,887 [INFO] - Detected cells before cleaning: 811690
2023-07-12 09:27:50,887 [INFO] - Initializing Cell-Postprocessor
2023-07-12 09:29:43,674 [INFO] - Finding edge-cells for merging
2023-07-12 09:29:49,899 [INFO] - Removal of cells detected multiple times
2023-07-12 09:30:18,602 [INFO] - Iteration 0: Found overlap of # cells: 48841
2023-07-12 09:30:32,337 [INFO] - Iteration 1: Found overlap of # cells: 720
2023-07-12 09:30:45,568 [INFO] - Iteration 2: Found overlap of # cells: 26
2023-07-12 09:30:58,768 [INFO] - Iteration 3: Found overlap of # cells: 3
2023-07-12 09:31:12,008 [INFO] - Iteration 4: Found overlap of # cells: 0
2023-07-12 09:31:12,008 [INFO] - Found all overlapping cells
2023-07-12 09:31:17,713 [INFO] - Detected cells after cleaning: 697341

For such large files, I do not think there are many alternatives that would not blow up runtime and memory.
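A quick back-of-the-envelope calculation for the slide above shows why a dense merged prediction map is impractical (assuming one int32 instance label per pixel, which is an assumption, not stated in the thread):

```python
# Memory for a single dense per-pixel label map of a
# 180,000 x 156,000 px slide at int32 (4 bytes per pixel).
width, height = 180_000, 156_000
bytes_per_px = 4  # assumed int32 instance label
gib = width * height * bytes_per_px / 2**30
print(f"{gib:.0f} GiB")  # roughly 105 GiB for one dense label map
```

Holding (let alone repeatedly merging) arrays of that size is far beyond typical workstation memory, which is why the polygon-level R-tree merge is used instead.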

Transformer-man commented 1 year ago

Yes, merging prediction maps would consume a lot of time.