The ROIs contain many unannotated regions that get mapped to background, leading to suboptimal training.

While training my models on tumor-stroma tissue segmentation data, I noticed that the Dice score for the background class during training looked like the following:

I had assumed that the entire region within the ROI boxes is annotated, so that none of the pixels within an ROI would be mapped to the "background" class. On closer investigation, however, I found many areas within the ROI boxes that are unannotated (example slide: TCGA-D8-A1JL-01Z-00-DX1.FE3F0C6B-F98A-4036-BF9A-25A8CC66B1FD). As a result, quite a lot of tiles within the ROIs are shown to the model with the label "background" even though they contain tissue that was simply never annotated, so the model is trained on incorrect targets.
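To get a rough sense of how widespread this is, one could measure the fraction of pixels inside each ROI that end up with the background label. The snippet below is only a minimal sketch with NumPy: the function name `unannotated_fraction`, the array layout (an integer label mask plus a boolean ROI mask of the same shape) and the choice of `0` as the background index are assumptions made for illustration, not something taken from the ahcore code.

```python
import numpy as np

BACKGROUND = 0  # assumed index of the background class


def unannotated_fraction(label_mask: np.ndarray, roi_mask: np.ndarray) -> float:
    """Return the fraction of ROI pixels that carry the background label.

    label_mask: integer array (H, W) with one class index per pixel.
    roi_mask:   boolean array (H, W), True for pixels inside the ROI.
    """
    roi = roi_mask.astype(bool)
    n_roi = int(roi.sum())
    if n_roi == 0:
        # No ROI pixels in this tile, so there is nothing to measure.
        return 0.0
    n_background = int(((label_mask == BACKGROUND) & roi).sum())
    return n_background / n_roi


# Toy 4x4 tile: classes 1 and 2 are annotated tissue, 0 is background.
label_mask = np.array(
    [
        [1, 1, 0, 0],
        [1, 2, 2, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
)
# The ROI covers the top-left 2x3 block of the tile.
roi_mask = np.zeros((4, 4), dtype=bool)
roi_mask[:2, :3] = True

fraction = unannotated_fraction(label_mask, roi_mask)
```

In this toy tile one of the six ROI pixels is background. If the ROIs were fully annotated, the value would be zero for every tile, so any tile with a clearly non-zero fraction is a candidate for manual inspection.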

Moreover, there seems to be some degree of overlap between the ROI and the regions considered as background in the tile. These may be:

Gaps between adjacent annotation polygons, which are not assigned any class during one-hot encoding.

Edge artefacts introduced by human error during annotation.

Both introduce label noise during training. The ROIs need to be corrected accordingly.
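Until the ROIs themselves are fixed, one possible stopgap (not the correction proposed above, and not something ahcore is known to do) would be to relabel background pixels that fall inside an ROI with a dedicated "ignore" value, so that they can be excluded from the loss and from the Dice computation. The sketch below assumes the same hypothetical layout: an integer label mask, a boolean ROI mask, background index `0`, and an arbitrary ignore value of `255`.

```python
import numpy as np

BACKGROUND = 0      # assumed index of the background class
IGNORE_INDEX = 255  # hypothetical value marking pixels to exclude from the loss


def mask_unannotated(label_mask: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """Return a copy of label_mask where background pixels inside the ROI
    are replaced by IGNORE_INDEX. Pixels outside the ROI are left untouched."""
    out = label_mask.copy()
    unannotated = (label_mask == BACKGROUND) & roi_mask.astype(bool)
    out[unannotated] = IGNORE_INDEX
    return out


# Toy 3x3 tile with an ROI covering the first two rows.
labels = np.array(
    [
        [1, 0, 2],
        [0, 2, 2],
        [0, 0, 0],
    ]
)
roi = np.zeros((3, 3), dtype=bool)
roi[:2, :] = True

masked = mask_unannotated(labels, roi)
```

Many loss implementations can skip such pixels, for example through the `ignore_index` argument of PyTorch's `CrossEntropyLoss`. This only hides the noisy labels from training; the gaps and edge artefacts in the annotations remain and still need to be corrected at the source.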

NKI-AI / ahcore-old, issue #11