NKI-AI / ahcore-old

Ahcore are the AI for Oncology histopathology core models
Apache License 2.0

The ROIs contain many unannotated regions that get mapped to background leading to suboptimal training. #11

Closed AjeyPaiK closed 1 year ago

AjeyPaiK commented 1 year ago

While training my models on tumor-stroma tissue segmentation data, I noticed that the Dice score for the background class during training looked like the following: [image]

I assumed that the entire region within each ROI box was annotated, so none of the pixels within an ROI should be mapped to the "background" class. On closer investigation, however, I found many unannotated areas within the ROI boxes (example slide: TCGA-D8-A1JL-01Z-00-DX1.FE3F0C6B-F98A-4036-BF9A-25A8CC66B1FD). As a result, quite a lot of tiles within the ROIs are shown to the model with the label "background" although they are merely unannotated, which leads to suboptimal training.

Moreover, some of the regions treated as background in a tile appear to lie inside the ROI. These may be:

  1. Gaps between two polygons in the annotations, which do not get assigned a class during one-hot encoding. [image]

  2. Edge artefacts caused by human error during annotation. [image]

Both introduce label noise during training. The ROI needs to be corrected accordingly.
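One way to quantify the problem is to measure how much of each ROI ends up labeled as background. The snippet below is a minimal sketch, not ahcore code: the function name `background_fraction_in_roi`, the `(x, y, width, height)` ROI layout, and the toy mask are all hypothetical, and it assumes background is class index 0.

```python
def background_fraction_in_roi(mask, roi, background_index=0):
    """Return the fraction of ROI pixels labeled `background_index`.

    mask: 2D list of integer class labels (one inner list per pixel row).
    roi:  (x, y, width, height) in pixel coordinates (hypothetical layout).
    Only pixels that actually fall inside the mask are counted.
    """
    x, y, width, height = roi
    total = 0
    background = 0
    for row in mask[y : y + height]:
        for label in row[x : x + width]:
            total += 1
            if label == background_index:
                background += 1
    if total == 0:
        raise ValueError("ROI does not overlap the mask")
    return background / total


# Toy 4x6 mask: 0 = background (unannotated), 1 = tumor, 2 = stroma.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
]
roi = (1, 1, 4, 2)  # columns 1..4, rows 1..2
fraction = background_fraction_in_roi(mask, roi)
print(fraction)  # 0.25: the one-pixel-wide gap between the two polygons
```

If the ROIs were fully annotated, this fraction would be 0 for every ROI, so any clearly nonzero value points at gaps or edge artefacts like the ones listed above.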

AjeyPaiK commented 1 year ago

To address the above, we decided to set ignore_index = 0 in the loss function instead of modifying the regions within the ROI.
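For readers unfamiliar with the option, here is a rough pure-Python sketch of what ignoring an index in a cross-entropy loss amounts to. It is not the ahcore implementation: `cross_entropy_ignoring` is a hypothetical helper, and returning 0.0 when every pixel is ignored is an assumption made for this sketch. In PyTorch, `torch.nn.CrossEntropyLoss` exposes the same idea through its `ignore_index` argument.

```python
import math


def cross_entropy_ignoring(probabilities, labels, ignore_index=0):
    """Mean negative log-likelihood over pixels whose label is not `ignore_index`.

    probabilities: one list of class probabilities per pixel.
    labels:        one integer class label per pixel.
    """
    losses = [
        -math.log(pixel_probs[label])
        for pixel_probs, label in zip(probabilities, labels)
        if label != ignore_index
    ]
    if not losses:
        # Assumption for this sketch: a tile with only ignored pixels adds no loss.
        return 0.0
    return sum(losses) / len(losses)


# Three pixels, three classes: 0 = background, 1 = tumor, 2 = stroma.
probabilities = [
    [0.7, 0.2, 0.1],  # labeled background, so it is skipped
    [0.1, 0.8, 0.1],  # labeled tumor
    [0.2, 0.3, 0.5],  # labeled stroma
]
labels = [0, 1, 2]
loss = cross_entropy_ignoring(probabilities, labels)
```

Only the tumor and stroma pixels contribute here, so whatever the model predicts for pixels labeled 0 no longer affects the loss. The trade-off is that correctly labeled background pixels are skipped as well.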