I went through every detail of the Clara and MONAI source code and configurations to find any discrepancy. After spending tens of hours on this, the only difference I found was that the MONAI version of the pipeline was using TorchVisionFCModel while the Clara Train version was using TorchVisionFullyConvModel (which is deprecated). However, I checked the implementation and both create exactly the same model. Although I came across some incidental findings about reproducibility in MONAI, I was not able to find any difference between the components being used or any of their arguments.
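For reference, here is a minimal sketch of how that model is typically constructed in the MONAI pathology pipeline; the exact arguments (backbone, num_classes, use_conv, pretrained) are assumptions based on the tutorial configuration and may differ from the configs compared here:

```python
# A minimal sketch (assumed arguments) of building the classification model with
# MONAI's TorchVisionFCModel, which replaces the deprecated TorchVisionFullyConvModel
# but constructs the same network.
import torch
from monai.networks.nets import TorchVisionFCModel

model = TorchVisionFCModel(
    model_name="resnet18",  # torchvision backbone
    num_classes=1,          # single-logit tumor/non-tumor output
    use_conv=True,          # replace the FC head with a 1x1 convolution
    pretrained=True,        # start from ImageNet weights
)

# Sanity check: a 224x224 RGB patch produces a single-channel prediction.
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)
```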
To further confirm that there is no performance gap and that the previously reported gap might be an artifact of reporting a single value instead of a confidence interval, I ran 6 trainings (3 with Clara Train and 3 with MONAI) with different random seeds (0, 112, and 1123) and calculated their FROC. The results showed that there is no statistically significant difference between the two versions (please see the comparison plot below).
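For context, this is roughly how the FROC score can be computed for each run with MONAI's metric utilities; the probability arrays below are hypothetical placeholders, not the actual evaluation outputs:

```python
# A minimal sketch of the FROC computation used to score each run; the inputs
# here are hypothetical placeholders standing in for the real detection outputs.
import numpy as np
from monai.metrics import compute_froc_curve_data, compute_froc_score

fp_probs = np.array([0.6, 0.4, 0.8])   # probabilities of false-positive detections (placeholder)
tp_probs = np.array([0.9, 0.7, 0.95])  # probabilities of true-positive detections (placeholder)
num_targets = 3                        # total number of ground-truth lesions
num_images = 2                         # number of evaluated whole-slide images

fps_per_image, total_sensitivity = compute_froc_curve_data(
    fp_probs, tp_probs, num_targets, num_images
)

# Average sensitivity at the standard CAMELYON16 false-positive rates.
froc = compute_froc_score(
    fps_per_image, total_sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)
)
print(f"FROC: {froc:.4f}")
```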
The only thing that might be affecting the previously reported difference is setting

torch.backends.cudnn.benchmark = True

for performance, which makes training non-deterministic. However, even using the determinism utility below does not seem to make the pipeline fully deterministic, although the runs are closer:

monai.utils.set_determinism(seed=0)
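Here is a minimal sketch of that reproducibility setup, assuming the current behavior of set_determinism (seeding Python, NumPy, and PyTorch and switching cuDNN to deterministic mode); note that some CUDA kernels can still remain non-deterministic:

```python
# A minimal sketch of the determinism settings; even with these, some CUDA
# kernels may remain non-deterministic, which matches the observation above.
import torch
from monai.utils import set_determinism

# Seed the Python/NumPy/PyTorch RNGs and switch cuDNN to deterministic mode.
set_determinism(seed=0)

# set_determinism is expected to disable cuDNN auto-tuning, which is what
# benchmark = True enables for speed at the cost of run-to-run variation.
print(torch.backends.cudnn.benchmark)      # expected: False
print(torch.backends.cudnn.deterministic)  # expected: True
```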
Investigate the performance gap for the digital pathology metastasis detection pipeline between the baseline in MONAI (NumPy transforms plus torchvision for color jitter) and Clara Train (which uses the same transforms as MONAI).
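As an illustration of that baseline configuration, here is a hedged sketch of how torchvision's ColorJitter can be applied inside a MONAI dictionary-transform chain; the jitter strengths and the surrounding transforms are assumptions, not the exact pipeline configuration:

```python
# A minimal sketch (assumed jitter strengths) of applying torchvision's
# ColorJitter inside a MONAI dictionary-transform chain via TorchVisiond.
from monai.transforms import Compose, ToTensord, TorchVisiond

color_jitter = Compose([
    ToTensord(keys="image"),     # ColorJitter expects a tensor image
    TorchVisiond(
        keys="image",
        name="ColorJitter",      # looked up from torchvision.transforms by name
        brightness=64.0 / 255.0,
        contrast=0.75,
        saturation=0.25,
        hue=0.04,
    ),
])
```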