I went through every detail of the Clara and MONAI source code and configurations to find any discrepancy. After spending tens of hours on this, the only difference I found was that the MONAI version of the pipeline was using TorchVisionFCModel while the Clara Train version was using TorchVisionFullyConvModel (which is deprecated). However, I checked the implementation and both create exactly the same model. Although I came across some incidental findings about reproducibility in MONAI, I was not able to find any difference between the components being used or any of their arguments.
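For reference, here is a minimal sketch of how that model is typically constructed in the MONAI pathology pipeline; the exact arguments (backbone, num_classes, use_conv, pretrained) are assumptions based on the tutorial configuration and may differ from the configs compared here:

```python
# A minimal sketch (assumed arguments) of building the classification model with
# MONAI's TorchVisionFCModel, which replaces the deprecated TorchVisionFullyConvModel
# but constructs the same network.
import torch
from monai.networks.nets import TorchVisionFCModel

model = TorchVisionFCModel(
    model_name="resnet18",  # torchvision backbone
    num_classes=1,          # single-logit tumor/non-tumor output
    use_conv=True,          # replace the FC head with a 1x1 convolution
    pretrained=True,        # start from ImageNet weights
)

# Sanity check: a 224x224 RGB patch produces a single-channel prediction.
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)
```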
To further confirm that there is no performance gap and that the previously reported gap might be an artifact of reporting a single value instead of a confidence interval, I ran 6 trainings (3 with Clara Train and 3 with MONAI) with different random seeds (0, 112, and 1123) and calculated their FROC. The results showed that there is no statistically significant difference between the two versions (please see the comparison plot below).
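For context, this is roughly how the FROC score can be computed for each run with MONAI's metric utilities; the probability arrays below are hypothetical placeholders, not the actual evaluation outputs:

```python
# A minimal sketch of the FROC computation used to score each run; the inputs
# here are hypothetical placeholders standing in for the real detection outputs.
import numpy as np
from monai.metrics import compute_froc_curve_data, compute_froc_score

fp_probs = np.array([0.6, 0.4, 0.8])   # probabilities of false-positive detections (placeholder)
tp_probs = np.array([0.9, 0.7, 0.95])  # probabilities of true-positive detections (placeholder)
num_targets = 3                        # total number of ground-truth lesions
num_images = 2                         # number of evaluated whole-slide images

fps_per_image, total_sensitivity = compute_froc_curve_data(
    fp_probs, tp_probs, num_targets, num_images
)

# Average sensitivity at the standard CAMELYON16 false-positive rates.
froc = compute_froc_score(
    fps_per_image, total_sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)
)
print(f"FROC: {froc:.4f}")
```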
The only thing that might be affecting the previously reported difference is setting

torch.backends.cudnn.benchmark = True

for performance, which makes training non-deterministic. However, even using the determinism utility below does not seem to make the pipeline fully deterministic, although the runs are closer:

monai.utils.set_determinism(seed=0)
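Here is a minimal sketch of that reproducibility setup, assuming the current behavior of set_determinism (seeding Python, NumPy, and PyTorch and switching cuDNN to deterministic mode); note that some CUDA kernels can still remain non-deterministic:

```python
# A minimal sketch of the determinism settings; even with these, some CUDA
# kernels may remain non-deterministic, which matches the observation above.
import torch
from monai.utils import set_determinism

# Seed the Python/NumPy/PyTorch RNGs and switch cuDNN to deterministic mode.
set_determinism(seed=0)

# set_determinism is expected to disable cuDNN auto-tuning, which is what
# benchmark = True enables for speed at the cost of run-to-run variation.
print(torch.backends.cudnn.benchmark)      # expected: False
print(torch.backends.cudnn.deterministic)  # expected: True
```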
Investigate the performance gap for the digital pathology metastasis detection pipeline between the baseline in MONAI (NumPy transforms plus torchvision for color jitter) and Clara Train (which uses the same transforms as MONAI).
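As an illustration of that baseline configuration, here is a hedged sketch of how torchvision's ColorJitter can be applied inside a MONAI dictionary-transform chain; the jitter strengths and the surrounding transforms are assumptions, not the exact pipeline configuration:

```python
# A minimal sketch (assumed jitter strengths) of applying torchvision's
# ColorJitter inside a MONAI dictionary-transform chain via TorchVisiond.
from monai.transforms import Compose, ToTensord, TorchVisiond

color_jitter = Compose([
    ToTensord(keys="image"),     # ColorJitter expects a tensor image
    TorchVisiond(
        keys="image",
        name="ColorJitter",      # looked up from torchvision.transforms by name
        brightness=64.0 / 255.0,
        contrast=0.75,
        saturation=0.25,
        hue=0.04,
    ),
])
```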