Tramac / awesome-semantic-segmentation-pytorch

Semantic Segmentation on PyTorch (including FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)
Apache License 2.0
2.85k stars, 582 forks

save_checkpoint may cause fault when not skip_val #67

Open liubo0902 opened 5 years ago

liubo0902 commented 5 years ago

There is a small bug in the validation() function in train.py: save_checkpoint() should only be called when save_to_disk is True.

bijjuair commented 4 years ago

True. This bug shows up in distributed training: every process writes the checkpoint at the same time, so the file gets corrupted. The fix is as you suggested: save the checkpoint only on rank 0.
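
For anyone hitting the same problem, the guard discussed above might look roughly like the sketch below. It is not the code from train.py: the validation() signature is hypothetical and simplified, pickle stands in for torch.save so the snippet runs without PyTorch, and reading the rank from the RANK environment variable assumes a launcher that sets it (torchrun does).

```python
import os
import pickle
import tempfile


def get_rank():
    """Rank of this process. Assumes the launcher exports RANK; defaults to 0."""
    return int(os.environ.get("RANK", "0"))


def save_checkpoint(state, path):
    """Stand-in for the repo's save_checkpoint(); pickle replaces torch.save."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def validation(state, path, save_to_disk):
    """Hypothetical, simplified validation step.

    The evaluation itself is omitted. The only point is the guard:
    the checkpoint is written only when save_to_disk is True.
    """
    if save_to_disk:
        save_checkpoint(state, path)
        return True
    return False


if __name__ == "__main__":
    # Only rank 0 is allowed to write, so concurrent writes cannot happen.
    save_to_disk = get_rank() == 0
    ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
    wrote = validation({"epoch": 1}, ckpt_path, save_to_disk)
    print("rank", get_rank(), "wrote checkpoint:", wrote)
```

With this guard, processes on other ranks still run validation but skip the write, so only one process ever touches the checkpoint file.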