facebookresearch / deepmask

Torch implementation of DeepMask and SharpMask

Getting 'nan' as loss after few iterations, when training with custom dataset #100

Open ghost opened 7 years ago

ghost commented 7 years ago

I am trying to train DeepMask on a custom dataset. I am able to make it run, but after a few iterations the loss becomes nan, as shown below:

```
[test]  | epoch 00020 | IoU: mean 058.02 median 074.42 suc@.5 071.37 suc@.7 058.18 | acc 074.93 | bestmo$
[train] | epoch 00021 | s/batch 0.67 | loss: 0.01263
[train] | epoch 00022 | s/batch 0.67 | loss: 0.01085
[test]  | epoch 00022 | IoU: mean 057.31 median 074.70 suc@.5 070.28 suc@.7 057.72 | acc 077.69 | bestmo$
[train] | epoch 00023 | s/batch 0.67 | loss: 0.01131
[train] | epoch 00024 | s/batch 0.67 | loss: 0.01061
[test]  | epoch 00024 | IoU: mean 056.64 median 074.22 suc@.5 068.81 suc@.7 056.52 | acc 073.01 | bestmo$
[train] | epoch 00025 | s/batch 0.66 | loss:     nan
[train] | epoch 00026 | s/batch 0.63 | loss:     nan
[test]  | epoch 00026 | IoU: mean 000.00 median 000.00 suc@.5 000.00 suc@.7 000.00 | acc 048.59 | bestmo$
[train] | epoch 00027 | s/batch 0.63 | loss:     nan
[train] | epoch 00028 | s/batch 0.63 | loss:     nan
[test]  | epoch 00028 | IoU: mean 000.00 median 000.00 suc@.5 000.00 suc@.7 000.00 | acc 050.85 | bestmo$
[train] | epoch 00029 | s/batch 0.63 | loss:     nan
[train] | epoch 00030 | s/batch 0.63 | loss:     nan
```

Any clue why this is happening? (It was working fine with the COCO dataset.)
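
One thing I plan to check is whether my custom annotations contain degenerate entries (empty segmentations, zero-area boxes, or boxes outside the image), since bad training targets like that are a common way for a mask loss to blow up to nan. Below is a minimal sketch, not DeepMask code, that scans a COCO-format annotation file for such entries; the file path is just a placeholder for wherever the custom annotations live:

```python
# Sketch: scan a COCO-format annotation file for entries that can
# produce nan losses (empty segmentations, zero-area or out-of-bounds boxes).
# The path below is a placeholder for the custom annotation file.
import json

with open('annotations/instances_train_custom.json') as f:  # placeholder path
    data = json.load(f)

images = {img['id']: img for img in data['images']}

for ann in data['annotations']:
    img = images.get(ann['image_id'])
    x, y, w, h = ann['bbox']
    problems = []
    if img is None:
        problems.append('image_id not found in images')
    if w <= 0 or h <= 0 or ann.get('area', 0) <= 0:
        problems.append('zero/negative box or area')
    if not ann.get('segmentation'):
        problems.append('empty segmentation')
    if img and (x < 0 or y < 0 or x + w > img['width'] or y + h > img['height']):
        problems.append('box outside image bounds')
    if problems:
        print(ann['id'], problems)
```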

AbuBakrCh commented 7 years ago

Were you able to resolve it?

dgriffiths3 commented 6 years ago

I have this problem as well. Does anybody know what is causing it? It runs with the COCO dataset, and I also had it running with one custom dataset, which leads me to assume the problem must be the COCO-format .json file I created. But I really can't see anything wrong with it, or any differences other than the data itself.
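
If the .json is the suspect, one quick sanity check (a sketch, assuming pycocotools is installed; the path is a placeholder) is to load the custom file through the COCO API and confirm that the ids resolve and every segmentation decodes to a mask. DeepMask reads annotations through the Lua port of the same COCO API, so a file that misbehaves here will likely misbehave there too:

```python
# Sketch: load a custom COCO-format file through the COCO API and
# decode segmentations, which fails loudly on malformed entries.
# The path is a placeholder.
from pycocotools.coco import COCO

coco = COCO('annotations/instances_train_custom.json')  # placeholder path

print('images:', len(coco.imgs))
print('annotations:', len(coco.anns))
print('categories:', sorted(coco.cats.keys()))

# annToMask raises if a segmentation is malformed or its image is missing.
for ann_id in list(coco.anns.keys()):
    ann = coco.loadAnns(ann_id)[0]
    coco.annToMask(ann)
```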