foolwood / SiamMask

[CVPR2019] Fast Online Object Tracking and Segmentation: A Unifying Approach
http://www.robots.ox.ac.uk/~qwang/SiamMask
MIT License
3.47k stars 816 forks

WARNING :root:NaN or Inf in input tensor problem #164

Open wangzhiwei-python opened 4 years ago

wangzhiwei-python commented 4 years ago

INFO:global:Progress: 110 / 83320 [0%], Speed: 1.173 s/iter, ETA 1:03:07 (D:H:M)

WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
[2020-08-02 19:30:50,046-rk0-train_siammask_refine.py#290] Epoch: [1][120/4166] lr: 0.010000 batch_time: 0.435500 (1.143975) data_time: 0.000047 (0.692028) rpn_cls_loss: 0.226122 (0.179825) rpn_loc_loss: 0.189315 (0.190289) rpn_mask_loss: inf (inf) siammask_loss: inf (inf) mask_iou_mean: 0.000000 (0.000000) mask_iou_at_5: 0.000000 (0.000000) mask_iou_at_7: 0.000000 (0.000000)
INFO:global:Epoch: [1][120/4166] lr: 0.010000 batch_time: 0.435500 (1.143975) data_time: 0.000047 (0.692028) rpn_cls_loss: 0.226122 (0.179825) rpn_loc_loss: 0.189315 (0.190289) rpn_mask_loss: inf (inf) siammask_loss: inf (inf) mask_iou_mean: 0.000000 (0.000000) mask_iou_at_5: 0.000000 (0.000000) mask_iou_at_7: 0.000000 (0.000000)
[2020-08-02 19:30:50,046-rk0-log_helper.py# 97] Progress: 120 / 83320 [0%], Speed: 1.144 s/iter, ETA 1:02:26 (D:H:M)
INFO:global:Progress: 120 / 83320 [0%], Speed: 1.144 s/iter, ETA 1:02:26 (D:H:M)

WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
[2020-08-02 19:31:00,345-rk0-train_siammask_refine.py#290] Epoch: [1][130/4166] lr: 0.010000 batch_time: 0.420345 (1.135102) data_time: 0.000030 (0.685210) rpn_cls_loss: 0.094134 (0.179464) rpn_loc_loss: 0.174147 (0.190233) rpn_mask_loss: inf (inf) siammask_loss: inf (inf) mask_iou_mean: 0.000000 (0.000000) mask_iou_at_5: 0.000000 (0.000000) mask_iou_at_7: 0.000000 (0.000000)

When I train the refine model, the problem above occurs. After I lowered lr from 0.01 to 0.001, the warning still appears after the third epoch. This is my loss. Can anyone help me solve this problem? Thanks! PICTURE:/home/wzw/.config/tencent-qq//AppData/file//sendpix4.jpg
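To narrow the problem down, it can help to check which loss term is the first to become non-finite. The following is a minimal sketch in plain Python, assuming log lines shaped like the ones pasted above (`name: current (running_average)` pairs). The helper `nonfinite_metrics` is hypothetical and is not part of SiamMask.

```python
import math
import re

# Matches "name: current (running_average)" pairs as they appear in the pasted log.
METRIC_RE = re.compile(r"(\w+): (\S+) \((\S+)\)")


def nonfinite_metrics(line):
    """Return the names of metrics whose current value is inf or NaN.

    Hypothetical helper for illustration; not part of the SiamMask code base.
    """
    bad = []
    for name, current, _average in METRIC_RE.findall(line):
        try:
            value = float(current)  # float() accepts "inf", "-inf" and "nan"
        except ValueError:
            continue  # skip fields that are not numeric
        if not math.isfinite(value):
            bad.append(name)
    return bad


# Shortened copy of one log line from the report above.
sample = (
    "Epoch: [1][120/4166] lr: 0.010000 batch_time: 0.435500 (1.143975) "
    "rpn_cls_loss: 0.226122 (0.179825) rpn_loc_loss: 0.189315 (0.190289) "
    "rpn_mask_loss: inf (inf) siammask_loss: inf (inf)"
)
print(nonfinite_metrics(sample))
```

In the log above only rpn_mask_loss and the total siammask_loss are inf, while rpn_cls_loss and rpn_loc_loss stay finite, which suggests the mask branch is where the overflow starts.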

wangzhiwei-python commented 4 years ago

[image: loss]

tyj1996 commented 4 years ago

@wangzhiwei-python Hi, I ran into the same problem. Can you tell me how you solved it? Thank you.

Divyanshu-nitj commented 3 years ago

I am facing the same problem.

MacBookYang commented 2 years ago

Hi, did you solve this problem?

nanowhiter commented 2 years ago

I think I had the same problem. First, I trained a SiamMask_base model and took the best snapshot, which reached an accuracy of 0.652 and a robustness of 0.308 on VOT-2016. That accuracy is lower than the one reported in the paper, but I think it is good enough. Then I used this snapshot as the pre-trained weights to train a SiamMask_refine model, and I got infinite siammask_loss and rpn_mask_loss in the early epochs. I reduced the learning rate from 0.01 to 0.001, ending at 0.000125, and it worked.
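One way to read this fix is a learning rate that decays geometrically from 0.001 down to 0.000125 over training. The sketch below shows such a schedule in plain Python. The function name `log_lr_schedule` and the 20-epoch length are assumptions for illustration only; neither comes from this thread, and the actual values belong in your own training config.

```python
import math


def log_lr_schedule(start_lr, end_lr, epochs):
    """Per-epoch learning rates spaced evenly in log space (geometric decay).

    Hypothetical helper for illustration; not SiamMask's own scheduler.
    """
    if epochs < 2:
        return [start_lr]
    log_start, log_end = math.log(start_lr), math.log(end_lr)
    step = (log_end - log_start) / (epochs - 1)
    return [math.exp(log_start + i * step) for i in range(epochs)]


# epochs=20 is an assumed value; 0.001 and 0.000125 are the rates quoted above.
lrs = log_lr_schedule(0.001, 0.000125, epochs=20)
for epoch, lr in enumerate(lrs):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

Starting ten times lower than the original 0.01 keeps the early updates to the mask branch small, which is consistent with the inf losses showing up only in the first epochs.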

xiaofengBian commented 2 years ago

@nanowhiter Hello, what version of torch do you have installed?

nanowhiter commented 2 years ago

@xiaofengBian I use PyTorch 1.5.0.

xiaofengBian commented 2 years ago

@nanowhiter Thank you for your timely reply. My training code still won't run. Could you send me your revised training code? My email is 945414538@qq.com. Thank you very much.

xiaofengBian commented 2 years ago

@nanowhiter Or could you share some contact information so we can discuss it? This problem has been bothering me for a long time. I hope you can help me.