Open JohnnyHan opened 5 years ago
Ubuntu 16.04.4, CUDA v8.0.16, GTX 1080 Ti
prototxt: default_forward_type: FLOAT16 default_backward_type: FLOAT16 default_forward_math: FLOAT16 default_backward_math: FLOAT16 global_grad_scale: 0.09 global_grad_scale_adaptive: true
solver.prototxt: clip_gradients: 150
An auto-encoder net. In BVLC Caffe the L2 norm stays below 500, but in NVCaffe 0.17 the L2 norm grows slowly from 150 to inf, and then I get a NaN loss.
FLOAT32 behaves the same as FLOAT16. Without global_grad_scale_adaptive: true and global_grad_scale: 0.09, the L2 norm grows to inf even more quickly.
@JohnnyHan can you please try
global_grad_scale: 1 global_grad_scale_adaptive: true
Also try removing clip_gradients. If it still breaks, please attach the complete log here. Thank you.
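For anyone following along, here is a rough sketch of what global L2-norm gradient clipping does. It is written in plain Python for illustration, not taken from the NVCaffe source. The function name `clip_gradients_l2` and the list-of-lists gradient layout are hypothetical. The assumption is the usual Caffe-style rule: if the global L2 norm exceeds the threshold, every gradient is multiplied by threshold / norm. Under that assumption, once the norm itself is inf the scale factor becomes 0, and inf * 0 gives NaN, so clipping cannot rescue a run whose gradients have already overflowed.

```python
import math


def clip_gradients_l2(grads, clip_threshold):
    """Hypothetical helper: rescale gradients if their global L2 norm is too large.

    grads is a list of per-layer gradient lists (an assumed layout for this sketch).
    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    # Global L2 norm across all layers.
    sumsq = sum(g * g for layer in grads for g in layer)
    l2norm = math.sqrt(sumsq)

    if l2norm > clip_threshold:
        # Assumed Caffe-style rule: a single scale factor for every gradient.
        scale = clip_threshold / l2norm
        grads = [[g * scale for g in layer] for layer in grads]

    return grads, l2norm


# Finite case: a norm of 5.0 is rescaled down to the threshold 1.0.
clipped, norm = clip_gradients_l2([[3.0, 4.0]], clip_threshold=1.0)

# Overflowed case: the norm is inf, the scale factor is 0.0, and inf * 0.0 is NaN.
bad_clipped, bad_norm = clip_gradients_l2([[float("inf"), 1.0]], clip_threshold=150.0)
```

This is only meant to show why the full log matters here: the interesting part is where the norm first starts climbing, not the step where it finally reaches inf.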