NVIDIA / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

caffe 0.17 l2 norm grows to inf #570

Open JohnnyHan opened 5 years ago

JohnnyHan commented 5 years ago

Environment: Ubuntu 16.04.4, CUDA v8.0.16, GTX 1080 Ti

prototxt:

default_forward_type: FLOAT16
default_backward_type: FLOAT16
default_forward_math: FLOAT16
default_backward_math: FLOAT16
global_grad_scale: 0.09
global_grad_scale_adaptive: true

solver.prototxt:

clip_gradients: 150

The net is an autoencoder. In BVLC Caffe the L2 norm stays below 500, but in NVCaffe 0.17 the L2 norm grows slowly from 150 to inf, and then I get a NaN loss.

FLOAT32 behaves the same as FLOAT16. Without global_grad_scale_adaptive: true and global_grad_scale: 0.09, the L2 norm grows to inf even more quickly.
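For readers unfamiliar with the setting, the sketch below illustrates what clipping by global L2 norm, the technique behind `clip_gradients`, conventionally does: if the L2 norm of all gradients exceeds the threshold, every gradient is rescaled so the norm equals the threshold. This is a minimal pure-Python illustration, not NVCaffe's actual code; the function name `clip_by_global_l2_norm` is hypothetical, and how NVCaffe combines clipping with `global_grad_scale` is not shown here.

```python
import math

def clip_by_global_l2_norm(grads, threshold):
    """Hypothetical helper: rescale grads so their global L2 norm is at most threshold.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    l2_norm = math.sqrt(sum(g * g for g in grads))
    if l2_norm > threshold:
        scale = threshold / l2_norm
        grads = [g * scale for g in grads]
    return grads, l2_norm

# Illustrative values only: a gradient vector with norm 500, clipped at 150.
clipped, norm_before = clip_by_global_l2_norm([300.0, 400.0], 150.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

Note that clipping of this kind only rescales a finite norm; once the norm itself is inf or NaN, the scale factor is no longer meaningful, which is consistent with the loss becoming NaN after the norm reaches inf.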

drnikolaev commented 5 years ago

@JohnnyHan can you please try

global_grad_scale: 1
global_grad_scale_adaptive: true

Also try removing clip_gradients. If it still breaks, please attach the complete log here. Thank you.
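As background on why the gradient scale matters in FLOAT16, here is a rough sketch, assuming `global_grad_scale` behaves like a conventional loss-scaling factor (gradients are multiplied by the scale, stored in half precision, then divided by it again). That assumption and the helper names `to_fp16` and `scaled_roundtrip` are illustrative and not taken from NVCaffe. The standard-library `struct` format `'e'` is used to emulate IEEE half-precision rounding.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to IEEE half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack('e', struct.pack('e', x))[0]
    except OverflowError:
        # struct refuses values beyond the half-precision range (max 65504).
        return math.copysign(math.inf, x)

def scaled_roundtrip(grad, scale):
    """Hypothetical helper: scale a gradient, store it as FP16, then unscale."""
    return to_fp16(grad * scale) / scale

tiny = scaled_roundtrip(1e-8, 1.0)        # too small for FP16 without scaling
rescued = scaled_roundtrip(1e-8, 1024.0)  # survives once scaled up
blown = scaled_roundtrip(100.0, 1024.0)   # scaled value exceeds the FP16 range
print(tiny, rescued, blown)
```

Under these assumptions, a scale that is too small loses tiny gradients to underflow, while a scale that is too large pushes big gradients past the half-precision range to inf. An adaptive scale is meant to move between those two extremes, which is one reason a complete log would help show where the norm first becomes inf.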