NVIDIA / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Training loss is nan as compared to bvlc caffe #520

Closed: twmht closed this issue 6 years ago

twmht commented 6 years ago

Hi,

Environment: NVCaffe 0.17 + CUDA 9.2 + cuDNN 7 + Ubuntu 16.04 + Pascal 1080 Ti

I am training a face identification model and found that the training loss becomes NaN within a few iterations. The same training runs fine in BVLC Caffe.

Here is the training prototxt (https://github.com/ydwen/caffe-face/blob/caffe-face/face_example/face_train_test.prototxt)

Here is the training log train.log

Any idea?
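
When comparing the two builds, it can help to pin down the exact iteration at which the loss first turns NaN. Below is a minimal sketch of one way to do that, not something taken from this thread. It assumes each loss line in the log contains `Iteration <n>` followed later on the same line by `loss = <value>`; the function name `first_nan_iteration` and the sample lines are hypothetical and are not copied from the actual train.log.

```python
import math
import re

# Assumed line shape: "... Iteration <n> ... loss = <value>".
# Adjust the pattern if the real log is formatted differently.
LOSS_RE = re.compile(r"Iteration (\d+).*?loss = (\S+)")


def first_nan_iteration(lines):
    """Return the first iteration whose reported loss is NaN, or None."""
    for line in lines:
        match = LOSS_RE.search(line)
        if match is None:
            continue  # not a loss line
        try:
            loss = float(match.group(2))
        except ValueError:
            continue  # value could not be parsed as a number
        if math.isnan(loss):
            return int(match.group(1))
    return None


# Hypothetical sample lines for illustration only.
sample = [
    "I0101 solver.cpp] Iteration 0, loss = 9.31",
    "I0101 solver.cpp] Iteration 100, loss = 8.75",
    "I0101 solver.cpp] Iteration 200, loss = nan",
]
print(first_nan_iteration(sample))  # prints 200
```

To run it against a real log, pass an open file instead of the sample list, for example `first_nan_iteration(open("train.log"))`.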

drnikolaev commented 6 years ago

@twmht Could you verify the https://github.com/drnikolaev/caffe/tree/caffe-0.17 release candidate?

drnikolaev commented 6 years ago

@twmht Please verify the https://github.com/NVIDIA/caffe/tree/v0.17.1 release and reopen the issue if needed.

twmht commented 6 years ago

@drnikolaev

Thank you. I have not verified it yet.

But what was the cause of this issue? What changes did you make to fix it? (https://github.com/NVIDIA/caffe/pull/528)