Training loss becomes NaN, unlike in BVLC Caffe

NVIDIA / caffe

Caffe: a fast open framework for deep learning.

http://caffe.berkeleyvision.org/

Status: Closed (closed by twmht 6 years ago)

twmht commented 6 years ago

Hi,

Environment: NVCaffe 0.17 + CUDA 9.2 + cuDNN 7 + Ubuntu 16.04 + Pascal 1080 Ti

I am training a face identification model and found that the training loss becomes NaN within a few iterations. The same training runs fine in BVLC Caffe.

Here is the training log: train.log

Any idea?
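
For anyone comparing the two builds, here is a minimal sketch for finding the first iteration at which the logged loss stops being finite. It assumes the solver log contains lines of the form `Iteration <N> ... loss = <value>`; the helper name `first_nan_iteration` and the regular expression are illustrative, not part of Caffe, and the exact log layout may differ between versions.

```python
import math
import re

# Assumed log line shape: "... Iteration 120 (...), loss = 0.693"
# This pattern is a guess at the layout, not an official Caffe format.
LOSS_RE = re.compile(r"Iteration (\d+).*?loss = ([^\s,]+)")


def first_nan_iteration(lines):
    """Return the first iteration whose reported loss is NaN or infinite.

    Returns None if every parsed loss value is finite.
    """
    for line in lines:
        match = LOSS_RE.search(line)
        if match is None:
            continue  # not a solver "Iteration" line
        try:
            loss = float(match.group(2))
        except ValueError:
            continue  # value could not be parsed as a number
        if not math.isfinite(loss):
            return int(match.group(1))
    return None
```

Passing an open file works because the function only iterates over lines, for example `first_nan_iteration(open("train.log"))`. Running it on the logs from both builds shows whether the loss diverges at the same point or only in one of them.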

drnikolaev commented 6 years ago

@twmht Please verify with the https://github.com/NVIDIA/caffe/tree/v0.17.1 release and reopen the issue if needed.

twmht commented 6 years ago

@drnikolaev

Thank you. I have not verified it yet.

But what was the cause of this issue? What changes did you make to solve it? (https://github.com/NVIDIA/caffe/pull/528)