Using MKL BLAS gives wrong results!

facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.

https://caffe2.ai

Apache License 2.0

8.42k stars · 1.95k forks

Using MKL BLAS gives wrong results! #1030

Status: Open. QiYueFeiXue opened this issue 7 years ago.

QiYueFeiXue commented 7 years ago

I built Caffe2 with MKL BLAS, but when I run a test net, I get results that differ from those I get with the NNPACK library or in GPU mode. Running the same test net several times also gives a different result each time. Is this a bug? I built it like this:

    mkdir build
    cd build
    cmake .. -DBLAS="MKL"
    make
    make install
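A quick way to confirm the run-to-run variation described above is to run the same computation several times and compare every output with the first one. This is a hypothetical sketch: `run_net` stands in for whatever function executes the test net and returns its output as a flat list of floats; it is not a Caffe2 API.

```python
from typing import Callable, List, Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def check_determinism(run_net: Callable[[], List[float]], runs: int = 5) -> List[float]:
    """Call the hypothetical `run_net` `runs` times.

    Returns, for each run after the first, its max absolute difference
    from the first run's output.
    """
    reference = run_net()
    return [max_abs_diff(reference, run_net()) for _ in range(runs - 1)]
```

With a deterministic backend every returned entry is 0.0; non-zero entries reproduce the symptom reported in this issue.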

QiYueFeiXue commented 7 years ago

If I use a single thread, the result is correct!
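The comment does not say how the single-threaded run was configured. One way to try it is to cap the threads used by MKL and OpenMP through the standard `MKL_NUM_THREADS` and `OMP_NUM_THREADS` environment variables. This sketch assumes the wrong results come from MKL's own threading; Caffe2 may also use thread pools that these variables do not control. The variables have to be in place before the process that loads MKL starts, so the sketch launches the test in a child process.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def single_threaded_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) with MKL/OpenMP capped at 1 thread."""
    env = dict(os.environ if base is None else base)
    env["MKL_NUM_THREADS"] = "1"  # thread cap read by Intel MKL
    env["OMP_NUM_THREADS"] = "1"  # thread cap read by the OpenMP runtime
    return env


def run_single_threaded(cmd: List[str]) -> int:
    """Run `cmd` in a child process with the capped environment; return its exit code."""
    return subprocess.run(cmd, env=single_threaded_env()).returncode
```

For example, `run_single_threaded(["python", "run_test_net.py"])`, where `run_test_net.py` is a hypothetical script that runs the test net. If results become stable only under this environment, MKL's multithreading is the likely culprit.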

Yangqing commented 7 years ago

Ah! We did hit this issue in the past, but I think the most recent MKL fixed the issue. Could you provide a minimal reproducing example (or point us to the test that is creating failures)?

QiYueFeiXue commented 7 years ago

I downloaded deploy.prototxt and bvlc_alexnet.caffemodel:

    wget https://raw.githubusercontent.com/BVLC/caffe/master/models/bvlc_alexnet/deploy.prototxt
    wget http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel

Then I translated them to Caffe2 nets:

    python caffe_translator.py deploy.prototxt bvlc_alexnet.caffemodel --init_net init_net.pb --predict_net predict_net.pb

alxNet.zip

QiYueFeiXue commented 7 years ago

Thank you! How can I solve this issue? Do I need to update the MKL library or Caffe2?

xkszltl commented 6 years ago

With the most recent MKL and Caffe2, I get errors randomly. Sometimes the test passes; sometimes it reports a huge error. When the error is not "huge", it ranges between 5% and 11% (relative error) for conv_op_test, as reported at contrib/nnpack/conv_op_test.cc:171.
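To compare two backends (for example, MKL output against NNPACK or GPU output), a relative error like the 5% to 11% figure above can be computed as the L2 norm of the difference divided by the L2 norm of the reference. This is one common definition and an assumption here; conv_op_test.cc may measure its error differently.

```python
import math
from typing import Sequence


def relative_error(reference: Sequence[float], actual: Sequence[float]) -> float:
    """L2 norm of (actual - reference) divided by the L2 norm of reference."""
    if len(reference) != len(actual):
        raise ValueError("outputs have different lengths")
    diff = math.sqrt(sum((a - r) ** 2 for r, a in zip(reference, actual)))
    ref = math.sqrt(sum(r * r for r in reference))
    if ref == 0.0:
        # An all-zero reference: any deviation is an unbounded relative error.
        return 0.0 if diff == 0.0 else math.inf
    return diff / ref
```

For example, `relative_error([3.0, 4.0], [3.0, 4.5])` returns 0.1, i.e. a 10% relative error.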

Our test platform is CentOS 7 on Skylake with gcc 5.3.1, using a Release build with the additional "-g" flag.

[Screenshot: 2018-02-09 at 11.08.26]