ResNet101 training error

jwnsu commented 6 years ago

Encountered the following error when training ResNet101:

I0415 10:16:34.264998 26574 layer_factory.hpp:77] Creating layer conv1
F0415 10:16:34.265014 26574 layer_factory.cpp:69] Layer conv1 has unknown engine.
Check failure stack trace:

I got the pretrained ResNet101 model from Kaiming He's OneDrive location: https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777.

Hwang64 commented 6 years ago

@jwnsu, I think you need to modify Makefile.config in $MLKP_ROOT/caffe-mlkp: "USE_CUDNN" in that file should be set to 1, otherwise an error like the one in your comment can occur.

jwnsu commented 6 years ago

@Hwang64 thanks for the info. When USE_CUDNN is enabled, there is a compilation error; it compiles fine after turning off the USE_CUDNN flag:

CXX src/caffe/util/im2col.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/util/math_functions.hpp:9,
                 from src/caffe/util/im2col.cpp:4:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:113:70: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
       pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
                                                                      ^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
     cudnnStatus_t status = condition; \
                            ^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
                 from ./include/caffe/util/device_alternate.hpp:40,
                 from ./include/caffe/common.hpp:19,
                 from ./include/caffe/util/math_functions.hpp:9,
                 from src/caffe/util/im2col.cpp:4:
/usr/local/cuda/include/cudnn.h:537:27: note: declared here
 cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
                           ^
Makefile:563: recipe for target '.build_release/src/caffe/util/im2col.o' failed
make: *** [.build_release/src/caffe/util/im2col.o] Error 1

The same error occurs with both CUDA9/CUDNN7 and CUDA8/CUDNN6. Which CUDA/CUDNN versions are known to work?

The error is resolved by changing ./include/caffe/util/cudnn.hpp to pass one more argument, CUDNN_DATA_FLOAT, to the cudnnSetConvolution2dDescriptor call.
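
For anyone who wants to see the change spelled out, here is a minimal sketch of it. The compiler error shows that the installed cudnn.h declares cudnnSetConvolution2dDescriptor with nine parameters, ending in a cudnnDataType_t, while cudnn.hpp passes only eight. The types, the stub function, and the CUDNN_CHECK macro at the top are hypothetical stand-ins so the snippet compiles without cuDNN installed (the parameter names are my own; the error message lists only types). In the real header only the call inside setConvolutionDesc needs to change.

```cpp
// --- Stand-ins so this sketch compiles without cuDNN installed. ------------
// In the real tree these come from <cudnn.h> and Caffe's cudnn.hpp. They are
// simplified assumptions and must not be copied into Caffe.
enum cudnnStatus_t { CUDNN_STATUS_SUCCESS = 0 };
enum cudnnConvolutionMode_t { CUDNN_CONVOLUTION = 0, CUDNN_CROSS_CORRELATION = 1 };
enum cudnnDataType_t { CUDNN_DATA_FLOAT = 0, CUDNN_DATA_DOUBLE = 1 };

struct cudnnTensorStruct;
struct cudnnFilterStruct;
struct cudnnConvolutionStruct {  // opaque in real cuDNN; fields are illustrative
  int pad_h, pad_w, stride_h, stride_w;
  cudnnDataType_t compute_type;
};
typedef cudnnTensorStruct* cudnnTensorDescriptor_t;
typedef cudnnFilterStruct* cudnnFilterDescriptor_t;
typedef cudnnConvolutionStruct* cudnnConvolutionDescriptor_t;

// Nine parameters, matching the declaration quoted in the compiler error.
inline cudnnStatus_t cudnnSetConvolution2dDescriptor(
    cudnnConvolutionDescriptor_t conv_desc, int pad_h, int pad_w,
    int stride_h, int stride_w, int dilation_h, int dilation_w,
    cudnnConvolutionMode_t mode, cudnnDataType_t compute_type) {
  (void)dilation_h; (void)dilation_w; (void)mode;
  conv_desc->pad_h = pad_h;
  conv_desc->pad_w = pad_w;
  conv_desc->stride_h = stride_h;
  conv_desc->stride_w = stride_w;
  conv_desc->compute_type = compute_type;
  return CUDNN_STATUS_SUCCESS;
}

// Simplified stand-in for Caffe's status-checking macro.
#define CUDNN_CHECK(condition) \
  do { cudnnStatus_t status = (condition); (void)status; } while (0)

// --- The part that corresponds to ./include/caffe/util/cudnn.hpp -----------
template <typename Dtype>
inline void setConvolutionDesc(cudnnConvolutionDescriptor_t* conv,
    cudnnTensorDescriptor_t bottom, cudnnFilterDescriptor_t filter,
    int pad_h, int pad_w, int stride_h, int stride_w) {
  (void)bottom; (void)filter;
  // Before (fails with "too few arguments"):
  //   CUDNN_CHECK(cudnnSetConvolution2dDescriptor(*conv,
  //       pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
  // After: append the data type as the ninth argument.
  CUDNN_CHECK(cudnnSetConvolution2dDescriptor(*conv,
      pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION,
      CUDNN_DATA_FLOAT));
}
```

Note that hard-coding CUDNN_DATA_FLOAT applies to every Dtype, and a header edited this way would presumably stop compiling against a cuDNN release whose function takes only eight arguments, so treat it as a fix for the cuDNN 6 and 7 setups mentioned above.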

lufei92 commented 6 years ago

@jwnsu hello, I ran into the same problem. I don't understand your fix of "changing ./include/caffe/util/cudnn.hpp to add one more parameter value CUDNN_DATA_FLOAT". Could you describe in detail how to solve the problem?

Hwang64 / MLKP

ResNet101 training error #3