IntelLabs / SkimCaffe

Caffe for Sparse Convolutional Neural Network

Illegal instruction error #19

Closed alspace closed 5 years ago

alspace commented 5 years ago

Hi, thank you for the nice work! I ran into an issue, though: I cannot run the example.

When I run the example with:

build/tools/caffe.bin test -model models/bvlc_reference_caffenet/test_direct_sconv_mkl.prototxt -weights models/bvlc_reference_caffenet/logs/acc_57.5_0.001_5e-5_ft_0.001_5e-5/0.001_5e-05_0_1_0_0_0_0_Sun_Jan__8_07-35-54_PST_2017/caffenet_train_iter_640000.caffemodel

I got this error:

...
I1025 20:16:03.814445 20523 net.cpp:901] Ignoring source layer relu5
I1025 20:16:03.814461 20523 net.cpp:901] Ignoring source layer pool5
I1025 20:16:03.845489 20523 inner_product_relu_dropout_layer.cpp:57] layer fc6 has sparsity of 0.902683 transpose 0
*** Aborted at 1540466166 (unix time) try "date -d @1540466166" if you are using GNU date ***
PC: @ 0x7fc7562de910 libxsmm_spmdm_init
*** SIGILL (@0x7fc7562de910) received by PID 20523 (TID 0x7fc7568b1e80) from PID 1445849360; stack trace: ***
    @ 0x7fc7521a8330 (unknown)
    @ 0x7fc7562de910 libxsmm_spmdm_init
    @ 0x7fc756129f6b caffe::InnerProductReLUDropoutLayer<>::WeightAlign()
    @ 0x7fc756156f71 caffe::Net<>::CopyTrainedLayersFrom()
    @ 0x7fc756156ac4 caffe::Net<>::CopyTrainedLayersFrom()
    @ 0x40dbc4 test()
    @ 0x408a03 main
    @ 0x7fc751df4f45 (unknown)
    @ 0x408829 (unknown)
    @ 0x0 (unknown)
./test.sh: line 4: 20523 Illegal instruction (core dumped) build/tools/caffe.bin test -model models/bvlc_reference_caffenet/test_direct_sconv_mkl.prototxt -weights models/bvlc_reference_caffenet/logs/acc_57.5_0.001_5e-5_ft_0.001_5e-5/0.001_5e-05_0_1_0_0_0_0_Sun_Jan__8_07-35-54_PST_2017/caffenet_train_iter_640000.caffemodel

My environment is as follows:
OS: Ubuntu 14.04
CPU: Intel Xeon E5-2630 (12 cores)
icc version: 18.0.5
boost version: 1.59.0

I modified Makefile.config to point to my Boost installation:

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include src src/libxsmm/include $(CUDA_DIR)/include /home/alspace/Kit/boost/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/lib src/SpMP src/libxsmm/lib $(CUDA_DIR)/lib64 /home/alspace/Kit/boost/lib

'make all' and 'make test' complete without errors, but 'make runtest' fails with the following error (note: building the original Caffe does not show any error):

[----------] 8 tests from LRNLayerTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN ] LRNLayerTest/0.TestSetupAcrossChannels
F1025 20:29:34.908768 21808 lrn_layer.cpp:101] channels_*height_*width_ should be a multiple of 8
*** Check failure stack trace: ***
    @ 0x2b43279b2daa (unknown)
    @ 0x2b43279b2ce4 (unknown)
    @ 0x2b43279b26e6 (unknown)
    @ 0x2b43279b5687 (unknown)
    @ 0x2b432af2fae6 caffe::LRNLayer<>::Reshape()
    @ 0x89aa61 caffe::LRNLayerTest_TestSetupAcrossChannels_Test<>::TestBody()
    @ 0xee2a7e testing::Test::Run()
    @ 0xee2630 testing::TestInfo::Run()
    @ 0xee2302 testing::TestCase::Run()
    @ 0xee173b testing::internal::UnitTestImpl::RunAllTests()
    @ 0xede946 testing::UnitTest::Run()
    @ 0x458edd main
    @ 0x2b432bf18f45 (unknown)
    @ 0x458da9 (unknown)
    @ (nil) (unknown)
Aborted (core dumped)
make: *** [runtest] Error 134

alspace commented 5 years ago

Oh, it was due to AVX2. My CPU supports only AVX, not AVX2.

I rebuilt libxsmm with AVX and now it works! Thanks.
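
For anyone who hits the same SIGILL, here is a minimal sketch for checking which AVX level a machine reports before building. It assumes Linux on x86, where /proc/cpuinfo has a "flags" line. The helper names `cpu_flags` and `best_avx_level` are made up for this example and are not part of SkimCaffe or libxsmm.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The flags line is a space-separated list, e.g. "fpu sse sse2 avx".
            return set(value.split())
    return set()


def best_avx_level(flags):
    """Return the highest AVX level present in the flags: avx512, avx2, avx, or none."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "none"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux; other systems lack this file
    if path.exists():
        print(best_avx_level(cpu_flags(path.read_text())))
    else:
        print("unknown (no /proc/cpuinfo on this system)")
```

If this prints "avx" rather than "avx2", a library compiled for AVX2 can be expected to fail with an illegal instruction error like the one above, and it needs to be rebuilt for the lower instruction set.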