facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

Bus Error #1820

Open speedr972 opened 6 years ago

speedr972 commented 6 years ago

Hi,

I succeeded in compiling Caffe2, but when I run the test command (from caffe2.python import core), the import seems to work, yet I get a Bus error at the end (and the command echoes "Failure").
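The "Failure" echo only says that the command's exit status was non-zero; it does not say whether Python exited with an error or was killed by a signal. As a hedged sketch (Python 3.5+, whereas the reporter's environment is Python 2.7; describe_exit is a hypothetical helper name), the snippet below reruns a command and separates the two cases using the negative return code that subprocess reports for death by signal.

```python
import signal
import subprocess
import sys


def describe_exit(cmd):
    """Run cmd and describe how it terminated.

    subprocess reports death by signal N as return code -N. A shell check
    such as '... && echo Success || echo Failure' only sees a non-zero
    status (128 + N in bash), so both cases print "Failure".
    """
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        return "killed by " + signal.Signals(-rc).name
    return "exit status %d" % rc


if __name__ == "__main__":
    # Same import the install check performs; assumes caffe2 is importable.
    print(describe_exit([sys.executable, "-c", "from caffe2.python import core"]))
```

If the import itself succeeds but the interpreter crashes while exiting, this would print "killed by SIGBUS" rather than a normal exit status.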

Here is what I got when running python -m caffe2.python.operator_test.relu_op_test from the build directory:

No handlers could be found for logger "caffe2.python.net_drawer"
net_drawer will not run correctly. Please install the correct dependencies.
E0126 10:14:45.176453 21345 init_intrinsics_check.cc:54] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0126 10:14:45.176465 21345 init_intrinsics_check.cc:54] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0126 10:14:45.176468 21345 init_intrinsics_check.cc:54] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([0.], dtype=float32), gc=, dc=[, device_type: 1], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([0.8655217, 0.8655217, 0.8655217], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[[ 0.7649818 ],
        [-0.43701455],
        [-0.45846564],
        [-0.64981836],
        [ 0.9077368 ]]], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[[-0.07841256,  0.8533526 ,  0.48906875]],

       [[-0.07841256, -0.07841256, -0.07841256]],

       [[ 0.5585911 , -0.07841256,  0.7683526 ]],

       [[-0.22895424, -0.0229432 ,  0.17841473]],

       [[ 0.31800967,  0.73580647,  0.34722218]]], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'CUDNN')
hypothesis_temporary_module_91895562fc4032358d46ddcd0c47c4f5cdaf7a74:34: HypothesisDeprecationWarning: Test took 424.03ms to run. In future the default deadline setting will be 200ms, which will make this an error. You can set deadline to an explicit value of e.g. 500 to turn tests slower than this into an error, or you can set it to None to disable this check entirely.
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[-0.95229864,  0.3907948 ,  0.3907948 ,  0.3907948 ,  0.5278425 ],
       [ 0.3907948 ,  0.3907948 , -0.80610937,  0.3907948 ,  0.3907948 ],
       [ 0.3907948 ,  0.3907948 ,  0.84954214,  0.3907948 ,  0.3907948 ],
       [ 0.3907948 ,  0.3907948 ,  0.0143156 , -0.25598967,  0.3907948 ],
       [ 0.23580822, -0.11414266,  0.3907948 ,  0.3907948 , -0.32927272]],
      dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'CUDNN')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[-0.5526754 ,  0.52237266, -0.5526754 ,  0.7496552 ,  0.8230769 ]],
      dtype=float32), gc=, dc=[, device_type: 1], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[0.12539667, 0.07001037, 0.84666675, 0.83203095],
       [0.84666675, 0.84666675, 0.7643324 , 0.8127611 ],
       [0.84666675, 0.69176364, 0.84666675, 0.84666675]], dtype=float32), gc=, dc=[, device_type: 1], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[[[ 0.99737483, -0.93191975, -0.33808005],
         [ 0.8707983 ,  0.99737483,  0.99737483],
         [-0.2023866 ,  0.94421494, -0.7209695 ]],

        [[ 0.7860019 ,  0.99737483,  0.24964984],
         [-0.52777505,  0.99737483,  0.4310433 ],
         [-0.21319899, -0.7827588 ,  0.9860167 ]],

        [[ 0.99737483, -0.8478974 ,  0.99737483],
         [ 0.77335435,  0.99737483,  0.8982265 ],
         [ 0.99737483,  0.6658948 , -0.36908847]],

        [[-0.4285811 ,  0.09675463,  0.99737483],
         [ 0.42244625,  0.46827143, -0.34393248],
         [ 0.99737483,  0.08018089,  0.99737483]],

        [[ 0.42482585, -0.03561567, -0.53194344],
         [ 0.99737483,  0.99737483,  0.99737483],
         [-0.04453746, -0.880498  , -0.6789654 ]]]], dtype=float32), gc=, dc=[, device_type: 1], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[[[-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434,  0.7852756 , -0.97761434],
         [ 0.7695944 , -0.97761434, -0.39234367]],

        [[-0.97761434, -0.97761434, -0.97761434],
         [ 0.6511973 , -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434]],

        [[-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.24015783, -0.97761434]],

        [[-0.97761434, -0.97761434,  0.99579495],
         [-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434]]],

       [[[-0.79211605, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434]],

        [[-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434],
         [ 0.35069838, -0.97761434, -0.97761434]],

        [[-0.2053604 , -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434]],

        [[-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.84992754],
         [-0.97761434, -0.97761434, -0.97761434]]],

       [[[-0.97761434, -0.97761434,  0.78898394],
         [-0.97761434, -0.97761434, -0.97761434],
         [ 0.07961506, -0.97761434, -0.97761434]],

        [[-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434]],

        [[-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434]],

        [[-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434],
         [-0.97761434, -0.97761434, -0.97761434]]]], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([-0.51280975, -0.51280975], dtype=float32), gc=device_type: 1, dc=[, device_type: 1], engine=u'')
.
----------------------------------------------------------------------
Ran 1 test in 0.670s

OK
*** Aborted at 1516958085 (unix time) try "date -d @1516958085" if you are using GNU date ***
PC: @     0x7f8fb956845b (unknown)
*** SIGBUS (@0x0) received by PID 21345 (TID 0x7f8fba78d700) from PID 0; stack trace: ***
    @     0x7f8fb9fd5390 (unknown)
    @     0x7f8fb956845b (unknown)
    @     0x7f8fb956bcde (unknown)
    @     0x7f8fb956edca __libc_calloc
    @     0x7f8fabb5f23a (unknown)
    @     0x7f8fabb60f67 (unknown)
    @     0x7f8fabb623d2 (unknown)
    @     0x7f8fabb66426 (unknown)
    @     0x7f8fabb66842 (unknown)
    @     0x7f8fabb5942b (unknown)
    @     0x7f8fabb59b75 (unknown)
    @     0x7f8fb9523ff8 (unknown)
    @     0x7f8fb9524045 exit
    @     0x7f8fba2dfe61 Py_Exit
    @     0x7f8fba2dff73 handle_system_exit.part.2
    @     0x7f8fba2e01e5 PyErr_PrintEx
    @     0x7f8fba2f2146 RunModule
    @     0x7f8fba2f26ee Py_Main
    @     0x7f8fb950a830 __libc_start_main
    @     0x55b15944687f (unknown)
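The glog abort banner suggests GNU date for decoding its timestamp. The same conversion in Python is sketched below (abort_time_utc is a hypothetical helper). It yields 2018-01-26 09:14:45 UTC, which is the same second as the E0126 10:14:45 log lines above if the machine's local clock was UTC+1 (an assumption), tying the crash to this very test run.

```python
from datetime import datetime, timezone


def abort_time_utc(unix_seconds):
    """Convert glog's 'Aborted at <unix time>' value to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    print(abort_time_utc(1516958085))  # 2018-01-26T09:14:45+00:00
```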

Apparently the test succeeded, but I still get this SIGBUS. Do you know where it could come from and how to resolve it?

Thanks

QinZibo commented 6 years ago

sudo pip install \
    flask \
    graphviz \
    hypothesis \
    jupyter \
    matplotlib \
    pydot \
    python-nvd3 \
    pyyaml \
    requests \
    scikit-image \
    scipy \
    setuptools \
    tornado
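Several of these packages are imported under a different name than the one given to pip, so a quick way to see what is actually missing from the active environment is to look up the import names. The sketch below (Python 3.4+ for importlib.util.find_spec; missing_modules is a hypothetical helper) covers a subset of the list, and the pip-to-import-name mapping is an assumption for illustration. Whether any of these packages relates to the SIGBUS is not established in this thread; the "net_drawer will not run correctly" warning in the log is what they plausibly address.

```python
import importlib.util

# pip distribution name -> top-level import name (assumed mapping, subset only).
PIP_TO_MODULE = {
    "flask": "flask",
    "graphviz": "graphviz",
    "hypothesis": "hypothesis",
    "matplotlib": "matplotlib",
    "pydot": "pydot",
    "python-nvd3": "nvd3",
    "pyyaml": "yaml",
    "requests": "requests",
    "scikit-image": "skimage",
    "scipy": "scipy",
    "tornado": "tornado",
}


def missing_modules(mapping):
    """Return the sorted pip names whose import name cannot be found."""
    return sorted(pip for pip, mod in mapping.items()
                  if importlib.util.find_spec(mod) is None)


if __name__ == "__main__":
    print("missing:", missing_modules(PIP_TO_MODULE) or "none")
```

Run it with the same interpreter that runs Caffe2 (here, the caffe2-env Anaconda Python), since a system-wide sudo pip install may target a different Python.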

pietern commented 6 years ago

Can you post the CMake summary for completeness?

The SIGBUS happens in the Python exit routine (note Py_Exit and exit in the stack trace), i.e. after the tests themselves have finished.
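One way to check this reading is to make the process skip exit-time cleanup. In the sketch below, a harmless Python atexit handler stands in for the native cleanup code that crashes (an assumption: the crashing frames in the trace are unsymbolized). It shows that os._exit bypasses that cleanup while sys.exit runs it. run_child and CHILD are hypothetical names; Python 3.7+ is assumed for capture_output.

```python
import subprocess
import sys

# Child program: registers an exit-time handler (a Python stand-in for the
# native cleanup code seen under exit() in the stack trace), then exits in
# one of two ways depending on its first argument.
CHILD = r"""
import atexit, os, sys
atexit.register(lambda: sys.stdout.write("cleanup ran\n"))
sys.stdout.write("tests done\n")
sys.stdout.flush()  # os._exit does not flush stdio buffers
if sys.argv[1] == "hard":
    os._exit(0)  # C _exit(): skips atexit handlers and static destructors
sys.exit(0)  # normal path: interpreter finalization, then C exit()
"""


def run_child(mode):
    """Run the child program with mode 'normal' or 'hard'; return its stdout."""
    proc = subprocess.run([sys.executable, "-c", CHILD, mode],
                          capture_output=True, text=True)
    return proc.stdout


if __name__ == "__main__":
    print(repr(run_child("normal")))  # 'tests done\ncleanup ran\n'
    print(repr(run_child("hard")))    # 'tests done\n'
```

The analogous experiment for the failing test would be to run it with unittest.main(exit=False), flush stdout and stderr, and then call os._exit(0). If the SIGBUS disappears, the crash is in exit-time cleanup of a loaded library rather than in the ReLU operator under test.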

speedr972 commented 6 years ago

Is this what you are asking for?

-- Does not need to define long separately.
-- Current compiler supports avx2 extention. Will build perfkernels.
-- The BLAS backend of choice:Eigen
-- Could NOT find NNPACK (missing:  CPUINFO_LIBRARY) 
-- Brace yourself, we are building NNPACK
-- Found PythonInterp: /home/poulpe/anaconda2/envs/caffe2-env/bin/python (found version "2.7.14") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/six-download
[100%] Built target six
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/enum-download
[100%] Built target enum
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/opcodes-download
[100%] Built target opcodes
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/peachpy-download
[ 11%] Performing update step for 'peachpy'
Current branch master is up to date.
[ 22%] No configure step for 'peachpy'
[ 33%] No build step for 'peachpy'
[ 44%] No install step for 'peachpy'
[ 55%] No test step for 'peachpy'
[ 66%] Completed 'peachpy'
[100%] Built target peachpy
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/fp16-download
[ 11%] Performing update step for 'fp16'
Current branch master is up to date.
[ 22%] No configure step for 'fp16'
[ 33%] No build step for 'fp16'
[ 44%] No install step for 'fp16'
[ 55%] No test step for 'fp16'
[ 66%] Completed 'fp16'
[100%] Built target fp16
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/fxdiv-download
[ 11%] Performing update step for 'fxdiv'
Current branch master is up to date.
[ 22%] No configure step for 'fxdiv'
[ 33%] No build step for 'fxdiv'
[ 44%] No install step for 'fxdiv'
[ 55%] No test step for 'fxdiv'
[ 66%] Completed 'fxdiv'
[100%] Built target fxdiv
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/psimd-download
[ 11%] Performing update step for 'psimd'
Current branch master is up to date.
[ 22%] No configure step for 'psimd'
[ 33%] No build step for 'psimd'
[ 44%] No install step for 'psimd'
[ 55%] No test step for 'psimd'
[ 66%] Completed 'psimd'
[100%] Built target psimd
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/pthreadpool-download
[ 11%] Performing update step for 'pthreadpool'
Current branch master is up to date.
[ 22%] No configure step for 'pthreadpool'
[ 33%] No build step for 'pthreadpool'
[ 44%] No install step for 'pthreadpool'
[ 55%] No test step for 'pthreadpool'
[ 66%] Completed 'pthreadpool'
[100%] Built target pthreadpool
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/googletest-download
[100%] Built target googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/fxdiv-download
[ 11%] Performing update step for 'fxdiv'
Current branch master is up to date.
[ 22%] No configure step for 'fxdiv'
[ 33%] No build step for 'fxdiv'
[ 44%] No install step for 'fxdiv'
[ 55%] No test step for 'fxdiv'
[ 66%] Completed 'fxdiv'
[100%] Built target fxdiv
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/googletest-download
[100%] Built target googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/googlebenchmark-download
[100%] Built target googlebenchmark
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/googletest-download
[100%] Built target googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/googlebenchmark-download
[100%] Built target googlebenchmark
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/googletest-download
[100%] Built target googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build/googlebenchmark-download
[100%] Built target googlebenchmark
-- Found gflags  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Found glog    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- git Version: v0.0.0
-- Version: 0.0.0
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX -- success
-- Performing Test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Found lmdb    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
-- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
-- Found Snappy  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libsnappy.so)
-- Could NOT find RocksDB (missing:  RocksDB_INCLUDE_DIR RocksDB_LIBRARIES) 
CMake Warning at cmake/Dependencies.cmake:177 (message):
  Not compiling with RocksDB.  Suppress this warning with -DUSE_ROCKSDB=OFF
Call Stack (most recent call first):
  CMakeLists.txt:81 (include)

-- Found CUDA: /usr/local/cuda-8.0 (found suitable exact version "8.0") 
-- OpenCV found (/usr/local/share/OpenCV)
-- Found system Eigen at /usr/include/eigen3
-- Found PythonInterp: /home/poulpe/anaconda2/envs/caffe2-env/bin/python (found suitable version "2.7.14", minimum required is "2.7") 
-- NumPy ver. 1.14.0 found (include: /home/poulpe/anaconda2/envs/caffe2-env/lib/python2.7/site-packages/numpy/core/include)
-- Could NOT find pybind11 (missing:  pybind11_INCLUDE_DIR) 
-- MPI support found
-- MPI compile flags: 
-- MPI include path: /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include/usr/lib/openmpi/include/usr/lib/openmpi/include/openmpi
-- MPI LINK flags path:  -Wl,-rpath  -Wl,/usr/lib/openmpi/lib  -Wl,--enable-new-dtags
-- MPI libraries: /usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
CMake Warning at cmake/Dependencies.cmake:295 (message):
  OpenMPI found, but it is not built with CUDA support.
Call Stack (most recent call first):
  CMakeLists.txt:81 (include)

-- CUDA detected: 8.0
-- Added CUDA NVCC flags for: sm_20 sm_21 sm_30 sm_35 sm_50 sm_52 sm_60 sm_61
-- Found libcuda: /usr/local/cuda-8.0/lib64/stubs/libcuda.so
-- Found libnvrtc: /usr/local/cuda-8.0/lib64/libnvrtc.so
-- Found cuDNN: v7.0.2  (include: /usr/local/cuda-8.0/include, library: /usr/local/cuda-8.0/lib64/libcudnn.so)
-- Could NOT find NCCL (missing:  NCCL_INCLUDE_DIRS NCCL_LIBRARIES) 
-- NCCL: /home/poulpe/Libraries/caffe2/third_party/nccl/build/lib/libnccl_static.a
-- Could NOT find CUB (missing:  CUB_INCLUDE_DIR) 
-- Could NOT find Gloo (missing:  Gloo_INCLUDE_DIR Gloo_LIBRARY) 
-- MPI include path: /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include/usr/lib/openmpi/include/usr/lib/openmpi/include/openmpi
-- MPI libraries: /usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
-- Found CUDA: /usr/local/cuda-8.0 (found suitable version "8.0", minimum required is "7.0") 
-- CUDA detected: 8.0
-- Found libcuda: /usr/local/cuda-8.0/lib64/stubs/libcuda.so
-- Found libnvrtc: /usr/local/cuda-8.0/lib64/libnvrtc.so
CMake Warning at cmake/Dependencies.cmake:473 (message):
  Metal is only used in ios builds.
Call Stack (most recent call first):
  CMakeLists.txt:81 (include)

-- GCC 5.4.0: Adding gcc and gcc_s libs to link line
-- Include NCCL operators
-- Including image processing operators
-- Excluding video processing operators due to no opencv
-- Excluding mkl operators as we are not using mkl
-- Include Observer library
-- Automatically generating missing __init__.py files.
-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.5.1
--   CMake command         : /usr/bin/cmake
--   Git version           : v0.8.1-991-g4358930
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 5.4.0
--   Protobuf compiler     : /usr/bin/protoc
--   Protobuf include path : /usr/include
--   Protobuf libraries    : optimized;/usr/lib/x86_64-linux-gnu/libprotobuf.so;debug;/usr/lib/x86_64-linux-gnu/libprotobuf.so;-pthread
--   CXX flags             :  -std=c++11 -O2 -fPIC -Wno-narrowing -Wno-invalid-partial-specialization
--   Build type            : Release
--   Compile definitions   : 
-- 
--   BUILD_BINARY          : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : ON
--     Python version      : 2.7.14
--     Python library      : /home/poulpe/anaconda2/envs/caffe2-env/lib/libpython2.7.so
--   BUILD_SHARED_LIBS     : ON
--   BUILD_TEST            : ON
--   USE_ATEN              : OFF
--   USE_ASAN              : OFF
--   USE_CUDA              : ON
--     CUDA version        : 8.0
--     CuDNN version       : 7.0.2
--   USE_EIGEN_FOR_BLAS    : 1
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : ON
--   USE_GLOG              : ON
--   USE_GLOO              : ON
--   USE_LEVELDB           : ON
--     LevelDB version     : 1.18
--     Snappy version      : 1.1.3
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : ON
--     LMDB version        : 0.9.18
--   USE_METAL             : OFF
--   USE_MKL               : 
--   USE_MOBILE_OPENGL     : OFF
--   USE_MPI               : ON
--   USE_NCCL              : ON
--   USE_NERVANA_GPU       : OFF
--   USE_NNPACK            : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCV            : ON
--     OpenCV version      : 2.4.13.3
--   USE_OPENMP            : OFF
--   USE_PROF              : OFF
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_THREADS           : ON
--   USE_ZMQ               : OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/poulpe/Libraries/caffe2/build

pjh5 commented 6 years ago

I don't see anything that looks off in this CMake output. Can you try running python -m caffe2.python.operator_test.relu_op_test outside of the build directory?

Have you tried running anything else in Caffe2? Does this always happen when you call into Caffe2 from a Python interpreter? Does your Python interpreter work normally otherwise?
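Running outside the build directory matters because both python -c and python -m put the current directory at the front of sys.path, so from the build directory Python may load the build tree's caffe2 package instead of the installed one. The hedged sketch below (Python 3.7+; origin_from and PROBE are hypothetical names) asks a fresh interpreter, started in a given directory, where it would load a top-level package from, without importing and therefore without running it.

```python
import subprocess
import sys

# Executed in a fresh interpreter. find_spec on a top-level name locates the
# module without importing it; with -c, sys.path[0] is the working directory.
PROBE = ("import importlib.util, sys; "
         "s = importlib.util.find_spec(sys.argv[1]); "
         "print((s and s.origin) or '')")


def origin_from(name, cwd):
    """Return the file `name` would load from when Python starts in cwd.

    Returns None if the module is not found (or has no origin file, as with
    namespace packages).
    """
    out = subprocess.run([sys.executable, "-c", PROBE, name], cwd=cwd,
                         capture_output=True, text=True, check=True).stdout
    return out.strip() or None


if __name__ == "__main__":
    print(origin_from("json", "."))
```

For example, comparing origin_from("caffe2", "/home/poulpe/Libraries/caffe2/build") with origin_from("caffe2", "/home/poulpe") would show whether the two locations resolve to different copies of Caffe2.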