facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

E.terminate called without an active exception when running test_spatial_narrow_as_op.py #633

Open jiangguoding opened 5 years ago

jiangguoding commented 5 years ago

Expected results

Following https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md, I ran:

python $DETECTRON/detectron/tests/test_spatial_narrow_as_op.py

What did you expect to see? The test should run and report OK.

Actual results


world@world-OMEN-X-by-HP-Laptop-17-ap0xx:~/Downloads$ python $DETECTRON/detectron/tests/test_spatial_narrow_as_op.py
E0824 00:23:30.765760 17320 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0824 00:23:30.765779 17320 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0824 00:23:30.765781 17320 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /home/world/miniconda3/lib/libcaffe2_detectron_ops_gpu.so
E.terminate called without an active exception
Aborted at 1535095411 (unix time) try "date -d @1535095411" if you are using GNU date
PC: @ 0x7fc33db7b428 gsignal
SIGABRT (@0x3e8000043a8) received by PID 17320 (TID 0x7fc33e332700) from PID 17320; stack trace:
@ 0x7fc33df21390 (unknown)
@ 0x7fc33db7b428 gsignal
@ 0x7fc33db7d02a abort
@ 0x7fc33578bb39 __gnu_cxx::verbose_terminate_handler()
@ 0x7fc33578a1fb cxxabiv1::__terminate()
@ 0x7fc33578a234 std::terminate()
@ 0x7fc2e40bb430 caffe2::CUDAContext::~CUDAContext()
@ 0x7fc2e40e81f2 caffe2::SpatialNarrowAsOp<>::~SpatialNarrowAsOp()
@ 0x7fc334062995 caffe2::Workspace::RunOperatorOnce()
@ 0x7fc334ff3ff8 _ZZN6caffe26python16addGlobalMethodsERN8pybind116moduleEENKUlRKNS1_5bytesEE26clES6.isra.3103.constprop.3163
@ 0x7fc334ff41a4 _ZZN8pybind1112cpp_function10initializeIZN6caffe26python16addGlobalMethodsERNS_6moduleEEUlRKNS_5bytesEE26_bJS8_EJNS_4nameENS_5scopeENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4FUNESQ
@ 0x7fc3350278b0 pybind11::cpp_function::dispatcher()
@ 0x5632fdf5b9e4 _PyCFunction_FastCallDict
@ 0x5632fdfe8dfc call_function
@ 0x5632fe00d94a _PyEval_EvalFrameDefault
@ 0x5632fdfe2f8b fast_function
@ 0x5632fdfe8ed5 call_function
@ 0x5632fe00d94a _PyEval_EvalFrameDefault
@ 0x5632fdfe2206 _PyEval_EvalCodeWithName
@ 0x5632fdfe31cf fast_function
@ 0x5632fdfe8ed5 call_function
@ 0x5632fe00e715 _PyEval_EvalFrameDefault
@ 0x5632fdfe2f8b fast_function
@ 0x5632fdfe8ed5 call_function
@ 0x5632fe00d94a _PyEval_EvalFrameDefault
@ 0x5632fdfe2206 _PyEval_EvalCodeWithName
@ 0x5632fdfe3897 _PyFunction_FastCallDict
@ 0x5632fdf5bdaf _PyObject_FastCallDict
@ 0x5632fdf60a73 _PyObject_Call_Prepend
@ 0x5632fdf5b7ee PyObject_Call
@ 0x5632fe00f10b _PyEval_EvalFrameDefault
@ 0x5632fdfe2206 _PyEval_EvalCodeWithName
Aborted (core dumped)

What did you observe instead?
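The trace suggests `date -d @1535095411` for decoding the abort time. For readers without GNU date, the same conversion can be sketched in Python 3. The helper name `abort_time_utc` is made up for this illustration; only the timestamp comes from the log.

```python
from datetime import datetime, timezone


def abort_time_utc(unix_seconds):
    """Convert the 'Aborted at <N> (unix time)' value into an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# Timestamp copied from the "Aborted at ..." line of the trace.
print(abort_time_utc(1535095411).isoformat())  # 2018-08-24T07:23:31+00:00
```

The result is in UTC, while the `E0824 00:23:30` prefixes in the log are presumably in the machine's local time.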

Detailed steps to reproduce

  1. Install TensorFlow following https://github.com/williamFalcon/tensorflow-gpu-install-ubuntu-16.04
  2. Install Detectron following https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md. For Caffe2, I used the prebuilt package from https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=prebuilt:

    conda install -c caffe2 caffe2-cuda9.0-cudnn7

    python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
    Output: Success

    python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
    Output: 1

    For COCOAPI: followed the instructions exactly.

    For Detectron: followed the instructions exactly. This command failed:

    python $DETECTRON/detectron/tests/test_spatial_narrow_as_op.py
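The two Caffe2 checks in step 2 can be combined into one script that reports both results instead of hiding errors behind `2>/dev/null`. This is only a sketch: `check_caffe2` is a name invented here, and the only Caffe2 call used is `workspace.NumCudaDevices()`, taken from the command above.

```python
import importlib


def check_caffe2():
    """Return (importable, gpu_count); gpu_count is None when the import fails."""
    try:
        # Same module the one-liner imports: caffe2.python.workspace
        workspace = importlib.import_module("caffe2.python.workspace")
    except ImportError:
        return False, None
    # INSTALL.md requires this number to be > 0 in order to use Detectron.
    return True, workspace.NumCudaDevices()


importable, gpu_count = check_caffe2()
print("caffe2 importable:", importable)
print("CUDA devices:", gpu_count)
```

As this thread shows, passing both checks does not guarantee that the Detectron ops run, so the script is a first sanity check rather than a full diagnosis.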

The command that you ran: python $DETECTRON/detectron/tests/test_spatial_narrow_as_op.py

System information

include "driver_types.h"

jiangguoding commented 5 years ago

I reinstalled Detectron three times following the instructions, but that did not solve the problem. It has taken me a lot of time.

I also tried building pytorch/caffe2 from source. It still crashes. Sad...

jiangguoding commented 5 years ago

I am sure the GPU works OK. I train models with TensorFlow on this GPU (GTX 1080).

glc12125 commented 5 years ago

I got the same issue. If I skip this test and run the following:

python2 tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo

it detects no masks at all. Note that I did not build caffe2 from source but used the prebuilt binaries in Ubuntu 16.

wiwengweng commented 5 years ago

I got this issue too, but the Docker version runs well and the test output is 'OK'. You can try that too.

I also found that you should use Python 2 instead of Python 3.

I will try some other ways to find out more.

I saw there are other tests in the tests folder; running python test_zero_even_op.py gives the same error. I then tried make ops and found that the protobuf header is older. So I think I should install Caffe2 from source and use the corresponding protobuf version.
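One rough way to look for the kind of protobuf mismatch described above is to compare the version of the protobuf Python package with the `protoc` found on PATH. This Python 3 sketch assumes those two are the relevant installations. It does not inspect the headers Caffe2 was compiled against, and `protobuf_versions` is a name invented for the example.

```python
import shutil
import subprocess


def protobuf_versions():
    """Return (python_package_version, protoc_version_line); either may be None."""
    try:
        import google.protobuf as pb
        py_version = pb.__version__
    except ImportError:
        py_version = None

    protoc_version = None
    protoc = shutil.which("protoc")  # None when protoc is not on PATH
    if protoc:
        try:
            out = subprocess.run([protoc, "--version"],
                                 stdout=subprocess.PIPE,
                                 universal_newlines=True)
            protoc_version = out.stdout.strip()  # usually "libprotoc X.Y.Z"
        except OSError:
            protoc_version = None
    return py_version, protoc_version


py_version, protoc_version = protobuf_versions()
print("protobuf (Python package):", py_version)
print("protoc on PATH:", protoc_version)
```

If the two versions differ, that is a hint worth following up, not proof of the cause of the crash.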

Update on 3/9: building from source solves this. Both the tests and the demo script then work fine. I hope this helps.

root@bogon:~/Detectron/detectron/tests# python test_spatial_narrow_as_op.py 
E0831 15:43:12.620936 14113 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0831 15:43:12.620975 14113 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0831 15:43:12.620982 14113 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /root/pytorch/build/lib/libcaffe2_detectron_ops_gpu.so
...
----------------------------------------------------------------------
Ran 3 tests in 2.396s

OK

rakidedigama commented 5 years ago

@glc12125 were you able to solve this? I am facing exactly the same problem. Did you try building from source?

ir413 commented 5 years ago

Hi @jiangguoding - as noted by @wiwengweng, Detectron does not support Python 3 at this time. Please retry with Python 2 and reopen the issue if you encounter the same problem. Sorry for the inconvenience.

jiangguoding commented 5 years ago

@ir413: I have tried Python 2 as well; the issue still exists. I even tried building from source, and it did not help.

ir413 commented 5 years ago

Thanks for confirming @jiangguoding. I'm not sure what the issue is. Could you please update the description above with your latest setup? (python 2, building from source, etc.) Also, I'm not sure why there is a tensorflow installation step in your steps to reproduce?

jiangguoding commented 5 years ago

@ir413: thanks for your help. I installed TensorFlow because I had just bought a laptop with an NVIDIA GTX 1080; you can ignore that step. I am out right now and will provide the related info later.
Thanks.

jiangguoding commented 5 years ago

@ir413: I just followed the steps in:
https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md

Do the following to reproduce the issue:

conda create -n env_caffe27 python=2.7
source activate env_caffe27
conda install -c caffe2 caffe2-cuda9.0-cudnn7

To check if Caffe2 build was successful

python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Output: Success

To check if Caffe2 GPU build was successful

This must print a number > 0 in order to use Detectron

python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
Output: 1

COCOAPI=/path/to/clone/cocoapi

git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI

Install into global site-packages

DETECTRON=/path/to/clone/detectron

git clone https://github.com/facebookresearch/detectron $DETECTRON

pip install -r $DETECTRON/requirements.txt

cd $DETECTRON && make

python2 $DETECTRON/detectron/tests/test_spatial_narrow_as_op.py


(env_caffe27) world@world:~/code2/detectron/detectron$ python2 $DETECTRON/detectron/tests/test_spatial_narrow_as_op.py
No handlers could be found for logger "caffe2.python.net_drawer"
net_drawer will not run correctly. Please install the correct dependencies.
E0918 20:25:56.688345 6439 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0918 20:25:56.688357 6439 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0918 20:25:56.688361 6439 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /home/world/miniconda3/envs/env_caffe27/lib/libcaffe2_detectron_ops_gpu.so
terminate called without an active exception
Aborted at 1537327556 (unix time) try "date -d @1537327556" if you are using GNU date
PC: @ 0x7f07ff90c428 gsignal
SIGABRT (@0x3e800001927) received by PID 6439 (TID 0x7f0800bb2700) from PID 6439; stack trace:
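Because the crash kills the interpreter with SIGABRT rather than failing a unittest assertion, scripted runs may want to tell an abort apart from an ordinary test failure. A minimal Python 3 sketch for POSIX systems follows; `run_and_classify` is a hypothetical helper, and the child command is a stand-in that aborts on purpose.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd and describe how it ended: 'ok', a failure code, or a signal name."""
    proc = subprocess.run(cmd)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "failed (exit code %d)" % proc.returncode


# Stand-in child process that calls abort(), as the crashing test effectively does.
print(run_and_classify([sys.executable, "-c", "import os; os.abort()"]))
```

To use it on the real test, the stand-in command could be replaced with something like `["python2", "detectron/tests/test_spatial_narrow_as_op.py"]`, run from the Detectron checkout.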

jiangguoding commented 5 years ago

Related NVIDIA versions:

  1. Download the CUDA repo: cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  2. Download the cuDNN library: cudnn-9.0-linux-x64-v7.2.1.38.tgz
  3. Download the NCCL repo: nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb

Dataphz commented 5 years ago

@jiangguoding Did you solve this? I am also encountering the same problem... sad

jiangguoding commented 5 years ago

@jiangguoding Did you solve this? I am also encountering the same problem... sad

Sorry, I haven't.

aliceyayunji commented 5 years ago

I am also encountering this same problem... sad+1

ztjackchan commented 5 years ago

I had a similar problem (#880). I solved it by reinstalling Caffe2, building it from source. Some dependencies need to be installed before running setup.py install, and the Caffe2 installation page does not mention them. I found them in the PyTorch GitHub README: https://github.com/pytorch/pytorch#install-dependencies. I am not sure whether this is what made the difference in solving the issue.