facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Inference with Pretrained Models error #148

Open mzahran001 opened 6 years ago

mzahran001 commented 6 years ago

Expected results

[screenshot omitted]

Actual results

What did you observe instead? [screenshot omitted]


E0212 10:57:39.824651  1567 init_intrinsics_check.cc:54] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0212 10:57:39.824681  1567 init_intrinsics_check.cc:54] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0212 10:57:39.824687  1567 init_intrinsics_check.cc:54] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
WARNING cnn.py:  40: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py:  57: Loading weights from: /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
I0212 10:58:21.947926  1567 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000193389 secs
I0212 10:58:21.948194  1567 net_dag.cc:61] Number of parallel execution chains 63 Number of operators = 402
I0212 10:58:21.970666  1567 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000247387 secs
I0212 10:58:21.970948  1567 net_dag.cc:61] Number of parallel execution chains 30 Number of operators = 358
I0212 10:58:21.973315  1567 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 2.0227e-05 secs
I0212 10:58:21.973356  1567 net_dag.cc:61] Number of parallel execution chains 5 Number of operators = 18
INFO infer_simple.py: 111: Processing demo/15673749081_767a7fa63a_k.jpg -> /tmp/detectron-visualizations/15673749081_767a7fa63a_k.jpg.pdf
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what():  [enforce fail at context_gpu.h:105] status == CUDNN_STATUS_SUCCESS. 1 vs 0. , Error at: /var/lib/jenkins/workspace/caffe2/core/context_gpu.h:105: CUDNN_STATUS_NOT_INITIALIZED Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
*** Aborted at 1518433102 (unix time) try "date -d @1518433102" if you are using GNU date ***
PC: @     0x7f11fd7b1428 gsignal
*** SIGABRT (@0x61f) received by PID 1567 (TID 0x7f1170168700) from PID 1567; stack trace: ***
    @     0x7f11fd7b14b0 (unknown)
    @     0x7f11fd7b1428 gsignal
    @     0x7f11fd7b302a abort
    @     0x7f11f772884d __gnu_cxx::__verbose_terminate_handler()
    @     0x7f11f77266b6 (unknown)
    @     0x7f11f7726701 std::terminate()
    @     0x7f11f7751d38 (unknown)
    @     0x7f11fdb4d6ba start_thread
    @     0x7f11fd88341d clone
    @                0x0 (unknown)
Aborted (core dumped)
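In case it helps narrow this down: the operator that aborts is the very first Conv run through the CUDNN engine, and CUDNN_STATUS_NOT_INITIALIZED at context_gpu.h usually means cuDNN could not create a handle on the GPU, which in a Docker setup most often comes down to the container not seeing the GPU, or to a CUDA/driver mismatch, rather than to Detectron itself. A minimal check from inside the container, assuming the standard Caffe2 Python workspace API is available:

# Is the driver/GPU visible inside the container at all?
nvidia-smi

# How many CUDA devices does this Caffe2 build see? (0 would explain the cuDNN failure)
python2 -c "from caffe2.python import workspace; print(workspace.NumCudaDevices())"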

Detailed steps to reproduce

I am using the Caffe2 + Detectron Docker images, but when I try to run inference with pretrained models I get the errors above. These are my steps.

Run the latest Caffe2 Docker image with GPU support:

nvidia-docker run -it caffe2ai/caffe2:latest /bin/bash

Build the image:

cd $Detectron/docker
docker build -t detectron:c2-cuda9-cudnn7 .

Run the new image:

nvidia-docker run -it detectron1 detectron:c2-cuda9-cudnn7 /bin/bash
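(Note: as written, docker will parse detectron1 as the image name and detectron:c2-cuda9-cudnn7 as the command, so this line probably does not do what was intended. If the goal was to give the container a name, the usual form would be the following, where detectron1 is just an arbitrary container name:)

# Start an interactive container named detectron1 from the detectron image
nvidia-docker run -it --name detectron1 detectron:c2-cuda9-cudnn7 /bin/bash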

Install the COCO API

git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make install
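A quick sanity check that pycocotools actually ended up on the container's Python path (the printed marker string is just my own choice):

# Exits cleanly and prints the marker only if the COCO API installed correctly
python2 -c "from pycocotools.coco import COCO; print('pycocotools OK')"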

Build the Python modules:

cd $Detectron/lib && make && cd ..
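If I remember INSTALL.md correctly, there is also a small operator test that can be run at this point as a sanity check that the Detectron ops are usable (path as given in INSTALL.md; adjust $DETECTRON to wherever the repo lives inside the container):

# Sanity-check the Detectron custom operators (from INSTALL.md)
python2 $DETECTRON/tests/test_spatial_narrow_as_op.py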

Run inference with a pretrained model:

python2 tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo

System information: not provided.

ir413 commented 6 years ago

Hi @mzahran001, I don't quite follow the steps you're taking to use the Detectron Docker image.

As described in INSTALL.md, to build the Detectron docker image it is sufficient to:

cd $DETECTRON/docker
docker build -t detectron:c2-cuda9-cudnn7 .

You can then run the demo using:

nvidia-docker run --rm -it detectron:c2-cuda9-cudnn7 python2 tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo
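Note that with --rm the container, and therefore the PDFs written to /tmp/detectron-visualizations, is removed as soon as the command exits. If you want to keep the visualizations, one option is to mount a host directory over the output path; a sketch, where /path/on/host is any directory you choose:

# Same command as above, but the output directory is bind-mounted to the host
nvidia-docker run --rm -it \
    -v /path/on/host:/tmp/detectron-visualizations \
    detectron:c2-cuda9-cudnn7 python2 tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo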
mzahran001 commented 6 years ago

Hi @ir413. Thanks for your response. Unfortunately it produces the same error. Is there a way to force Caffe2 to use the CPU instead of the GPU?

ir413 commented 6 years ago

Unfortunately it produces the same error.

Just to be sure that an older cached version is not being used, please try rebuilding the Detectron image with the --no-cache option:

cd $DETECTRON/docker
docker build --no-cache -t detectron:c2-cuda9-cudnn7 .

If the error is still there, please paste the complete output from both building and running the image.
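(To capture the complete build output, something like the following works; build.log is just an example file name. The nvidia-docker run ... infer_simple.py command can be piped through 2>&1 | tee run.log in the same way.)

# Rebuild without cache and keep a copy of the full build output
docker build --no-cache -t detectron:c2-cuda9-cudnn7 . 2>&1 | tee build.log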

Is there a way to force Caffe2 to use the CPU instead of the GPU?

Detectron currently requires Caffe2 with GPU support.
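(For what it's worth, you can at least confirm whether a given Caffe2 build was compiled with GPU support; a minimal check using the Caffe2 workspace module:)

# Prints True only if this Caffe2 build was compiled with CUDA/GPU support
python2 -c "from caffe2.python import workspace; print(workspace.has_gpu_support)"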

zhcf commented 6 years ago

Same error here, and I am not using the Docker image.

remcova commented 6 years ago

Same error here. Does anyone have a solution for this?