facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Encountered CUDA error: invalid device function Error from operator: #245

Open anatlin opened 6 years ago

anatlin commented 6 years ago

Expected results

This test should pass.

python2 $DETECTRON/tests/test_spatial_narrow_as_op.py 

Actual results

RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA error: invalid device function Error from operator: 
input: "A" input: "B" input: "C_grad" output: "A_grad" name: "" type: "SpatialNarrowAsGradient" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true
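
Reports in this thread name different operators and different line numbers in context_gpu.h, so it can help to pull those fields out of the message before comparing cases. The following is only a sketch: `summarize_error` and the regular expressions are hypothetical helpers, not part of Detectron or Caffe2, and they assume the message keeps the shape shown above.

```python
import re

# Hypothetical helper (not part of Detectron or Caffe2). The patterns assume
# the message layout shown in this thread and may not fit other versions.
ENFORCE_RE = re.compile(r"\[enforce fail at (?P<file>[^:\]]+):(?P<line>\d+)\]")
OP_TYPE_RE = re.compile(r'\btype: "(?P<op>[^"]+)"')
GPU_ID_RE = re.compile(r"\bcuda_gpu_id: (?P<gpu>\d+)")


def summarize_error(message):
    """Return the fields found in the message; a field is None when absent."""
    enforce = ENFORCE_RE.search(message)
    op = OP_TYPE_RE.search(message)
    gpu = GPU_ID_RE.search(message)
    return {
        "file": enforce.group("file") if enforce else None,
        "line": int(enforce.group("line")) if enforce else None,
        "operator": op.group("op") if op else None,
        "gpu_id": int(gpu.group("gpu")) if gpu else None,
    }


if __name__ == "__main__":
    sample = (
        'RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA '
        'error: invalid device function Error from operator: input: "A" '
        'input: "B" input: "C_grad" output: "A_grad" name: "" '
        'type: "SpatialNarrowAsGradient" device_option { device_type: 1 '
        'cuda_gpu_id: 0 } is_gradient_op: true'
    )
    print(summarize_error(sample))
```

The word boundary in `OP_TYPE_RE` keeps it from matching the `device_type` field, which shares the `type:` suffix.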

Detailed steps to reproduce

python2 $DETECTRON/tests/test_spatial_narrow_as_op.py 

System information

avilash commented 6 years ago

Any updates on the issue?

manoshape commented 6 years ago

Any updates on the issue?

xfarxod commented 6 years ago

Any updates on the issue?

avilash commented 6 years ago

Please build Caffe2 from source. The test passes on an AWS instance when Caffe2 is built from source.

ljd16 commented 6 years ago

Any updates on the issue?

ggaaooppeenngg commented 6 years ago

Any updates on the issue?

arasharchor commented 6 years ago

System information

Operating system: Ubuntu 16.04
CUDA version: Cuda compilation tools, release 8.0, V8.0.44
cuDNN version: cudnn-8.0-linux-x64-v7
GPU models (for all devices if they are not all the same): GeForce 1060
PYTHONPATH environment variable: Anaconda2.7
caffe2 binary was installed using: conda install -c caffe2 caffe2-cuda8.0-cudnn7

Detectron$ python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Success
Detectron$ python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
1
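
The two one-line checks can also be run as a single script. This is a minimal sketch, assuming only the two imports and the `workspace.NumCudaDevices()` call already used in this comment; `check_caffe2` is a hypothetical name. Note that in this report both checks succeed and the operator test still fails, so a clean result here does not rule out the "invalid device function" error.

```python
from __future__ import print_function


def check_caffe2():
    """Report whether Caffe2 imports and how many CUDA devices it sees."""
    status = {"caffe2_import": False, "num_cuda_devices": None}
    try:
        # Same imports as the two one-liners in this comment.
        from caffe2.python import core  # noqa: F401
        from caffe2.python import workspace
    except ImportError:
        # Caffe2 is not importable; leave the device count unknown.
        return status
    status["caffe2_import"] = True
    status["num_cuda_devices"] = workspace.NumCudaDevices()
    return status


if __name__ == "__main__":
    result = check_caffe2()
    print("Success" if result["caffe2_import"] else "Failure")
    print(result["num_cuda_devices"])
```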

export PATH=/usr/local/cuda-8.0/bin:$PATH
echo $LD_LIBRARY_PATH
/usr/local/cuda-8.0/lib64:/home/majid/softwares/cudnn/8.0-7.1/lib64

@rbgirshick I just experienced the same issue

when I run

python2 $DETECTRON/tests/test_spatial_narrow_as_op.py 

I get the following error:

RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA error: invalid device function Error from operator: 
input: "A" input: "B" input: "C_grad" output: "A_grad" name: "" type: "SpatialNarrowAsGradient" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true

After running

python2 tools/infer_simple.py     --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml     --output-dir /tmp/detectron-visualizations     --image-ext jpg     --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl     demo
python2 tools/train_net.py     --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml     OUTPUT_DIR /tmp/detectron-output

I get the following error:

RuntimeError: [enforce fail at context_gpu.h:155] . Encountered CUDA error: invalid device function Error from operator: 
input: "gpu_0/conv1" input: "gpu_0/res_conv1_bn_s" input: "gpu_0/res_conv1_bn_b" output: "gpu_0/conv1" name: "" type: "AffineChannel" device_option { device_type: 1 cuda_gpu_id: 0 }

mfe7 commented 6 years ago

I have also gotten around this error by building Caffe2 from source.

arasharchor commented 6 years ago

@mfe7, I was able to compile Caffe2 from source after many desperate attempts. The solution turned out not to be that complicated. I had been using virtualenv and compiling everything locally. Once I installed every package, including Caffe2, with sudo permission on Ubuntu, it worked like a charm, and I was able to train on my own custom dataset with amazing results. I am currently trying to compile it on another machine where I have no sudo permission. If I manage that, I will try to post an update here. I am also preparing a bash file that you can run easily if you have sudo permission.

ljd16 commented 6 years ago

I compiled Caffe2 from source without sudo, and the error disappeared. https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile

rowanz commented 6 years ago

I was encountering a variant of this issue when using the unsupported Python 3 fork. However, I found that I didn't have to install caffe2 from source; installing an older version was enough:

conda install -c caffe2 caffe2-cuda8.0-cudnn7=0.8.dev=py36_2018.05.14

Hope this helps someone 😄

Sqrt5 commented 6 years ago

Same error:

RuntimeError: [enforce fail at context_gpu.h:181] . Encountered CUDA error: invalid device function Error from operator: 
input: "A" input: "B" output: "C" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }

But I use Python 2.7, so I installed the old version built for it, and the problem was solved:

conda remove caffe2-cuda8.0-cudnn7
conda install -c caffe2 caffe2-cuda8.0-cudnn7=0.8.dev=py27_2018.05.14

Thanks to @rowanz for the reply.
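
The two pinned installs reported in this thread differ only in the Python tag of the build string. As a rough sketch, the command can be assembled from the interpreter version; `KNOWN_BUILDS`, `pinned_spec`, and `install_command` are hypothetical names, and only the two builds quoted here (py27 and py36) are included, since the thread does not say whether builds exist for other Python versions.

```python
from __future__ import print_function

# Package name, version, and build strings are copied from the two commands
# in this thread. Nothing here is an official Caffe2 or conda tool.
PACKAGE = "caffe2-cuda8.0-cudnn7"
VERSION = "0.8.dev"
KNOWN_BUILDS = {
    "2.7": "py27_2018.05.14",
    "3.6": "py36_2018.05.14",
}


def pinned_spec(python_version):
    """Build the name=version=build spec for a Python version seen in the thread."""
    try:
        build = KNOWN_BUILDS[python_version]
    except KeyError:
        raise ValueError(
            "no build reported in this thread for Python %s" % python_version
        )
    return "{}={}={}".format(PACKAGE, VERSION, build)


def install_command(python_version):
    """Return the conda command as an argument list (it is not executed here)."""
    return ["conda", "install", "-c", "caffe2", pinned_spec(python_version)]


if __name__ == "__main__":
    print(" ".join(install_command("2.7")))
```

The command is returned as a list rather than run, so it can be inspected first or passed to `subprocess` by the caller.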