facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Support Jetson TX1? #88

Open Spoon94 opened 6 years ago

Spoon94 commented 6 years ago

I have installed Caffe2 and the COCO API successfully. When I run python2 $DETECTRON/tests/test_spatial_narrow_as_op.py, the following errors occur:


======================================================================
ERROR: test_large_forward (__main__.SpatialNarrowAsOpTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/test_spatial_narrow_as_op.py", line 68, in test_large_forward
    self._run_test(A, B)
  File "./tests/test_spatial_narrow_as_op.py", line 39, in _run_test
    workspace.RunOperatorOnce(op)
  File "/usr/local/caffe2/python/workspace.py", line 176, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:170] . Encountered CUDA error: invalid device function Error from operator:
input: "A" input: "B" output: "C" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }

======================================================================
ERROR: test_small_forward_and_gradient (__main__.SpatialNarrowAsOpTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./tests/test_spatial_narrow_as_op.py", line 59, in test_small_forward_and_gradient
    self._run_test(A, B, check_grad=True)
  File "./tests/test_spatial_narrow_as_op.py", line 39, in _run_test
    workspace.RunOperatorOnce(op)
  File "/usr/local/caffe2/python/workspace.py", line 176, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:170] . Encountered CUDA error: invalid device function Error from operator:
input: "A" input: "B" output: "C" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }

----------------------------------------------------------------------
Ran 3 tests in 1.526s

FAILED (errors=2)


I have no idea about this.

ZzzjzzZ commented 6 years ago

I tested it on a Jetson TX2 and got no errors.

ir413 commented 6 years ago

Hi @Spoon94, we haven't tested Detectron on TX1 ourselves but given that Caffe2 supports TX1 you should be able to run Detectron on TX1. It seems that @CarryJzzZ managed to run it on TX2 so maybe he can share potential additional tips with you.

Two things to double-check:

1. Did you install Caffe2 following the installation instructions for Tegra?
2. Did you confirm that the Caffe2 GPU build was successful?

# This must print a number > 0 in order to use Detectron
python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
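
The one-liner above only shows that Caffe2 can see a CUDA device; it does not show that GPU operators will actually run. Below is a minimal sketch of the same check as a small script that also distinguishes a missing Caffe2 install from a build that sees no GPU. `count_cuda_devices` and `describe` are hypothetical helper names used for illustration, not part of Caffe2 or Detectron; only `workspace.NumCudaDevices()` comes from the command above.

```python
from __future__ import print_function


def count_cuda_devices():
    """Return the number of CUDA devices Caffe2 sees, or None if Caffe2 cannot be imported."""
    try:
        from caffe2.python import workspace
    except ImportError:
        return None
    return workspace.NumCudaDevices()


def describe(num_devices):
    """Turn the device count into a short, human-readable diagnosis (hypothetical helper)."""
    if num_devices is None:
        return "Caffe2 could not be imported; check the installation."
    if num_devices == 0:
        return "Caffe2 imported, but no CUDA device is visible to it."
    return ("Caffe2 sees {} CUDA device(s); run the Detectron tests next "
            "to confirm that GPU operators work.".format(num_devices))


if __name__ == "__main__":
    print(describe(count_cuda_devices()))
```

A positive count is necessary but not sufficient: several replies below report a count of 1 or 4 and still hit the CUDA error when running the tests.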

Spoon94 commented 6 years ago

@ir413 The command below prints 1, but when I run the Caffe2 tests, errors occur.

# This must print a number > 0 in order to use Detectron
python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'

Spoon94 commented 6 years ago

@CarryJzzZ Did you build with CMake or with the shell script?

cishwarya commented 6 years ago

Hello @Spoon94, did you solve the issue? @ir413 I am getting the same error as @Spoon94. The output of python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())' is 4. I installed Caffe2 using conda:

  conda install -c caffe2 caffe2-cuda8.0-cudnn7

Below is the output of nvidia-smi. I am using Detectron, and for that matter CUDA, for the first time. Can you please help?

Fri Mar  2 14:51:07 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:05:00.0  On |                  N/A |
| 23%   37C    P8    11W / 250W |    695MiB / 12181MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:06:00.0 Off |                  N/A |
| 23%   40C    P8    11W / 250W |      2MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:09:00.0 Off |                  N/A |
| 23%   41C    P8     9W / 250W |      2MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:0A:00.0 Off |                  N/A |
| 23%   34C    P8    10W / 250W |      2MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1263      G   /usr/lib/xorg/Xorg                           433MiB |
|    0      2383      G   compiz                                       186MiB |
|    0     16658      G   ...-token=1299DE8AA85A5380C75C435BF9B1C466    49MiB |
|    0     20369      G   /usr/lib/firefox/firefox                      23MiB |
+-----------------------------------------------------------------------------+

mfe7 commented 6 years ago

I have also seen this error on a desktop machine with Ubuntu 16.04 and an NVIDIA 1060 card.

(detectron) mfe@mfe-ubuntu:~/code/detectron$ python2 tests/test_spatial_narrow_as_op.py
No handlers could be found for logger "caffe2.python.net_drawer"
net_drawer will not run correctly. Please install the correct dependencies.
E0406 12:55:08.916733  3932 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0406 12:55:08.916744  3932 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0406 12:55:08.916746  3932 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /home/mfe/anaconda3/envs/detectron/lib/libcaffe2_detectron_ops_gpu.so
E.E
======================================================================
ERROR: test_large_forward (__main__.SpatialNarrowAsOpTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_spatial_narrow_as_op.py", line 68, in test_large_forward
    self._run_test(A, B)
  File "tests/test_spatial_narrow_as_op.py", line 39, in _run_test
    workspace.RunOperatorOnce(op)
  File "/home/mfe/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/workspace.py", line 165, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:155] . Encountered CUDA error: invalid device function Error from operator: 
input: "A" input: "B" output: "C" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }

======================================================================
ERROR: test_small_forward_and_gradient (__main__.SpatialNarrowAsOpTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_spatial_narrow_as_op.py", line 59, in test_small_forward_and_gradient
    self._run_test(A, B, check_grad=True)
  File "tests/test_spatial_narrow_as_op.py", line 39, in _run_test
    workspace.RunOperatorOnce(op)
  File "/home/mfe/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/workspace.py", line 165, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:155] . Encountered CUDA error: invalid device function Error from operator: 
input: "A" input: "B" output: "C" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }

----------------------------------------------------------------------
Ran 3 tests in 0.354s

FAILED (errors=2)

I can confirm that caffe2 was installed correctly and can see my gpu:

(detectron) mfe@mfe-ubuntu:~/code/detectron$ python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Success
(detectron) mfe@mfe-ubuntu:~/code/detectron$ python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
1

and:

(detectron) mfe@mfe-ubuntu:~/code/detectron$ nvidia-smi
Fri Apr  6 12:58:53 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   37C    P8     9W / 156W |    792MiB /  6069MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1078      G   /usr/lib/xorg/Xorg                           412MiB |
|    0      1742      G   /opt/teamviewer/tv_bin/TeamViewer              1MiB |
|    0      1951      G   compiz                                       108MiB |
|    0      2299      G   ...-token=098A14C533A842315853BF4DEAA8A6E9   136MiB |
|    0     25321      G   ...-token=A9684E55600A374D0EC8C5B0E9B4F86E    88MiB |
|    0     25420      G   ...-token=1276F752EEF983320812E87494AEEE42    42MiB |
+-----------------------------------------------------------------------------+

The output of make is:

(detectron) mfe@mfe-ubuntu:~/code/detectron/lib$ make
python2 setup.py develop --user
running develop
running egg_info
writing Detectron.egg-info/PKG-INFO
writing top-level names to Detectron.egg-info/top_level.txt
writing dependency_links to Detectron.egg-info/dependency_links.txt
reading manifest file 'Detectron.egg-info/SOURCES.txt'
writing manifest file 'Detectron.egg-info/SOURCES.txt'
running build_ext
building 'utils.cython_bbox' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/utils
gcc -pthread -B /home/mfe/anaconda3/envs/detectron/compiler_compat -Wl,--sysroot=/ -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/mfe/anaconda3/envs/detectron/lib/python2.7/site-packages/numpy/core/include -I/home/mfe/anaconda3/envs/detectron/include/python2.7 -c utils/cython_bbox.c -o build/temp.linux-x86_64-2.7/utils/cython_bbox.o -Wno-cpp
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/utils
gcc -pthread -shared -B /home/mfe/anaconda3/envs/detectron/compiler_compat -L/home/mfe/anaconda3/envs/detectron/lib -Wl,-rpath=/home/mfe/anaconda3/envs/detectron/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-2.7/utils/cython_bbox.o -L/home/mfe/anaconda3/envs/detectron/lib -lpython2.7 -o build/lib.linux-x86_64-2.7/utils/cython_bbox.so
building 'utils.cython_nms' extension
gcc -pthread -B /home/mfe/anaconda3/envs/detectron/compiler_compat -Wl,--sysroot=/ -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/mfe/anaconda3/envs/detectron/lib/python2.7/site-packages/numpy/core/include -I/home/mfe/anaconda3/envs/detectron/include/python2.7 -c utils/cython_nms.c -o build/temp.linux-x86_64-2.7/utils/cython_nms.o -Wno-cpp
gcc -pthread -shared -B /home/mfe/anaconda3/envs/detectron/compiler_compat -L/home/mfe/anaconda3/envs/detectron/lib -Wl,-rpath=/home/mfe/anaconda3/envs/detectron/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-2.7/utils/cython_nms.o -L/home/mfe/anaconda3/envs/detectron/lib -lpython2.7 -o build/lib.linux-x86_64-2.7/utils/cython_nms.so
copying build/lib.linux-x86_64-2.7/utils/cython_bbox.so -> utils
copying build/lib.linux-x86_64-2.7/utils/cython_nms.so -> utils
Creating /home/mfe/.local/lib/python2.7/site-packages/Detectron.egg-link (link to .)
Detectron 0.0.0 is already the active version in easy-install.pth

Installed /home/mfe/code/detectron/lib
Processing dependencies for Detectron==0.0.0
Finished processing dependencies for Detectron==0.0.0

Any ideas what's going on?

YefeiGao commented 6 years ago

I have also encountered this problem. Caffe2 and Detectron seem to be installed correctly, with output like this:

(detectron) [gaoyefei@dlgpu1 ~/project/Detectron-master]$ python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
4
(detectron) [gaoyefei@dlgpu1 ~/project/Detectron-master/lib]$ make
python2 setup.py develop --user
Compiling utils/cython_bbox.pyx because it changed.
Compiling utils/cython_nms.pyx because it changed.
[1/2] Cythonizing utils/cython_bbox.pyx
[2/2] Cythonizing utils/cython_nms.pyx
...
Installed /home/gaoyefei/project/Detectron-master/lib
Processing dependencies for Detectron==0.0.0
Finished processing dependencies for Detectron==0.0.0

However, when I run:

(detectron) [gaoyefei@dlgpu1 ~/project/Detectron-master]$ python tests/test_spatial_narrow_as_op.py

the error occurs:

...
ERROR: test_small_forward_and_gradient (__main__.SpatialNarrowAsOpTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_spatial_narrow_as_op.py", line 59, in test_small_forward_and_gradient
    self._run_test(A, B, check_grad=True)
  File "tests/test_spatial_narrow_as_op.py", line 39, in _run_test
    workspace.RunOperatorOnce(op)
  File "/home/gaoyefei/miniconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/workspace.py", line 165, in RunOperatorOnce
    return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:155] . Encountered CUDA error: invalid device function Error from operator:
input: "A" input: "B" output: "C" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 2 }

----------------------------------------------------------------------
Ran 3 tests in 2.006s

FAILED (errors=2)

I am using conda with CUDA 8 and cuDNN 7... sad day...

ljd16 commented 6 years ago

@YefeiGao Did you solve this problem?

YefeiGao commented 6 years ago

@ljd16 I compiled Caffe2 from source instead of using conda, and it works well now.
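
A possible explanation for why a source build helps: CUDA reports "invalid device function" when the kernel being launched was not compiled for the architecture of the GPU it runs on, and a source build typically targets the local GPU. Whether that is what went wrong with the conda package is not confirmed anywhere in this thread. The sketch below only illustrates the general compatibility rule for desktop GPUs: machine code built for compute capability X.y runs on devices X.z with z >= y, and embedded PTX can be JIT-compiled for any device of equal or higher capability. `can_run` is a hypothetical helper, and the build architecture lists are made up for illustration; the only figure taken as known is that the GTX 1060 and TITAN Xp mentioned above are compute capability 6.1. The Jetson boards are left out because this rule is stated for desktop GPUs.

```python
from __future__ import print_function


def can_run(device_cc, sm_archs, ptx_archs=()):
    """Check whether a binary can run on a device (hypothetical helper).

    device_cc: (major, minor) compute capability of the GPU.
    sm_archs:  capabilities the binary contains machine code for.
    ptx_archs: capabilities the binary contains PTX for (JIT-compilable).
    """
    major, minor = device_cc
    # Machine code: same major version, built for an equal or lower minor version.
    for a_major, a_minor in sm_archs:
        if a_major == major and a_minor <= minor:
            return True
    # PTX: can be JIT-compiled for any device of equal or higher capability.
    for arch in ptx_archs:
        if tuple(arch) <= (major, minor):
            return True
    return False


GTX_1060 = (6, 1)  # TITAN Xp has the same compute capability

# Made-up build configurations, NOT the real contents of any Caffe2 package.
print(can_run(GTX_1060, sm_archs=[(3, 5), (5, 2)]))              # no matching code
print(can_run(GTX_1060, sm_archs=[(6, 1)]))                      # built for this GPU
print(can_run(GTX_1060, sm_archs=[(5, 2)], ptx_archs=[(5, 2)]))  # falls back to PTX
```

In the first configuration the launch would be expected to fail on a 6.1 device, which matches the symptom reported here; the other two would be expected to work.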

gadcam commented 6 years ago

It looks like Detectron supports the Jetson TX1 (correct me if I am wrong), so I think this issue can be closed.

jpdz commented 6 years ago

@CarryJzzZ Hi, did you run into any problems when installing Caffe2 on the TX2?