CMU-Perceptual-Computing-Lab / caffe_rtpose

Realtime C++ code for multi-person pose estimation

Check failed: error == cudaSuccess (2 vs. 0) out of memory #44

Closed. Billfortme closed this issue 7 years ago.

Billfortme commented 7 years ago

I have a Jetson TX2 board. I am trying to compile and run rtpose to see how it performs on the TX2. It compiled with no problem; however, when I run it, I get the following error:

F0607 11:47:28.814931 13654 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
Check failure stack trace:
    @ 0x7f9a445718 google::LogMessage::Fail()
    @ 0x7f9a447614 google::LogMessage::SendToLog()
    @ 0x7f9a445290 google::LogMessage::Flush()
    @ 0x7f9a447eb4 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f9a6bae2c caffe::SyncedMemory::to_gpu()
    @ 0x7f9a6b9eb4 caffe::SyncedMemory::mutable_gpu_data()
    @ 0x7f9a6c94ac caffe::Blob<>::mutable_gpu_data()
    @ 0x7f9a8d9f6c caffe::CuDNNConvolutionLayer<>::Forward_gpu()
    @ 0x7f9a87dcf8 caffe::Net<>::ForwardFromTo()
    @ 0x40afb0 warmup()
    @ 0x40f72c processFrame()
    @ 0x7f99a32fc4 start_thread
Aborted (core dumped)

Can someone help me out?

gineshidalgo99 commented 7 years ago

Without cuDNN, rt_pose needs more than 12 GB of GPU memory, so you need cuDNN enabled. Even with cuDNN enabled, rt_pose still uses more than 2 GB. Check the new library to reduce that to roughly 1300-1500 MB: https://github.com/CMU-Perceptual-Computing-Lab/openpose/
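
A quick way to confirm whether a binary was actually built against cuDNN is to inspect its shared-library dependencies. This is a minimal sketch; the binary path assumes the default OpenPose build layout, so adjust it to whichever binary you built:

    # If the build linked cuDNN, libcudnn should appear in the dependency list
    ldd ./build/examples/openpose/openpose.bin | grep -i cudnn

No output from the grep usually means the binary was built without cuDNN.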

airobots commented 6 years ago

I believe my cuDNN is enabled. How do you make sure it is enabled? I can successfully compile the new library, but I still get the same error when I run one of the examples: ./build/examples/openpose/openpose.bin --video examples/media/video.avi

gineshidalgo99 commented 6 years ago

cuDNN is enabled if OpenPose uses less than 2 GB of GPU memory (check with watch -n 1.0 nvidia-smi).
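
If reading the full nvidia-smi table is awkward, the per-process memory can also be queried directly. A sketch using nvidia-smi's standard query flags:

    # Poll the memory used by each compute process, once per second
    watch -n 1 "nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv"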

OsamaMazhar commented 6 years ago

Hi,

I have installed OpenPose (the new library), and when I run the openpose.bin example I get the exact same error. I have cuDNN 6 installed (and enabled, verified by running cmake .. in the caffe/build folder) with CUDA 8.0. I am using Ubuntu 16.04. Caffe was built from source and linked with OpenPose as stated in the installation guide. Please help.

gineshidalgo99 commented 6 years ago

What are the model and memory of your GPU? Check with watch -n 1.0 nvidia-smi.

OsamaMazhar commented 6 years ago

Hi, this is the output of nvidia-smi. Sorry, the command watch -n 1.0 nvidia-smi didn't work.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:03:00.0  On |                  N/A |
| 27%   34C    P8    11W / 180W |    466MiB /  8112MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:04:00.0 Off |                  N/A |
| 27%   29C    P8     6W / 180W |      2MiB /  8114MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1221      G   /usr/lib/xorg/Xorg                           231MiB |
|    0      2108      G   compiz                                       127MiB |
|    0      3653      G   ...-token=713A76200FFBD4061C4285C11D0AF8FF   105MiB |
+-----------------------------------------------------------------------------+

gineshidalgo99 commented 6 years ago

And the command used? It cannot run out of memory with the basic command in the quick_start and cuDNN enabled. Most probably you are using a different CUDA version or something similar (e.g., having more than one in ls /usr/local/).
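
A quick sanity check for mixed CUDA installations, as a sketch (version.txt is present in CUDA toolkits of this era; nvcc reports the compiler actually on the PATH):

    ls /usr/local/ | grep -i cuda       # how many toolkits are installed
    nvcc --version                      # which one the build environment sees
    cat /usr/local/cuda/version.txt     # which one the cuda symlink points to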

OsamaMazhar commented 6 years ago

I built the OpenPose library with the given Makefile (renaming the one for CUDA 8 and Ubuntu 16) and ran the openpose.bin example from the root of the openpose folder. I only have one CUDA installed, i.e., CUDA 8. In /usr/local/ I have two cuda folders: one is cuda-8.0, and the other, cuda, simply points to the same cuda-8.0 folder. I built Caffe from sources downloaded from GitHub.

The tutorial_pose examples work fine (both of them).

Gjain234 commented 6 years ago

How did you end up figuring this out? I am trying to install OpenPose and am fairly sure I am using cuDNN, because 'make -j$(nproc)' gives this summary:

-- GCC detected, adding compile flags
-- Building with CUDA.
-- CUDA detected: 8.0
-- Found cuDNN: ver. 5.1.10 found (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- Added CUDA NVCC flags for: sm_50
-- Found cuDNN: ver. 5.1.10 found (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- Found gflags (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Found glog (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- Caffe will be downloaded from source now. NOTE: This process might take several minutes depending on your internet connection.
-- Caffe has already been downloaded.

However, I get the same failure message (Check failed: error == cudaSuccess (2 vs. 0) out of memory) when I run any of the commands in the OpenPose quick start section.

fmnoori commented 5 years ago

Hi @gineshidalgo99, apologies for the silly question; I am a beginner.

I am using Ubuntu 16.04, CUDA 8, and cuDNN 5.1. When I run this code (Realtime_Multi-Person_Pose_Estimation-master),

I am getting this error:

22:58:24.641180 2683 net.cpp:202] conv2_2 does not need backward computation.
I0830 22:58:24.641196 2683 net.cpp:202] relu2_1 does not need backward computation.
I0830 22:58:24.641201 2683 net.cpp:202] conv2_1 does not need backward computation.
I0830 22:58:24.641203 2683 net.cpp:202] pool1_stage1 does not need backward computation.
I0830 22:58:24.641207 2683 net.cpp:202] relu1_2 does not need backward computation.
I0830 22:58:24.641211 2683 net.cpp:202] conv1_2 does not need backward computation.
I0830 22:58:24.641213 2683 net.cpp:202] relu1_1 does not need backward computation.
I0830 22:58:24.641216 2683 net.cpp:202] conv1_1 does not need backward computation.
I0830 22:58:24.641219 2683 net.cpp:202] input does not need backward computation.
I0830 22:58:24.641222 2683 net.cpp:244] This network produces output Mconv7_stage6_L1
I0830 22:58:24.641239 2683 net.cpp:244] This network produces output Mconv7_stage6_L2
I0830 22:58:24.641326 2683 net.cpp:257] Network initialization done.
I0830 22:58:24.747468 2683 net.cpp:746] Ignoring source layer data
I0830 22:58:24.747484 2683 net.cpp:746] Ignoring source layer vec_weight
I0830 22:58:24.747488 2683 net.cpp:746] Ignoring source layer vec_weight_vec_weight_0_split
I0830 22:58:24.747489 2683 net.cpp:746] Ignoring source layer heat_weight_vec_weight_1_split
I0830 22:58:24.747491 2683 net.cpp:746] Ignoring source layer label_vec
I0830 22:58:24.747494 2683 net.cpp:746] Ignoring source layer label_vec_label_vec_0_split
I0830 22:58:24.747496 2683 net.cpp:746] Ignoring source layer label_heat
I0830 22:58:24.747498 2683 net.cpp:746] Ignoring source layer label_heat_label_heat_0_split
I0830 22:58:24.747514 2683 net.cpp:746] Ignoring source layer image
I0830 22:58:24.747517 2683 net.cpp:746] Ignoring source layer silence2
I0830 22:58:24.755187 2683 net.cpp:746] Ignoring source layer conv5_5_CPM_L1_conv5_5_CPM_L1_0_split
I0830 22:58:24.755208 2683 net.cpp:746] Ignoring source layer conv5_5_CPM_L2_conv5_5_CPM_L2_0_split
I0830 22:58:24.755214 2683 net.cpp:746] Ignoring source layer weight_stage1_L1
I0830 22:58:24.755216 2683 net.cpp:746] Ignoring source layer loss_stage1_L1
I0830 22:58:24.755219 2683 net.cpp:746] Ignoring source layer weight_stage1_L2
I0830 22:58:24.755223 2683 net.cpp:746] Ignoring source layer loss_stage1_L2
I0830 22:58:24.763743 2683 net.cpp:746] Ignoring source layer Mconv7_stage2_L1_Mconv7_stage2_L1_0_split
I0830 22:58:24.763759 2683 net.cpp:746] Ignoring source layer Mconv7_stage2_L2_Mconv7_stage2_L2_0_split
I0830 22:58:24.763765 2683 net.cpp:746] Ignoring source layer weight_stage2_L1
I0830 22:58:24.763768 2683 net.cpp:746] Ignoring source layer loss_stage2_L1
I0830 22:58:24.763772 2683 net.cpp:746] Ignoring source layer weight_stage2_L2
I0830 22:58:24.763775 2683 net.cpp:746] Ignoring source layer loss_stage2_L2
I0830 22:58:24.771598 2683 net.cpp:746] Ignoring source layer Mconv7_stage3_L1_Mconv7_stage3_L1_0_split
I0830 22:58:24.771611 2683 net.cpp:746] Ignoring source layer Mconv7_stage3_L2_Mconv7_stage3_L2_0_split
I0830 22:58:24.771618 2683 net.cpp:746] Ignoring source layer weight_stage3_L1
I0830 22:58:24.771621 2683 net.cpp:746] Ignoring source layer loss_stage3_L1
I0830 22:58:24.771625 2683 net.cpp:746] Ignoring source layer weight_stage3_L2
I0830 22:58:24.771628 2683 net.cpp:746] Ignoring source layer loss_stage3_L2
I0830 22:58:24.779940 2683 net.cpp:746] Ignoring source layer Mconv7_stage4_L1_Mconv7_stage4_L1_0_split
I0830 22:58:24.779954 2683 net.cpp:746] Ignoring source layer Mconv7_stage4_L2_Mconv7_stage4_L2_0_split
I0830 22:58:24.779960 2683 net.cpp:746] Ignoring source layer weight_stage4_L1
I0830 22:58:24.779963 2683 net.cpp:746] Ignoring source layer loss_stage4_L1
I0830 22:58:24.779968 2683 net.cpp:746] Ignoring source layer weight_stage4_L2
I0830 22:58:24.779970 2683 net.cpp:746] Ignoring source layer loss_stage4_L2
I0830 22:58:24.787822 2683 net.cpp:746] Ignoring source layer Mconv7_stage5_L1_Mconv7_stage5_L1_0_split
I0830 22:58:24.787835 2683 net.cpp:746] Ignoring source layer Mconv7_stage5_L2_Mconv7_stage5_L2_0_split
I0830 22:58:24.787840 2683 net.cpp:746] Ignoring source layer weight_stage5_L1
I0830 22:58:24.787843 2683 net.cpp:746] Ignoring source layer loss_stage5_L1
I0830 22:58:24.787847 2683 net.cpp:746] Ignoring source layer weight_stage5_L2
I0830 22:58:24.787850 2683 net.cpp:746] Ignoring source layer loss_stage5_L2
I0830 22:58:24.795758 2683 net.cpp:746] Ignoring source layer weight_stage6_L1
I0830 22:58:24.795764 2683 net.cpp:746] Ignoring source layer loss_stage6_L1
I0830 22:58:24.795768 2683 net.cpp:746] Ignoring source layer weight_stage6_L2
I0830 22:58:24.795771 2683 net.cpp:746] Ignoring source layer loss_stage6_L2
F0830 22:58:44.724848 2683 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
Check failure stack trace:
[I 22:58:46.684 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 4565bd80-4f89-4eaf-80d5-32a2f3596379 restarted
[I 22:59:01.796 NotebookApp] Saving file at /demo.ipynb
[W 22:59:01.797 NotebookApp] Notebook demo.ipynb is not trusted

I have looked for several solutions but have been unable to fix it.


Also, this command is not working:

farzan@farzan-OptiPlex-3050:~$ watch -n 1.0 nvidia-smi
watch: failed to parse argument: '1.0'
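
Note: that watch failure is most likely caused by the fractional interval; some watch builds reject '1.0' under locales that expect a decimal comma. An integer interval should behave the same for this purpose:

    watch -n 1 nvidia-smi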

lauer2356 commented 5 years ago

I am getting a similar error. I believe I have correctly installed CUDA 8 and cuDNN 5.1. When I run watch -n 1.0 nvidia-smi, I see a line pop up for openpose at around 950 MB for a brief time; then that line goes away when the error message pops up.

Any idea of where to look?

Error

F0914 10:18:10.793766 22234 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
Check failure stack trace:
    @ 0x7f195ca6b0cd google::LogMessage::Fail()
    @ 0x7f195ca6cf33 google::LogMessage::SendToLog()
    @ 0x7f195ca6ac28 google::LogMessage::Flush()
    @ 0x7f195ca6d999 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f195d3af6e8 caffe::SyncedMemory::mutable_gpu_data()
    @ 0x7f195d222e22 caffe::Blob<>::mutable_gpu_data()
    @ 0x7f195d3faf28 caffe::CuDNNConvolutionLayer<>::Forward_gpu()
    @ 0x7f195d3752c1 caffe::Net<>::ForwardFromTo()
    @ 0x7f195d9ec127 op::NetCaffe::forwardPass()
    @ 0x7f195da2baea op::PoseExtractorCaffe::forwardPass()
    @ 0x7f195de3fc82 OpenPose::forward()
    @ 0x7f195de3d286 forward
    @ 0x7f1990b42ec0 ffi_call_unix64
    @ 0x7f1990b4287d ffi_call
    @ 0x7f1990d57dae _ctypes_callproc
    @ 0x7f1990d587e5 PyCFuncPtr_call
    @ 0x5641a2c16bcb _PyObject_FastCallDict
    @ 0x5641a2ca3f4e call_function
    @ 0x5641a2cc894a _PyEval_EvalFrameDefault
    @ 0x5641a2c9d206 _PyEval_EvalCodeWithName
    @ 0x5641a2c9e1cf fast_function
    @ 0x5641a2ca3ed5 call_function
    @ 0x5641a2cc894a _PyEval_EvalFrameDefault
    @ 0x5641a2c9ecb9 PyEval_EvalCodeEx
    @ 0x5641a2c9fa4c PyEval_EvalCode
    @ 0x5641a2cc637b builtin_exec
    @ 0x5641a2c16921 _PyCFunction_FastCallDict
    @ 0x5641a2ca3dfc call_function
    @ 0x5641a2cc894a _PyEval_EvalFrameDefault
    @ 0x5641a2c9d206 _PyEval_EvalCodeWithName
    @ 0x5641a2c9e1cf fast_function
    @ 0x5641a2ca3ed5 call_function

Every 1.0s: nvidia-smi elau432-Precision-M4800: Fri Sep 14 10:20:24 2018

Fri Sep 14 10:20:24 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K1100M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P0    N/A /  N/A |    782MiB /  1999MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1172      G   /usr/lib/xorg/Xorg                           325MiB |
|    0      1430      G   /usr/bin/gnome-shell                         219MiB |
|    0      2031      G   ...quest-channel-token=7072637909516120156   121MiB |
|    0      6712      C   /usr/lib/libreoffice/program/soffice.bin      13MiB |
|    0     14821      G   ...CorrectRendering --no-sandbox --support    83MiB |
+-----------------------------------------------------------------------------+

gineshidalgo99 commented 5 years ago

From the doc: the BODY_25 model requires 2.5 GB of memory, and seeing your output, you have only 2 GB:

|   0  Quadro K1100M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P0    N/A /  N/A |    782MiB /  1999MiB |      5%      Default |

You can use the COCO or MPII models, which require less memory, or you can use BODY_25 with a lower net_resolution.
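
For instance, assuming the standard OpenPose flags --model_pose and --net_resolution (the exact resolution value here is illustrative; width -1 keeps the aspect ratio):

    # Switch to the lighter COCO model
    ./build/examples/openpose/openpose.bin --video examples/media/video.avi --model_pose COCO
    # Or keep BODY_25 but shrink the network input
    ./build/examples/openpose/openpose.bin --video examples/media/video.avi --net_resolution -1x256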

lauer2356 commented 5 years ago

Ok, I see, thanks.

Turning down the net_resolution did make it work; I had to go down to about -1x160 for it to run. Is that about what you would expect?

gineshidalgo99 commented 5 years ago

You can make it higher by making sure there are no other tasks running on the GPU. I see that half of your GPU is already taken by other tasks (e.g., right after rebooting the PC there are far fewer tasks).

lauer2356 commented 5 years ago

Yeah, I noticed a lot of other stuff was running. I'll give it a shot after a reboot.

This project is great. Thank you to you and the team for all the work you've done to develop and support it!

ccl-private commented 3 years ago

For me, CUDA was installed via apt, so it lives in /usr/lib/cuda. I therefore needed to point to my cuDNN path in ./cmake/Modules/FindCuDNN.cmake:

find_path(CUDNN_INCLUDE cudnn.h
          PATHS ${CUDNN_ROOT} $ENV{CUDNN_ROOT} ${CUDA_TOOLKIT_INCLUDE} /usr/lib/cuda/include
          DOC "Path to cuDNN include directory.")

get_filename_component(__libpath_hist ${CUDA_CUDART_LIBRARY} PATH)
find_library(CUDNN_LIBRARY NAMES ${CUDNN_LIB_NAME}
             PATHS ${CUDNN_ROOT} $ENV{CUDNN_ROOT} ${CUDNN_INCLUDE}
                   ${__libpath_hist} ${__libpath_hist}/../lib /usr/lib/cuda/lib64
             DOC "Path to cuDNN library.")
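
Since the search paths above already include $ENV{CUDNN_ROOT}, an alternative sketch that avoids editing FindCuDNN.cmake is to export that variable before configuring (path as in the comment above):

    # Point the cuDNN search at the apt-installed CUDA tree, then reconfigure
    export CUDNN_ROOT=/usr/lib/cuda
    cd build && cmake ..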

HamenderSingh commented 3 years ago

For me, other processes were eating up the GPU, so I used watch -n 1.0 nvidia-smi to find their PIDs and kill -9 <pid> to remove the unwanted processes. After that it worked without any issue.
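
A compact version of that workflow, as a sketch using standard nvidia-smi query flags (<pid> is a placeholder):

    # List the compute processes currently holding GPU memory
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
    # Free a specific one
    kill -9 <pid>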