CMU-Perceptual-Computing-Lab / openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
https://cmu-perceptual-computing-lab.github.io/openpose

Jetson TX1 - pooling_layer.cu:212] Check failed: error == cudaSuccess (8 vs. 0) invalid device function #58

Closed aidanboran closed 7 years ago

aidanboran commented 7 years ago

Issue summary

Executed command (if any)

a) build/examples/openpose/openpose.bin --image_dir /home/ubuntu/Dev/openpose/examples/media (gives the error below)

b) build/examples/openpose/openpose.bin --no_gpu 0 --image_dir /home/ubuntu/Dev/openpose/examples/media (opens a window and displays the images, but no detections are made)

Your system configuration

Operating system (lsb_release -a on Ubuntu):
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04 LTS
Release: 16.04
Codename: xenial

CUDA version (cat /usr/local/cuda/version.txt in most cases): CUDA Version 8.0.34

Caffe version: Default from OpenPose

OpenCV version: 2.4 installed from JetPack 3.0.

build/examples/openpose/openpose.bin --image_dir /home/ubuntu/Dev/openpose/examples/media
Starting pose estimation demo.
Starting thread(s)
F0608 00:53:13.197923 29939 pooling_layer.cu:212] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
    @ 0x7f935c6718 google::LogMessage::Fail()
    @ 0x7f935c8614 google::LogMessage::SendToLog()
    @ 0x7f935c6290 google::LogMessage::Flush()
    @ 0x7f935c8eb4 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f92b7ef40 caffe::PoolingLayer<>::Forward_gpu()
    @ 0x7f92a085b0 caffe::Net<>::ForwardFromTo()
    @ 0x7f936873dc op::NetCaffe::forwardPass()
    @ 0x7f936ee710 op::PoseExtractorCaffe::forwardPass()
    @ 0x7f936fa274 op::WPoseExtractor<>::work()
    @ 0x7f93719c2c op::Worker<>::checkAndWork()
    @ 0x7f9371ce98 op::SubThread<>::workTWorkers()
    @ 0x7f937261e4 op::SubThreadQueueInOut<>::work()
    @ 0x7f93721df0 op::Thread<>::threadFunction()
    @ 0x7f934b6280 (unknown)
    @ 0x7f91fadfb4 start_thread
Aborted

gineshidalgo99 commented 7 years ago

Hi, 2 quick questions:

  1. Which cuDNN version are you using?
  2. --no_gpu 0? There is no such option. I guess you meant --num_gpu. For that one, you need at least 1 GPU: --num_gpu 1. Thanks!
aidanboran commented 7 years ago

Thanks.

Here is the output of the mnistCUDNN sample, which reports the cuDNN version:

$ ./mnistCUDNN
cudnnGetVersion() : 5105 , CUDNN_VERSION from cudnn.h : 5105 (5.1.5)
Host compiler version : GCC 4.9.2
There are 1 CUDA capable devices on your machine :
device 0 : sms 2 Capabilities 5.3, SmClock 72.0 Mhz, MemSize (Mb) 3994, MemClock 12.8 Mhz, Ecc=0, boardGroupID=0
Using device 0

As for --num_gpu 0, I was just experimenting to see whether I could get the program to do anything at all!
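The key detail in this output is "Capabilities 5.3": the TX1 GPU has compute capability 5.3. A cudaSuccess check failing with "invalid device function" usually means the binary contains no kernel code built for the GPU it runs on, so this number decides which -gencode entry Caffe's Makefile.config needs. The Python sketch below assumes the mnistCUDNN output format shown above, extracts the capability, and turns it into the matching nvcc flag; parse_capability and gencode_flag are hypothetical helper names, not part of OpenPose or Caffe.

```python
import re

# Matches the capability field of the mnistCUDNN device line, e.g.
# "Capabilities 5.3" (format assumed from the output above).
CAPABILITY_RE = re.compile(r"Capabilities\s+(\d+)\.(\d+)")


def parse_capability(mnist_output):
    """Return the (major, minor) compute capability found in the output."""
    match = CAPABILITY_RE.search(mnist_output)
    if match is None:
        raise ValueError("no 'Capabilities X.Y' field found")
    return int(match.group(1)), int(match.group(2))


def gencode_flag(major, minor):
    """Build the nvcc -gencode flag that emits SASS for this capability."""
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


if __name__ == "__main__":
    line = ("device 0 : sms 2 Capabilities 5.3, SmClock 72.0 Mhz, "
            "MemSize (Mb) 3994, MemClock 12.8 Mhz, Ecc=0, boardGroupID=0")
    major, minor = parse_capability(line)
    print((major, minor), gencode_flag(major, minor))
```

For the TX1 line above this yields (5, 3) and the flag -gencode arch=compute_53,code=sm_53.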

gineshidalgo99 commented 7 years ago

I am slightly confused: is it working with --num_gpu 1, so that this issue can be closed? If not, what is the output when --num_gpu 1 is used? Thanks

aidanboran commented 7 years ago

It does not work with either --num_gpu setting.

With --num_gpu=1, I get the "Check failed: error == cudaSuccess (8 vs. 0) invalid device function" error. [I assume this is the correct way to enable the GPU.]

With --num_gpu=0, the program finishes without any errors but does not detect anything in the sample images. [I was just experimenting to see whether I could get the program to run at all.]

gineshidalgo99 commented 7 years ago

OK got it.

Since you are using a custom Ubuntu (the one from NVIDIA), we cannot give you much more help with the Caffe part (where it is failing), because we do not have that device to test on.

Try running Caffe and some Caffe demos (for example, the Caffe tests) on the board. Once Caffe works with the GPU, OpenPose should too, since it only depends on C++11, Caffe, and Caffe's dependencies.

Let us know your results. Thanks

aidanboran commented 7 years ago

OK. Caffe passes all its tests and runs at least some demos. Let me run it in a debugger to see what is actually failing. My guess is a version mismatch between Caffe, CUDA, cuDNN, and the Jetson.

Do you know if anyone else has gotten it working on a Jetson?

gineshidalgo99 commented 7 years ago

Yes, please let me know the exact function where it fails, so I can make better guesses about OpenPose.

I do not know of anyone using OpenPose on a Jetson.

aidanboran commented 7 years ago

I now have openpose.bin running. I needed to change some of the CUDA architecture parameters in Makefile.config for the Jetson TX1. However, I still do not see any useful or interesting output.

aidanboran commented 7 years ago

I finally have it working on the Jetson TX1. I needed to fix a few issues in the build files for Caffe and OpenPose, as follows:

My CUDA_ARCH settings (in Makefile.config):

CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
             -gencode arch=compute_53,code=sm_53

INCLUDE_DIRS := /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := /usr/local/lib /usr/lib /usr/lib/aarch64-linux-gnu/hdf5/serial

I also forced some flags in the Makefile (this may not be necessary, but it's late and I'm tired, so I'm not changing anything else now that it works for me):

-DCUDA_ARCH_NAME="Manual" -DCUDA_ARCH_BIN="53" -DCUDA_ARCH_PTX="53" -DUSE_CUDNN=1

I also built using the latest OpenPose source.
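The CUDA_ARCH value is simply a list of nvcc -gencode pairs, one per compute capability the Caffe kernels are compiled for; the 53 entry is the one that matches the TX1's reported capability of 5.3. As a sketch only (the function name and the optional PTX entry are assumptions, not part of Caffe's build system), this Python helper renders such a line in Makefile.config syntax:

```python
def cuda_arch_line(capabilities, with_ptx=False):
    """Render a CUDA_ARCH assignment for Caffe's Makefile.config.

    `capabilities` is an iterable of (major, minor) tuples such as
    [(3, 0), (5, 3)]. With `with_ptx=True` (an assumed option), PTX for
    the highest capability is embedded too, so newer GPUs can JIT it.
    """
    archs = sorted({f"{major}{minor}" for major, minor in capabilities}, key=int)
    if not archs:
        raise ValueError("at least one compute capability is required")
    # One SASS (code=sm_XX) entry per requested capability.
    flags = [f"-gencode arch=compute_{a},code=sm_{a}" for a in archs]
    if with_ptx:
        top = archs[-1]
        flags.append(f"-gencode arch=compute_{top},code=compute_{top}")
    # Put each entry on its own continued line, aligned after the prefix.
    prefix = "CUDA_ARCH := "
    return prefix + (" \\\n" + " " * len(prefix)).join(flags)


if __name__ == "__main__":
    # Reproduces the two-entry TX1 setting from the comment above.
    print(cuda_arch_line([(3, 0), (5, 3)]))
```

Duplicates are collapsed and entries are sorted, so the output does not depend on the order in which capabilities are listed.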

gineshidalgo99 commented 7 years ago

Thank you for posting the solution! So other people can use it too.

In conclusion, the only changes were in the Makefile and Makefile.config files. This is good, because you and other Jetson users will be able to update OpenPose easily at any point.

I am closing this issue then.

IoaSman1 commented 7 years ago

I am curious to know whether cortinas finally managed to run OpenPose on the Jetson TK1!

Cortinas, could you please email me at smanismech[at]me[dot]com?

I have a Jetson TX2 and I run into out-of-memory issues when I run OpenPose on it.

Thank you in advance

York-Cheung commented 7 years ago

@cortinas Hi, I am trying to run OpenPose on my Jetson TK1, and I've tried the method you gave above. I edited Makefile.config in 3rdparty/caffe/, changed the CUDA_ARCH settings, and added:

INCLUDE_DIRS := /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := /usr/local/lib /usr/lib /usr/lib/aarch64-linux-gnu/hdf5/serial

Then I ran make all -j4 && make distribute -j4 to build, but I got this error:

NVCC src/caffe/solvers/adadelta_solver.cu
nvcc fatal   : Unsupported gpu architecture 'compute_53'
make: *** [.build_release/cuda/src/caffe/solvers/adadelta_solver.o] Error 1
make: *** Waiting for unfinished jobs....

Did I do anything wrong? My CUDA version is 6.5. Thanks.
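Two things stand out here. First, this nvcc (CUDA 6.5) rejects compute_53 outright. Second, the compute_53 entry was written for the TX1; the TK1's GPU has compute capability 3.2, so the entry that matters for it is compute_32/sm_32, not 53. The Python sketch below picks the flag per board from a lookup table and fails early if the installed nvcc does not accept that architecture. The table name, the helper name, and the caller-supplied set of accepted architectures are all hypothetical; the set in the usage example is illustrative, not an authoritative list of what CUDA 6.5 supports.

```python
# Compute capabilities of the Jetson boards discussed in this thread.
JETSON_CAPABILITY = {
    "TK1": (3, 2),
    "TX1": (5, 3),
    "TX2": (6, 2),
}


def gencode_for_board(board, toolkit_archs):
    """Return the -gencode flag for `board`, or raise if nvcc lacks it.

    `toolkit_archs` is a caller-supplied set of arch strings (e.g. "32")
    that the installed nvcc accepts; this helper does not query nvcc.
    """
    major, minor = JETSON_CAPABILITY[board]
    arch = f"{major}{minor}"
    if arch not in toolkit_archs:
        raise ValueError(
            f"compute_{arch} ({board}) is not supported by this nvcc; "
            "a newer CUDA toolkit is required"
        )
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


if __name__ == "__main__":
    # Illustrative set only, not an authoritative CUDA 6.5 list.
    older_toolkit = {"30", "32", "35"}
    print(gencode_for_board("TK1", older_toolkit))
    try:
        gencode_for_board("TX1", older_toolkit)
    except ValueError as err:
        print(err)
```

Failing at configuration time mirrors what nvcc reports above, but with the board name attached, which makes it harder to copy one board's CUDA_ARCH line onto another.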

York-Cheung commented 7 years ago

Hello?

IoaSman1 commented 7 years ago

Don't even try it.

On a Jetson TX2 with JetPack 3.1, I get 1 FPS for both prerecorded video and real-time input. I don't think it is worth running it on a TK1; it is a GPU-hungry model!

York-Cheung commented 7 years ago

@IoaSman1 Have you measured how long OpenPose takes to process one image?

gineshidalgo99 commented 7 years ago

Is it that slow even when using the speed-up tips in the FAQ of the doc/installation file (which will decrease accuracy)?

SkyKingCoversGroundTiger commented 7 years ago

@IoaSman1 I am doing the same thing with a Jetson TX2. Is it straightforward to make the whole thing work? I would very much appreciate it if you could share the steps.

gineshidalgo99 commented 7 years ago

If someone wants to share the steps, feel free to make a pull request with the steps for any other OS or embedded board! I'll merge it. Thanks!

SkyKingCoversGroundTiger commented 7 years ago

Awesome. Looking forward to that! Thanks!

bushibushi commented 7 years ago

Got it working on the TX2 last night; PR incoming. By reducing net_resolution heavily (to 128x96), I got to 10+ FPS. I used an external webcam, since the onboard camera was not straightforward to use. Hands and face work (256x256 nets), but running both at the same time is too memory intensive: it crashes with an out-of-memory error.

After I finish the PR, I'll take a look at TensorRT in the hope of better real-time performance.
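OpenPose requires both net_resolution dimensions to be multiples of 16, and 128x96 satisfies this. The Python sketch below, using a hypothetical helper name, validates a candidate resolution and reports its input area relative to a 656x368 baseline (assumed here as the default of that era). Area is only a rough proxy for the network's compute cost; it does not predict the 10+ FPS measured above.

```python
def check_net_resolution(width, height, base=(656, 368)):
    """Validate a net_resolution and compare its pixel count to `base`.

    Both dimensions must be positive multiples of 16. The returned value
    is only the input-area ratio, a rough proxy for compute cost, not a
    frame-rate prediction. Entries of -1 (auto aspect ratio) are not
    handled by this sketch.
    """
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 16 != 0:
            raise ValueError(f"{name}={value} is not a positive multiple of 16")
    return (width * height) / (base[0] * base[1])


if __name__ == "__main__":
    ratio = check_net_resolution(128, 96)
    print(f"128x96 uses {ratio:.1%} of the baseline input pixels")
```

For 128x96 the ratio is 12288 / 241408, about 5.1% of the baseline's input pixels, which illustrates why such a small resolution trades so much accuracy for speed.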

bushibushi commented 7 years ago

https://github.com/CMU-Perceptual-Computing-Lab/openpose/pull/245

vinitmuchhala commented 6 years ago

@IoaSman1 Have you tried reducing net_resolution? I can push it up to 4-7 FPS depending on how low I am willing to go, and the accuracy drop is not significant either. Hope this helps.

ghost commented 5 years ago

Hello,

Auto-detecting all available GPUs...
Detected 1 GPU(s), using 1 of them starting at GPU 0.
F0123 10:55:27.467897 13141 pooling_layer.cu:212] Check failed: error == cudaSuccess (48 vs. 0) no kernel image is available for execution on the device
Check failure stack trace:
    @ 0x7f92b39718 google::LogMessage::Fail()
    @ 0x7f92b3b614 google::LogMessage::SendToLog()
    @ 0x7f92b39290 google::LogMessage::Flush()
    @ 0x7f92b3beb4 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f92f40bc8 caffe::PoolingLayer<>::Forward_gpu()
    @ 0x7f92d66058 caffe::Net<>::ForwardFromTo()
    @ 0x7f93e68a2c op::NetCaffe::forwardPass()
    @ 0x7f93f9897c op::PoseExtractorCaffe::forwardPass()
    @ 0x7f93f8e178 op::PoseExtractor::forwardPass()
    @ 0x7f93f9cc18 op::WPoseExtractor<>::work()
    @ 0x7f93e96bac op::Worker<>::checkAndWork()
    @ 0x7f93e9b528 op::SubThread<>::workTWorkers()
    @ 0x7f93ea57cc op::SubThreadQueueInOut<>::work()
    @ 0x7f93ea1308 op::Thread<>::threadFunction()
    @ 0x7f9394f280 (unknown)
    @ 0x7f91f77fc4 start_thread
Aborted

I am facing the above error on a TX1 and have tried the changes mentioned above. Please advise.