Closed by zkailong 6 years ago
So I spent the better part of the day yesterday trying to get AlphaPose to compile and run inference. I finally figured out a combination that works.
https://gist.github.com/sberryman/82a6d13a44f9c4a3bfaf9263b36c92ed
./run.sh
must be relative to the CWD. Absolute paths do not work!
Even if you don't use Docker, you can get a very good idea of the steps I had to take to get AlphaPose running. Also, a lot of the Ubuntu dependencies installed on line 8 can be removed. They are left over from another project and I haven't had time to clean them up.
Your error looks more like it has to do with running out of GPU memory, though. Your card (GPU) only has totalMemory: 1.95GiB freeMemory: 1.64GiB
I see RCNN using ~4.8GB of memory, and Torch was using about 1.8GB with a batch size of 1. That is my experience running on a GTX 1080. I haven't tried my 1080 Tis yet.
Update: human-detection (TensorFlow) is set to gpu_options.allow_growth=True, so I'm not sure what the actual minimum memory requirements are.
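A quick way to make the memory comparison above concrete is to pull the figures out of the TensorFlow device log and set them against the usage observed on the GTX 1080. This is only a rough sketch: the helper names `parse_gpu_memory` and `fits` are made up for illustration, the observed figures are treated as GiB (an assumption), and because they were measured with allow_growth enabled they are not known minimum requirements.

```python
import re

# Matches the memory summary TensorFlow 1.x prints for each GPU, e.g.
# "totalMemory: 1.95GiB freeMemory: 1.64GiB".
MEMORY_RE = re.compile(r"totalMemory:\s*([\d.]+)GiB\s+freeMemory:\s*([\d.]+)GiB")


def parse_gpu_memory(log_text):
    """Return (total_gib, free_gib) from a TensorFlow device log, or None."""
    match = MEMORY_RE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def fits(free_gib, needed_gib):
    """True if the observed usage would fit in the reported free memory."""
    return needed_gib <= free_gib


log_line = "totalMemory: 1.95GiB freeMemory: 1.64GiB"
total_gib, free_gib = parse_gpu_memory(log_line)

# Usage seen on a GTX 1080 in this thread; treated as GiB here (assumption).
observed = {"Faster RCNN (TensorFlow)": 4.8, "Torch, batch size 1": 1.8}
for name, needed in observed.items():
    status = "fits" if fits(free_gib, needed) else "exceeds free memory"
    print(f"{name}: ~{needed} needed vs {free_gib} GiB free -> {status}")
```

Both observed figures are larger than the 1.64GiB reported as free, which is why the card looks too small, even though the true minimum may be lower.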
@sberryman Thanks for your reply. But the log said:
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
So I don't think my computer's GPU memory is too small to run AlphaPose. And thanks for your Dockerfile. Maybe I should rebuild it.
Good luck. I know it took me a LONG time to figure out the right combination of dependencies. Hopefully the Dockerfile will point you in the right direction.
Thanks @sberryman for the Dockerfile! @zkailong From the log it seems you have hit this problem: https://github.com/deepmind/torch-hdf5/issues/79, and a possible solution is to install Torch with Lua 5.1.
@Fang-Haoshu Thanks for your reply. I reinstalled Torch with Lua 5.1, but it did not work...
So weird... In the deepmind issue, it seems many people also suffer from this problem.
@Fang-Haoshu So frustrating... I have sent you an email. Maybe we can talk more about it.
zhanghua@zhanghua-System-Product-Name:~/AlphaPose$ ./run.sh --indir examples/demo/ --outdir examples/results/ --vis 0
generating bbox from Faster RCNN...
2018-04-16 15:48:19.729543: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2018-04-16 15:48:20.037014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:65:00.0
totalMemory: 10.90GiB freeMemory: 10.44GiB
2018-04-16 15:48:20.037044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-04-16 15:48:20.229660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-16 15:48:20.229700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-04-16 15:48:20.229705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-04-16 15:48:20.229898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10102 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
Loaded network ../output/res152/coco_2014_train+coco_2014_valminusminival/default/res152.ckpt
/home/zhanghua/AlphaPose/examples/demo/
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00, 1.12s/it]
pose estimation with RMPE...
Found Environment variable CUDNN_PATH = /usr/local/cuda/lib64/libcudnn.so.9.0:/usr/local/cuda-9.0/bin:/home/zhanghua/torch/install/bin:/home/zhanghua/bin:/home/zhanghua/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
/home/zhanghua/torch/install/bin/luajit: /home/zhanghua/torch/install/share/lua/5.1/trepl/init.lua:389: /home/zhanghua/torch/install/share/lua/5.1/trepl/init.lua:389: /home/zhanghua/torch/install/share/lua/5.1/cudnn/ffi.lua:1618: /usr/local/cuda/lib64/libcudnn.so.9.0:/usr/local/cuda-9.0/bin:/home/zhanghua/torch/install/bin:/home/zhanghua/bin:/home/zhanghua/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin: cannot open shared object file: No such file or directory
stack traceback:
[C]: in function 'error'
/home/zhanghua/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
/home/zhanghua/AlphaPose/predict/util.lua:12: in main chunk
[C]: in function 'dofile'
main-alpha-pose.lua:7: in main chunk
[C]: in function 'dofile'
...ghua/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
Traceback (most recent call last):
File "parametric-pose-nms-MPII.py", line 256, in
This is my problem. Can anyone help me? Thanks.
Environment: Ubuntu 16.04; CUDA 9.0.176; cuDNN 7.0.5; TensorFlow 1.6.0 (GPU). Following #10 and #3, I have installed Torch, LuaRocks, hdf5, etc., but there are still problems running it.
So, how can I solve it?
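In the traceback above, the whole value of CUDNN_PATH (the libcudnn path followed by what looks like the contents of PATH, joined with colons) is reported as a single shared object file that cannot be opened. The following is a minimal sketch for checking the variable before launching run.sh; `diagnose_cudnn_path` is a hypothetical helper written for this thread, not part of AlphaPose or Torch, and it assumes CUDNN_PATH is meant to hold exactly one library file path.

```python
import os


def diagnose_cudnn_path(value, exists=os.path.isfile):
    """Return a list of problems found in a CUDNN_PATH value (empty if none)."""
    if not value:
        return ["CUDNN_PATH is empty or unset"]
    problems = []
    if ":" in value:
        # A colon suggests a PATH-style list was appended to the library path.
        first = value.split(":")[0]
        problems.append(
            "value contains ':' and looks like a list; "
            f"perhaps only the first entry was intended: {first}"
        )
    if not exists(value):
        problems.append("no file exists at this exact path")
    return problems


value = os.environ.get("CUDNN_PATH", "")
for problem in diagnose_cudnn_path(value):
    print("CUDNN_PATH:", problem)
```

It may also be worth confirming that the library file itself exists under that name, since the environment lists cuDNN 7.0.5 while the path in the log names libcudnn.so.9.0.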