MVIG-SJTU / AlphaPose

Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System
http://mvig.org/research/alphapose.html

cuda runtime error (2) : out of memory #53

Closed: arndey closed this issue 6 years ago

arndey commented 6 years ago

Hi, what do I need to do about this? I saw other issues with the same error, but they didn't help me.

~/Projects/AlphaPose$ ./run.sh --indir examples/demo/ --outdir examples/results/ --mode fast --sep
0
generating bbox from Faster RCNN...
/home/andrey/anaconda/envs/tfenv27/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /home/andrey/anaconda/envs/tfenv27/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
2018-04-24 23:29:18.930478: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-04-24 23:29:19.065258: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-24 23:29:19.065580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.5185
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.41GiB
2018-04-24 23:29:19.065607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-24 23:29:19.804746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-24 23:29:19.804794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-24 23:29:19.804803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-24 23:29:19.804975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1177 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Loaded network ../output/res152/coco_2014_train+coco_2014_valminusminival/default/res152.ckpt
/home/andrey/Projects/AlphaPose/examples/demo/

  0%|                                                      | 0/3 [00:00<?, ?it/s]2018-04-24 23:29:30.619218: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:30.631295: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:30.692789: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.04GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:30.714168: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:30.774058: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:31.098338: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1006.62MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:31.327891: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.79GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:31.490841: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 922.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:31.649115: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.58GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-04-24 23:29:31.742389: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.57GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
100%|██████████████████████████████████████████████| 3/3 [00:05<00:00,  1.73s/it]
pose estimation with RMPE...
Found Environment variable CUDNN_PATH = /usr/local/cuda/lib64/libcudnn.so.5
MPII
Found Environment variable CUDNN_PATH = /usr/local/cuda/lib64/libcudnn.so.5
THCudaCheck FAIL file=/home/andrey/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/andrey/torch/install/bin/luajit: /home/andrey/.luarocks/share/lua/5.1/nn/CAddTable.lua:13: cuda runtime error (2) : out of memory at /home/andrey/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
    [C]: in function 'resizeAs'
    /home/andrey/.luarocks/share/lua/5.1/nn/CAddTable.lua:13: in function 'func'
    /home/andrey/.luarocks/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
    /home/andrey/.luarocks/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
    main-alpha-pose.lua:117: in function 'loop'
    main-alpha-pose.lua:176: in main chunk
    [C]: in function 'dofile'
    ...drey/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x55948c91c710
/home/andrey/anaconda/envs/tfenv27/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "parametric-pose-nms-MPII.py", line 256, in <module>
    get_result_json(args)
  File "parametric-pose-nms-MPII.py", line 243, in get_result_json
    test_parametric_pose_NMS_json(delta1, delta2, mu, gamma,args.outputpath)
  File "parametric-pose-nms-MPII.py", line 99, in test_parametric_pose_NMS_json
    h5file = h5py.File(os.path.join(outputpath,"POSE/test-pose.h5"), 'r')
  File "/home/andrey/anaconda/envs/tfenv27/lib/python2.7/site-packages/h5py/_hl/files.py", line 269, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/andrey/anaconda/envs/tfenv27/lib/python2.7/site-packages/h5py/_hl/files.py", line 99, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to open file: name = '/home/andrey/Projects/AlphaPose/examples/results/POSE/test-pose.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Fang-Haoshu commented 6 years ago

Hi, you need a GPU with at least 4GB of memory.
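For reference, if you still want to try on a small card, one common TensorFlow 1.x workaround (this is only a sketch, not part of the stock AlphaPose scripts) is to cap how much GPU memory the detection session is allowed to grab, so the Torch RMPE stage that runs afterwards has something left:

import tensorflow as tf

# Sketch only: limit what the TF 1.x detector session can allocate on the GPU.
# per_process_gpu_memory_fraction caps the share of the card this process may use,
# allow_growth makes TF allocate lazily instead of reserving it all up front.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4, allow_growth=True)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... restore the Faster RCNN checkpoint and run detection as usual ...
    pass

Even with this, a 2GB card is well below what the res152 detector plus the pose network typically need, so expect it to be tight or to fail.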

arndey commented 6 years ago

Can I somehow run it on a GTX 1050 with 2GB?

Fang-Haoshu commented 6 years ago

Hi, maybe you can try adding the flag --batch 1.
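For example, with the demo command from above that would be:

./run.sh --indir examples/demo/ --outdir examples/results/ --mode fast --sep --batch 1

(Assuming run.sh forwards the flag as suggested; a batch size of 1 lowers the peak GPU memory needed per forward pass, though on a 2GB card it may still not be enough.)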