out of memory in demo.py(pytorch)

zjbaby commented 6 years ago

I run demo.py, and I get an error:

$ python3 demo.py --indir examples/demo --outdir examples/res --save_img --posebatch 1
Loading YOLO model..
THCudaCheck FAIL file=/pytorch/torch/csrc/generic/StorageSharing.cpp line=304 error=2 : out of memory
Traceback (most recent call last):
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/forkserver.py", line 185, in main
    _serve_one(s, listener, alive_r, old_handlers)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/forkserver.py", line 220, in _serve_one
    code = spawn._main(child_r)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/spawn.py", line 116, in _main
Traceback (most recent call last):
  File "demo.py", line 51, in <module>
    det_processor = DetectionProcessor(det_loader).start()
  File "/home/cyj/cyj/pose/AlphaPose_pytorch/AlphaPose/dataloader.py", line 395, in start
    p.start()
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/popen_forkserver.py", line 36, in __init__
    self = pickle.load(from_parent)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 96, in rebuild_storage_cuda
    super().__init__(process_obj)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/popen_forkserver.py", line 55, in _launch
    f.write(buf.getbuffer())
BrokenPipeError: [Errno 32] Broken pipe
    storage = cls._new_shared_cuda(device, handle, size, offset, view_size)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/csrc/generic/StorageSharing.cpp:304
/home/cyj/anaconda2/envs/python3/lib/python3.5/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 3 leaked semaphores to clean up at shutdown
  len(cache))

My GPU is an NVIDIA 1060 with 3 GB of memory. The old version (Torch AlphaPose) works fine on it when I execute "$ ./run.sh --indir examples/demo/ --outdir examples/results/ --batchsize 1".

My PyTorch version is 0.4.0.

How can I fix this?

Fang-Haoshu commented 6 years ago

Hi, we have in fact stated this in the README: "Note: If you meet OOM (out of memory) problem, decreasing the pose estimation batch until the program can run on your computer:
python3 demo.py --indir ${img_directory} --outdir examples/res --posebatch 30"
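The README advice amounts to retrying with a smaller --posebatch until a run succeeds. A minimal sketch of that loop follows. It uses only the demo.py flags quoted in this thread; the helper names, the halving schedule, and the assumption that demo.py exits with a non-zero status on an out-of-memory failure are all illustrative and not taken from the AlphaPose code.

```python
import subprocess


def candidate_batches(start=30):
    """Yield start, start // 2, ... down to 1 (halving schedule, chosen for illustration)."""
    batch = start
    while batch >= 1:
        yield batch
        batch //= 2


def build_command(indir, outdir, posebatch):
    """Build the demo.py command line from flags that appear in this thread."""
    return ["python3", "demo.py", "--indir", indir, "--outdir", outdir,
            "--posebatch", str(posebatch)]


def first_working_batch(indir, outdir, start=30, run=subprocess.call):
    """Return the first batch size whose run exits with status 0, or None.

    Assumption: demo.py returns a non-zero exit status when it runs out of memory.
    """
    for batch in candidate_batches(start):
        if run(build_command(indir, outdir, batch)) == 0:
            return batch
    return None
```

Calling first_working_batch("examples/demo", "examples/res") would try 30, 15, 7, 3, and 1 in turn. This only helps when the pose estimation stage is the one exhausting memory.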

zjbaby commented 6 years ago

@Fang-Haoshu Hi, I tried --posebatch 30, 10, 3, and 1, and it still runs out of memory.

zjbaby commented 6 years ago

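@Fang-Haoshu I think my test already crashes with "out of memory" while YOLO v3 is still detecting people, because I added a log statement after person detection finishes and it is never printed. At the moment, the code only gets through one image successfully on the first run after rebooting the machine. If I run the code again, or process two images in a row, I get the "out of memory" error. My GPU is an NVIDIA 1060, 3GB.

How can I solve this? Or is there test code that is less resource-hungry?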
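Since only the first run after a reboot succeeds, it may be worth checking how much GPU memory is already in use before demo.py starts. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that an earlier run left memory allocated. A minimal sketch that reads the numbers from nvidia-smi; the function names are hypothetical:

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory(csv_text):
    """Parse 'used, total' lines into a list of (used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi and return (used_mib, total_mib) for each GPU."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_memory(output)
```

If gpu_memory() reports a large used value while nothing is supposed to be running, a leftover process is probably still holding memory, and plain nvidia-smi lists the processes responsible.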

Fang-Haoshu commented 6 years ago

Hi, you could try the older YOLO v3 instead of the SPP version.

zjbaby commented 6 years ago

How do I use the older YOLO v3?

Fang-Haoshu commented 6 years ago

At https://github.com/MVIG-SJTU/AlphaPose/blob/pytorch/dataloader.py#L275 and on the line below it, remove "-spp", and download the yolov3 model from the official YOLO site.
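The edit described above is a plain string change on the two lines that name the detector's files. A small sketch of the effect; the paths here are hypothetical placeholders, so check dataloader.py for the real ones:

```python
def without_spp(path):
    """Drop the '-spp' marker from a YOLO cfg or weights file name."""
    return path.replace("-spp", "")


# Hypothetical paths for illustration only; the real ones are in dataloader.py.
cfg_path = without_spp("yolo/cfg/yolov3-spp.cfg")
weights_path = without_spp("models/yolo/yolov3-spp.weights")
```

The downloaded yolov3 weights file then needs to sit at whatever path the edited line points to.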

zjbaby commented 6 years ago

Got it, thanks a lot!

liushuo1201 commented 5 years ago

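Did you manage to solve this problem? I'm running into it as well. I have changed batch and subdivisions in the cfg, as well as batch_size in the code, and it still doesn't work. How did you solve it?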

MVIG-SJTU / AlphaPose

out of memory in demo.py(pytorch) #123