dmlc / decord

An efficient video loader for deep learning with smart shuffling that's super easy to digest
Apache License 2.0

GPU memory leak #27

Open ternaus opened 4 years ago

ternaus commented 4 years ago

I am decoding a list of videos with:

from decord import VideoReader, gpu

video = VideoReader(str(video_path), ctx=gpu(0))
frame_ids = list(range(300))
frames = video.get_batch(frame_ids).asnumpy()

On every iteration, GPU RAM consumption goes up until I get an out-of-memory error.

leigh-plt commented 4 years ago

The memory leak exists even without .asnumpy(). I use:

frames = torch.utils.dlpack.from_dlpack(video.get_batch(frame_ids).to_dlpack())

In this case the frames are located on the GPU.
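
A minimal sketch of that pattern, assuming the video_path and frame_ids from the original post; the load_batch_on_gpu helper name is just for illustration:

from torch.utils.dlpack import from_dlpack
from decord import VideoReader, gpu

def load_batch_on_gpu(video_path, frame_ids):
    # Decode on the GPU and hand the frames to PyTorch via DLPack, avoiding a host copy.
    video = VideoReader(str(video_path), ctx=gpu(0))
    batch = video.get_batch(frame_ids)        # decord NDArray on the GPU
    frames = from_dlpack(batch.to_dlpack())   # torch.Tensor sharing the same GPU memory
    return frames                             # shape (N, H, W, 3)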

zhreshold commented 4 years ago

Can you guys post your CUDA version / device type? I tried on my local machine with a 1070 Ti and CUDA 10.1.243 and didn't notice any memory leak.

from decord import VideoReader
from decord import cpu, gpu

video_path = '/home/joshua/Dev/decord/examples/flipping_a_pancake.mkv'
video = VideoReader(str(video_path), ctx=gpu(0))
frame_ids = list(range(300))

for i in range(100):
    frames = video.get_batch(frame_ids).asnumpy()
    if i % 10 == 0:
        print(frames.shape)

The nvidia-smi recording can be viewed here: https://asciinema.org/a/xgI8tFXNlpAoDcVJdxLgag8eW. GPU memory goes from 627M to 845M and stays pretty constant.

KeremTurgutlu commented 4 years ago

Thanks @leigh-plt, I modified the batch loading and it seems to be OK in the Kaggle notebook environment after the dlpack trick:

def get_decord_video_batch(fname, sz, freq=10):
    "get batch tensor for inference, original for cropping and H,W of video"
    video = VideoReader(str(fname), ctx=gpu())
#     data = video.get_batch(range(0, len(video), 10))
    data = from_dlpack(to_dlpack(video.get_batch(range(0, len(video), 10))))
    H,W = data.shape[2:]
    del video; gc.collect()
    return (data, None, (H, W))

Although I had one successful run, there have been unsuccessful runs after that. How can we fix it?

akansal1 commented 4 years ago

Facing memory leak issues on CPU; on GPU it's working fine.

yitang commented 4 years ago

Facing the same issue; neither the dlpack trick nor asnumpy works for me.

@KeremTurgutlu I'm on the Kaggle env as well.

zhreshold commented 4 years ago

Can you guys post the Kaggle GPU and CUDA versions?

yitang commented 4 years ago

The GPU is a Tesla P100-PCIE-16GB, and CUDA is 10.0.130.

This is the traceback:

[16:44:39] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.

7%|▋ | 27/400 [00:23<03:55, 1.58it/s][16:44:40] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB

[16:44:40] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.

7%|▋ | 28/400 [00:24<03:51, 1.61it/s][16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB

[16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.

terminate called after throwing an instance of 'dmlc::Error'

what(): [16:44:41] /kaggle/working/reader/src/video/nvcodec/cuda_threaded_decoder.cc:332: Check failed: arr.defined()

Stack trace returned 10 entries:

[bt] (0) /kaggle/working/reader/build/libdecord.so(dmlc::StackTrace[abi:cxx11](unsigned long)+0x85) [0x7f15712ee059]

[bt] (1) /kaggle/working/reader/build/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x20) [0x7f15712ee334]

[bt] (2) /kaggle/working/reader/build/libdecord.so(decord::cuda::CUThreadedDecoder::ConvertThread()+0x1a5) [0x7f157135d659]

[bt] (3) /kaggle/working/reader/build/libdecord.so(void std::__invoke_impl<void, void (decord::cuda::CUThreadedDecoder:: const&)(), decord::cuda::CUThreadedDecoder>(std::__invoke_memfun_deref, void (decord::cuda::CUThreadedDecoder:: const&)(), decord::cuda::CUThreadedDecoder&&)+0x66) [0x7f15713669bc]

[bt] (4) /kaggle/working/reader/build/libdecord.so(std::result_of<void (decord::cuda::CUThreadedDecoder:: const&(decord::cuda::CUThreadedDecoder&&))()>::type std::__invoke<void (decord::cuda::CUThreadedDecoder:: const&)(), decord::cuda::CUThreadedDecoder>(void (decord::cuda::CUThreadedDecoder:: const&)(), decord::cuda::CUThreadedDecoder&&)+0x3f) [0x7f1571366949]

[bt] (5) /kaggle/working/reader/build/libdecord.so(decltype (__invoke((this)._M_pmf, (forward<decord::cuda::CUThreadedDecoder>)({parm#1}))) std::_Mem_fn_base<void (decord::cuda::CUThreadedDecoder::)(), true>::operator()<decord::cuda::CUThreadedDecoder>(decord::cuda::CUThreadedDecoder*&&) const+0x2e) [0x7f15713668fa]

[bt] (6) /kaggle/working/reader/build/libdecord.so(void std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::)()> (decord::cuda::CUThreadedDecoder)>::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x43) [0x7f15713668c5]

[bt] (7) /kaggle/working/reader/build/libdecord.so(std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::)()> (decord::cuda::CUThreadedDecoder)>::operator()()+0x1d) [0x7f1571366813]

[bt] (8) /kaggle/working/reader/build/libdecord.so(std::thread::_State_impl<std::_Bind_simple<std::_Mem_fn<void (decord::cuda::CUThreadedDecoder::)()> (decord::cuda::CUThreadedDecoder)> >::_M_run()+0x1c) [0x7f15713667f2]

[bt] (9) /opt/conda/lib/python3.6/site-packages/matplotlib/../../../libstdc++.so.6(+0xb8408) [0x7f157222e408]

leigh-plt commented 4 years ago

Ubuntu 19.04 with the latest driver updates and CUDA 10.2 has the leak too. GTX 730.

Kaggle:
cuda_threaded_decoder.cc:35: Using device: Tesla P100-PCIE-16GB
cuda_threaded_decoder.cc:55: Kernel module version 418.67, so using our own stream.
OS: Debian stretch

KeremTurgutlu commented 4 years ago

Thanks @leigh-plt, this might be helpful for inference in the kernel!

yitang commented 4 years ago

@leigh-plt are you writing a kernel on how to use a video processing framework too?

anlarro commented 4 years ago

Any updates on this? I am facing memory leak issues on CPU.

Black-Hack commented 4 years ago

It seems that deleting the frame manually avoids the leak:

vr = VideoReader(video_path)
for frame in vr:
    print(frame.shape)
    del frame

Note: I tested it for CPU only, but from the source code it seems that this would be the case for GPU as well.

huang-ziyuan commented 4 years ago

> It seems that deleting the frame manually avoids the leak:
>
> vr = VideoReader(video_path)
> for frame in vr:
>     print(frame.shape)
>     del frame
>
> Note: I tested it for CPU only, but from the source code it seems that this would be the case for GPU as well.

Same issue with the CPU memory leak. If you delete the frame, how do you process it further when this is applied in a deep learning training framework?

Ehsan1997 commented 3 years ago

> It seems that deleting the frame manually avoids the leak:
>
> vr = VideoReader(video_path)
> for frame in vr:
>     print(frame.shape)
>     del frame
>
> Note: I tested it for CPU only, but from the source code it seems that this would be the case for GPU as well.

> Same issue with the CPU memory leak. If you delete the frame, how do you process it further when this is applied in a deep learning training framework?

Get a bunch of frames, train the model, get the next bunch.
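
A minimal sketch of that idea, assuming a CPU context; the iterate_in_chunks and train_on_batch names are just for illustration:

import gc
from decord import VideoReader, cpu

def iterate_in_chunks(video_path, chunk_size=32):
    # Decode the video in fixed-size chunks instead of holding every frame at once.
    vr = VideoReader(str(video_path), ctx=cpu(0))
    for start in range(0, len(vr), chunk_size):
        ids = list(range(start, min(start + chunk_size, len(vr))))
        batch = vr.get_batch(ids)
        frames = batch.asnumpy()   # copy the chunk out of decord's buffer
        del batch                  # drop the decord NDArray so its memory can be reclaimed
        yield frames
    del vr
    gc.collect()

# usage: train on one chunk, then move on to the next
# for frames in iterate_in_chunks('some_video.mp4'):   # hypothetical path
#     train_on_batch(frames)                           # hypothetical training step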