facebookresearch / VMZ

VMZ: Model Zoo for Video Modeling
Apache License 2.0

Prefetching error when finetuning #73

Closed surajm72 closed 5 years ago

surajm72 commented 5 years ago

Hello, I'm trying to run the finetuning code on an existing model, but I'm getting the error below. I am somewhat new to Caffe2, so any help is appreciated!

Also, is there a way to run the finetuning without a GPU?

[E prefetch_op.h:110] Prefetching error std::bad_alloc
[E prefetch_op.h:83] Prefetching failed.
[E net_async_base.cc:377] Failed to execute an op, op VideoInput
[E net_async_base.cc:135] Error encountered in the run of 'r2plus1d_train'
WARNING:caffe2.python.workspace:Original python traceback for operator 0 in network r2plus1d_train in exception above (most recent call last):
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 501, in <module>
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 496, in main
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 280, in Train
WARNING:caffe2.python.workspace: File "/home/surajm72/pytorch/build/caffe2/python/data_parallel_model.py", line 34, in Parallelize_GPU
WARNING:caffe2.python.workspace: File "/home/surajm72/pytorch/build/caffe2/python/data_parallel_model.py", line 231, in Parallelize
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 268, in add_video_input
WARNING:caffe2.python.workspace: File "/home/surajm72/VMZ/lib/utils/model_helper.py", line 131, in AddVideoInput
Traceback (most recent call last):
  File "tools/train_net.py", line 501, in <module>
    main()
  File "tools/train_net.py", line 496, in main
    Train(args)
  File "tools/train_net.py", line 388, in Train
    explog
  File "tools/train_net.py", line 123, in RunEpoch
    workspace.RunNet(train_model.net.Proto().name)
  File "/home/surajm72/pytorch/build/caffe2/python/workspace.py", line 250, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/home/surajm72/pytorch/build/caffe2/python/workspace.py", line 211, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at pybind_state.cc:1188] success. Error running net r2plus1d_train
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x76 (0x7ff7a7a78316 in /home/surajm72/pytorch/build/lib/libc10.so)
frame #1: <unknown function> + 0x46046 (0x7ff7a816b046 in /home/surajm72/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #2: <unknown function> + 0x91360 (0x7ff7a81b6360 in /home/surajm72/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)

frame #15: python() [0x4eb69f]
frame #19: __libc_start_main + 0xf0 (0x7ff7aba5c830 in /lib/x86_64-linux-gnu/libc.so.6)

dutran commented 5 years ago

This should be fixed by https://github.com/facebookresearch/VMZ/issues/69
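
For anyone else hitting the same log: `std::bad_alloc` is the exception C++ throws when a memory allocation fails, and here it is raised inside the `VideoInput` prefetch step. Independent of the fix referenced above, one generic thing worth ruling out is whether a single prefetched batch of decoded clips could plausibly exhaust host memory. The sketch below is only back-of-the-envelope arithmetic. The function names and the example numbers (batch size, clip length, crop size, 4 bytes per value) are assumptions chosen for illustration, not values read from VMZ or from this issue, and a malformed input record can also trigger the same exception regardless of batch size.

```python
def clip_batch_bytes(batch_size, clip_length, height, width,
                     channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of decoded clips as a dense array.

    All arguments are hypothetical knobs for this estimate; they are not
    VMZ command-line flags.
    """
    return (batch_size * clip_length * height * width
            * channels * bytes_per_value)


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / float(1024 ** 3)


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not taken from the issue).
    per_batch = clip_batch_bytes(batch_size=32, clip_length=32,
                                 height=112, width=112)
    print("%d bytes (%.3f GiB) per prefetched batch"
          % (per_batch, to_gib(per_batch)))
```

If the estimate is small compared with the free RAM on the machine, memory pressure from batch size alone is an unlikely explanation, and the input data itself is the next thing to inspect.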