facebookresearch / VMZ

VMZ: Model Zoo for Video Modeling
Apache License 2.0

Error when running scripts/test_irCSN_152_kinetics.sh: input.numel() > 0 #112

Closed: huangjun12 closed this issue 4 years ago

huangjun12 commented 4 years ago

[E net_async_base.cc:377] [enforce fail at conv_pool_op_base.h:219] input.numel() > 0. Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernels" ints: 3 ints: 7 ints: 7 } arg { name: "exhaustive_search" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "pads" ints: 1 ints: 3 ints: 3 ints: 1 ints: 3 ints: 3 } arg { name: "strides" ints: 1 ints: 2 ints: 2 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x76 (0x7f40b7d73bb6 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: <unknown function> + 0x11c1a78 (0x7f40687b7a78 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x122 (0x7f40687b8092 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: caffe2::CudnnConvOp::RunOnDevice() + 0x1e0 (0x7f40687a62f0 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: <unknown function> + 0x10e6f75 (0x7f40686dcf75 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x154 (0x7f40b1252934 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: <unknown function> + 0x1471745 (0x7f40b125a745 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x2cb (0x7f40b03087bb in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0xb8c80 (0x7f4077e7fc80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #9: <unknown function> + 0x76ba (0x7f40bce7e6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x6d (0x7f40bcbb441d in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:129] Rethrowing exception from the run of 'video_model'
WARNING:caffe2.python.workspace:Original python traceback for operator `0` in network `video_model` in exception above (most recent call last):
WARNING:caffe2.python.workspace:  File "tools/test_net_large.py", line 492, in <module>
WARNING:caffe2.python.workspace:  File "tools/test_net_large.py", line 487, in main
WARNING:caffe2.python.workspace:  File "tools/test_net_large.py", line 174, in Test
WARNING:caffe2.python.workspace:  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/data_parallel_model.py", line 34, in Parallelize_GPU
WARNING:caffe2.python.workspace:  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/data_parallel_model.py", line 220, in Parallelize
WARNING:caffe2.python.workspace:  File "tools/test_net_large.py", line 125, in create_model_ops
WARNING:caffe2.python.workspace:  File "/workspace/ssd6/huangjun12/Caffe2Project/VMZ-train/caffe2/lib/models/model_builder.py", line 129, in build_model
WARNING:caffe2.python.workspace:  File "/workspace/ssd6/huangjun12/Caffe2Project/VMZ-train/caffe2/lib/models/r3d_model.py", line 179, in create_model
WARNING:caffe2.python.workspace:  File "/workspace/ssd6/huangjun12/Caffe2Project/VMZ-train/caffe2/lib/models/r3d_model.py", line 230, in create_r3d
WARNING:caffe2.python.workspace:  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/cnn.py", line 86, in ConvNd
WARNING:caffe2.python.workspace:  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING:caffe2.python.workspace:  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 164, in conv_nd
WARNING:caffe2.python.workspace:  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 123, in _ConvBase
Traceback (most recent call last):
  File "tools/test_net_large.py", line 492, in <module>
    main()
  File "tools/test_net_large.py", line 487, in main
    Test(args)
  File "tools/test_net_large.py", line 279, in Test
    workspace.RunNet(test_model.net.Proto().name)
  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at conv_pool_op_base.h:219] input.numel() > 0. Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernels" ints: 3 ints: 7 ints: 7 } arg { name: "exhaustive_search" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "pads" ints: 1 ints: 3 ints: 3 ints: 1 ints: 3 ints: 3 } arg { name: "strides" ints: 1 ints: 2 ints: 2 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x76 (0x7f40b7d73bb6 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: <unknown function> + 0x11c1a78 (0x7f40687b7a78 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x122 (0x7f40687b8092 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: caffe2::CudnnConvOp::RunOnDevice() + 0x1e0 (0x7f40687a62f0 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: <unknown function> + 0x10e6f75 (0x7f40686dcf75 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x154 (0x7f40b1252934 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: <unknown function> + 0x1471745 (0x7f40b125a745 in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x2cb (0x7f40b03087bb in /usr/local/python2.7.15/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0xb8c80 (0x7f4077e7fc80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #9: <unknown function> + 0x76ba (0x7f40bce7e6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x6d (0x7f40bcbb441d in /lib/x86_64-linux-gnu/libc.so.6)
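The failing enforce in conv_pool_op_base.h only requires that the Conv input blob (here gpu_0/data) contain at least one element. The plain-Python sketch below mirrors that check and the standard output-size arithmetic for conv1, using the kernels, strides and pads printed in the log. It assumes a channels-first (N, C, T, H, W) input and that the six pads are the begin pads followed by the end pads (the values are symmetric, so the order does not change the result). The input shape (1, 3, 32, 224, 224) is a hypothetical example; the log only shows that the data blob was empty, not why.

```python
from math import prod  # Python 3.8+

# conv1 arguments copied from the failing operator in the log, in (T, H, W) order.
KERNELS = (3, 7, 7)
STRIDES = (1, 2, 2)
# Assumed layout: begin pads for T, H, W, then end pads (symmetric here).
PADS = (1, 3, 3, 1, 3, 3)


def input_is_valid(shape):
    """Mirror the failing enforce: the input blob must have numel() > 0."""
    return prod(shape) > 0


def conv1_output_thw(shape):
    """Output (T, H, W) of conv1 for an (N, C, T, H, W) input, with dilation 1."""
    spatial = shape[2:]
    ndim = len(spatial)
    return tuple(
        (size + PADS[i] + PADS[i + ndim] - KERNELS[i]) // STRIDES[i] + 1
        for i, size in enumerate(spatial)
    )


# Hypothetical clip: batch of 1, RGB, 32 frames of 224x224.
clip = (1, 3, 32, 224, 224)
print(input_is_valid(clip), conv1_output_thw(clip))  # True (32, 112, 112)

# A batch with zero clips has zero elements, which is what the enforce rejects.
empty = (0, 3, 32, 224, 224)
print(input_is_valid(empty))  # False
```

If the data reader hands the model an empty batch, the very first Conv is where this surfaces, which is consistent with the Python traceback pointing at the conv1 creation in r3d_model.py.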

I get this error with both caffe2 v1.3.0 and v1.0.1.

When I changed line 69 of tools/test_net_large.py (https://github.com/facebookresearch/VMZ/blob/f4089e2164f67a98bc5bed4f97dc722bdbcd268e/tools/test_net_large.py#L69) from batch_size=args.batch_size to batch_size=args.batch_size * args.crop_per_clip, the script ran without the error, but the accuracy is very low:

INFO:test_net_large:Iter 10/19761: clip: 0.239393939394, top1: 0.454545454545, top 5: 0.818181818182
INFO:test_net_large:Iter 20/19761: clip: 0.236507936508, top1: 0.428571428571, top 5: 0.809523809524
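The log reports a clip-level accuracy next to video-level top-1 and top-5. A common way to obtain video-level numbers is to average the scores of all clips (and crops) belonging to one video and rank classes on that average. The sketch below assumes that scheme; it is not taken from tools/test_net_large.py, whose exact aggregation may differ, and the function names and toy scores are hypothetical.

```python
from collections import defaultdict


def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def clip_accuracy(clip_scores, clip_labels):
    """Fraction of individual clips whose top-scoring class equals the label."""
    hits = sum(argmax(s) == y for s, y in zip(clip_scores, clip_labels))
    return hits / len(clip_labels)


def video_topk_accuracy(clip_scores, clip_video_ids, video_labels, k=1):
    """Average clip scores per video, then check whether the label is in the top k."""
    totals = {}
    counts = defaultdict(int)
    for scores, vid in zip(clip_scores, clip_video_ids):
        prev = totals.get(vid, [0.0] * len(scores))
        totals[vid] = [a + b for a, b in zip(prev, scores)]
        counts[vid] += 1
    hits = 0
    for vid, total in totals.items():
        avg = [x / counts[vid] for x in total]
        topk = sorted(range(len(avg)), key=avg.__getitem__, reverse=True)[:k]
        hits += video_labels[vid] in topk
    return hits / len(totals)


# Toy example: video "a" (label 0) and video "b" (label 1), two clips each.
scores = [[0.6, 0.3, 0.1], [0.3, 0.5, 0.2], [0.1, 0.2, 0.7], [0.3, 0.3, 0.4]]
video_ids = ["a", "a", "b", "b"]
labels = {"a": 0, "b": 1}
clip_labels = [labels[v] for v in video_ids]

print(clip_accuracy(scores, clip_labels))                   # 0.25
print(video_topk_accuracy(scores, video_ids, labels, k=1))  # 0.5
print(video_topk_accuracy(scores, video_ids, labels, k=2))  # 1.0
```

Under this scheme, a clip-to-video grouping that does not match the actual videos would corrupt the averaged scores, so it is worth checking how the modified batch size interacts with the per-video bookkeeping.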

nassimaNoufail commented 4 years ago

@huangjun12 could you please tell me how you created the LMDB database for Kinetics? We are supposed to use data/create_video_db.py, which needs a 'kinetics_val_list' CSV file, and I don't know where to find that CSV file or how to generate it.
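A hedged sketch of generating such a list file. It assumes that data/create_video_db.py reads a CSV with an org_video column (path to the video) and a label column (integer class index); these column names are an assumption and should be checked against the fields create_video_db.py actually reads. It also assumes a hypothetical directory layout root/<class_name>/<video file> and assigns labels by sorted class-folder name, which must match the label indices the model was trained with. write_video_list and VIDEO_EXTS are hypothetical names.

```python
import csv
from pathlib import Path

# Hypothetical set of video extensions to include.
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".webm"}


def write_video_list(root, out_csv):
    """Write an 'org_video,label' CSV for videos stored as root/<class>/<file>.

    Column names and the sorted-folder label mapping are assumptions;
    verify them against data/create_video_db.py and your training labels.
    """
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: idx for idx, name in enumerate(classes)}
    rows = []
    for cls in classes:
        for video in sorted((root / cls).iterdir()):
            if video.is_file() and video.suffix.lower() in VIDEO_EXTS:
                rows.append((str(video), label_of[cls]))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["org_video", "label"])
        writer.writerows(rows)
    return len(rows)


# Hypothetical usage; both paths are placeholders.
# write_video_list("/data/kinetics/val", "kinetics_val_list.csv")
```

Sorting both the class folders and the files keeps the output deterministic, so the same directory always yields the same label indices and row order.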