OpenGVLab / UniFormerV2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
https://arxiv.org/abs/2211.09552
Apache License 2.0
291 stars, 18 forks

Runtime error during test mode on K400 #18

Closed: mkjubran closed this issue 1 year ago

mkjubran commented 1 year ago

I get the runtime error below when running UniFormerV2. The error occurs while executing the line "count_dict, *_ = model_stats_fun(model, inputs)". Any advice or hints?

Setting in config.yaml: PRETRAIN: 'try/k400_k710_uniformerv2_l14_32x336.pyth'

Traceback (most recent call last):
  File "tools/run_net_multi_node.py", line 69, in <module>
    main()
  File "tools/run_net_multi_node.py", line 54, in main
    launch_job(cfg=cfg, init_method=args.init_method, func=test)
  File "/CVDLPT/UniFormerV2/slowfast/utils/misc.py", line 313, in launch_job
    func(cfg=cfg)
  File "/CVDLPT/UniFormerV2/tools/test_net.py", line 169, in test
    misc.log_model_info(model, cfg, use_train_input=False)
  File "/CVDLPT/UniFormerV2/slowfast/utils/misc.py", line 190, in log_model_info
    get_model_stats(model, cfg, "flop", use_train_input)
  File "/CVDLPT/UniFormerV2/slowfast/utils/misc.py", line 168, in get_model_stats
    count_dict, *_ = model_stats_fun(model, inputs)
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/flop_count.py", line 147, in flop_count
    for op, flop in flop_counter.by_operator().items():
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/jit_analysis.py", line 265, in by_operator
    stats = self._analyze()
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/jit_analysis.py", line 551, in _analyze
    graph = _get_scoped_trace_graph(self._model, self._inputs, self._aliases)
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/jit_analysis.py", line 176, in _get_scoped_trace_graph
    graph, _ = _get_trace_graph(module, inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 1184, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2.py", line 90, in forward
    output = self.backbone(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2_model.py", line 337, in forward
    out = self.transformer(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2_model.py", line 261, in forward
    x = resblock(x, self.T, use_checkpoint=True)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2_model.py", line 119, in forward
    attn_out = checkpoint.checkpoint(self.attention, self.ln_1(x))
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
RuntimeError: _Map_base::at

Andy1621 commented 1 year ago

You can try to comment out the code for printing model statistics. https://github.com/OpenGVLab/UniFormerV2/blob/d390105e588665af5029bfcceed5b9975d4b13bb/tools/test_net.py#L165-L166
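As an alternative to commenting the call out, the statistics step could be made non-fatal. The following is only a minimal sketch: `try_log_model_stats` is a hypothetical helper that does not exist in the repository, and it assumes the statistics function (for example `misc.log_model_info` from the traceback above) fails with a `RuntimeError` or `MemoryError`.

```python
import logging

logger = logging.getLogger(__name__)


def try_log_model_stats(log_fn, *args, **kwargs):
    """Run a model-statistics function without letting it abort the test run.

    `log_fn` stands in for a function such as `misc.log_model_info`.
    This wrapper is a hypothetical helper, not part of UniFormerV2 or SlowFast.
    Returns True if the statistics were produced, False if they were skipped.
    """
    try:
        log_fn(*args, **kwargs)
        return True
    except (RuntimeError, MemoryError) as err:
        # FLOP counting is optional for testing, so warn and continue.
        logger.warning("Skipping model statistics: %s", err)
        return False
```

Under these assumptions, the call in `tools/test_net.py` would become `try_log_model_stats(misc.log_model_info, model, cfg, use_train_input=False)`. Note that this hides the failure rather than fixing it, so it is only appropriate when the statistics are not needed.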

I see that you are trying to run the largest model; the code above can sometimes cause a memory overflow. To check whether that is the cause, first try testing a smaller model, such as UniFormerV2-B/16. BTW, when testing on K400, note that our label map may differ from others, so you may need to relabel the meta file.
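Relabeling a meta file could look roughly like the sketch below. Everything about the file layout is an assumption: it supposes each annotation line has the form "path label_index" and that both label maps are available as ordered lists of class names. The helper names `build_index_remap` and `relabel_lines` are hypothetical and not part of the repository.

```python
def build_index_remap(source_classes, target_classes):
    """Map each index in the source label map to its index in the target one.

    Both arguments are lists of class names ordered by label index.
    """
    target_index = {name: i for i, name in enumerate(target_classes)}
    missing = [name for name in source_classes if name not in target_index]
    if missing:
        raise KeyError(f"Classes missing from target label map: {missing}")
    return {i: target_index[name] for i, name in enumerate(source_classes)}


def relabel_lines(lines, remap, sep=" "):
    """Rewrite the trailing label index of each "path<sep>label" line."""
    relabeled = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        # Split on the last separator so paths containing it stay intact.
        path, label = line.rsplit(sep, 1)
        relabeled.append(f"{path}{sep}{remap[int(label)]}")
    return relabeled


# Toy example with three classes listed in a different order.
source = ["abseiling", "air drumming", "answering questions"]
target = ["answering questions", "abseiling", "air drumming"]
remap = build_index_remap(source, target)
new_lines = relabel_lines(["a.mp4 0", "b.mp4 2"], remap)
```

The separator and column order would need to match the actual annotation files in use.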

mkjubran commented 1 year ago

Thanks. I changed the model to UniFormerV2-B/16 and the previous error did not appear. However, I got the following error. Any help?

(Screenshot of the error; image not included)

Andy1621 commented 1 year ago

Are you running the code on a single GPU? This is a common bug in SlowFast. You can try running it on multiple GPUs.

ahmadjaved97 commented 5 months ago

@Andy1621 is there a way to run UniFormerV2-B/16 on a single GPU? Or does it need multiple GPUs to run?