mkjubran closed this issue 1 year ago.
You can try commenting out the code that prints model statistics: https://github.com/OpenGVLab/UniFormerV2/blob/d390105e588665af5029bfcceed5b9975d4b13bb/tools/test_net.py#L165-L166
I see that you are trying to run the largest model, and the code above can sometimes run out of memory on it. To check whether the error is caused by a memory overflow, first try testing a smaller model, such as UniFormerV2-B/16. BTW, when testing on K400, note that our label map may differ from other label maps, so you may need to relabel the meta file.
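For the relabeling step, here is a minimal sketch. It assumes a SlowFast-style meta file where each line is "<video path> <integer label>" separated by a single space, and two class-name lists: the one the meta file was built with and the one the checkpoint expects. The helper name remap_meta_lines, the class names, and the paths are hypothetical and are not part of UniFormerV2.

```python
from typing import Iterable, List


def remap_meta_lines(
    lines: Iterable[str],
    source_classes: List[str],
    target_classes: List[str],
    sep: str = " ",
) -> List[str]:
    """Rewrite 'path<sep>label' lines so labels follow target_classes.

    Hypothetical helper: classes are matched by name, so both lists must
    contain the same class names, possibly in a different order.
    """
    target_index = {name: i for i, name in enumerate(target_classes)}
    remapped = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # rsplit on the last separator keeps spaces inside the path intact
        path, label = line.rsplit(sep, 1)
        class_name = source_classes[int(label)]
        remapped.append(f"{path}{sep}{target_index[class_name]}")
    return remapped


# Usage with made-up class lists and paths.
source = ["abseiling", "air drumming", "answering questions"]
target = ["air drumming", "answering questions", "abseiling"]
meta = ["videos/a.mp4 0\n", "videos/b.mp4 2\n"]
print(remap_meta_lines(meta, source, target))
# ['videos/a.mp4 2', 'videos/b.mp4 1']
```

Matching by class name rather than by index means the mapping stays correct no matter how the two label maps are ordered. A name missing from the target list raises a KeyError instead of silently producing a wrong label.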
Thanks. I changed the model to UniFormerV2-B/16 and the previous error did not appear. However, I got the following error. Any help?
Are you running the code on a single GPU? This is a common bug in SlowFast. You can try running it on multiple GPUs.
@Andy1621, is there a way to run UniFormerV2-B/16 on a single GPU, or does it need multiple GPUs?
I get the runtime error below when running UniFormerV2. The error occurs while executing "count_dict, *_ = model_stats_fun(model, inputs)". Any advice or hints?
Setting in config.yaml: PRETRAIN: 'try/k400_k710_uniformerv2_l14_32x336.pyth'
Traceback (most recent call last):
  File "tools/run_net_multi_node.py", line 69, in <module>
    main()
  File "tools/run_net_multi_node.py", line 54, in main
    launch_job(cfg=cfg, init_method=args.init_method, func=test)
  File "/CVDLPT/UniFormerV2/slowfast/utils/misc.py", line 313, in launch_job
    func(cfg=cfg)
  File "/CVDLPT/UniFormerV2/tools/test_net.py", line 169, in test
    misc.log_model_info(model, cfg, use_train_input=False)
  File "/CVDLPT/UniFormerV2/slowfast/utils/misc.py", line 190, in log_model_info
    get_model_stats(model, cfg, "flop", use_train_input)
  File "/CVDLPT/UniFormerV2/slowfast/utils/misc.py", line 168, in get_model_stats
    count_dict, *_ = model_stats_fun(model, inputs)
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/flop_count.py", line 147, in flop_count
    for op, flop in flop_counter.by_operator().items():
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/jit_analysis.py", line 265, in by_operator
    stats = self._analyze()
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/jit_analysis.py", line 551, in _analyze
    graph = _get_scoped_trace_graph(self._model, self._inputs, self._aliases)
  File "/usr/local/lib/python3.8/dist-packages/fvcore/nn/jit_analysis.py", line 176, in _get_scoped_trace_graph
    graph, _ = _get_trace_graph(module, inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 1184, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2.py", line 90, in forward
    output = self.backbone(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2_model.py", line 337, in forward
    out = self.transformer(x)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2_model.py", line 261, in forward
    x = resblock(x, self.T, use_checkpoint=True)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/CVDLPT/UniFormerV2/slowfast/models/uniformerv2_model.py", line 119, in forward
    attn_out = checkpoint.checkpoint(self.attention, self.ln_1(x))
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
RuntimeError: _Map_base::at
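The traceback shows the failure is raised while fvcore traces the model to count FLOPs, and the trace passes through torch.utils.checkpoint because the residual blocks are called with use_checkpoint=True. Besides commenting out the statistics code, one option is to make that logging step non-fatal. The sketch below is stdlib-only and hypothetical: log_model_info_safely, its enabled flag, and the stand-in stats functions represent the real misc.log_model_info / model_stats_fun calls and are not UniFormerV2, SlowFast, or fvcore APIs.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def log_model_info_safely(
    stats_fn: Callable[..., Any],
    *args: Any,
    enabled: bool = True,
    **kwargs: Any,
) -> Optional[Any]:
    """Call a model-statistics function without letting it abort the run.

    Hypothetical wrapper: returns the statistics on success, or None when
    logging is disabled or the call raises a RuntimeError (for example, a
    failure during JIT tracing).
    """
    if not enabled:
        return None  # same effect as commenting the call out
    try:
        return stats_fn(*args, **kwargs)
    except RuntimeError as err:
        logger.warning("Skipping model statistics: %s", err)
        return None


# Stand-ins for the real FLOP counter, used only to show the behavior.
def failing_stats(model: str) -> dict:
    raise RuntimeError("_Map_base::at")


def working_stats(model: str) -> dict:
    return {"conv": 1.5}


print(log_model_info_safely(failing_stats, "model"))  # None, plus a warning
print(log_model_info_safely(working_stats, "model"))  # {'conv': 1.5}
print(log_model_info_safely(working_stats, "model", enabled=False))  # None
```

Catching only RuntimeError keeps real bugs (for example, a TypeError from a wrong argument) visible. Only the statistics step is skipped, and the actual test run continues. This wrapper does not fix the underlying tracing issue. Disabling checkpointing during evaluation is another option, but it is untested here.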