ModelTC / United-Perception

United Perception
Apache License 2.0
428 stars 65 forks

Fix error when training YOLOX on a single GPU #38

Closed RedHandLM closed 2 years ago

RedHandLM commented 2 years ago

Training YOLOX in single-GPU mode raises an exception. The cause is that allreduce is called without first checking world_size.

Adding a world_size check fixes the problem.
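The guard described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: `reduce_mean` and the injected `all_reduce_sum` callable are hypothetical names chosen so the example runs without a GPU or a process group. In a PyTorch code base the collective would typically be `torch.distributed.all_reduce`, and the world size would come from `torch.distributed.get_world_size()`.

```python
from typing import Callable, List


def reduce_mean(
    values: List[float],
    world_size: int,
    all_reduce_sum: Callable[[List[float]], List[float]],
) -> List[float]:
    """Average values across processes (hypothetical helper).

    The collective is skipped when only one process is running.
    """
    if world_size <= 1:
        # Single-GPU run: there are no peers to reduce with,
        # so return the local values unchanged.
        return list(values)
    # Multi-GPU run: sum across ranks, then divide by the number of ranks.
    summed = all_reduce_sum(values)
    return [v / world_size for v in summed]


def fail_if_called(values: List[float]) -> List[float]:
    # Stand-in for a collective invoked without a usable process group.
    raise RuntimeError("collective called in single-process mode")


def fake_sum_two_ranks(values: List[float]) -> List[float]:
    # Stand-in for a 2-rank all-reduce where both ranks hold the same values.
    return [2 * v for v in values]


# With world_size == 1 the collective is never touched, so no error is raised.
single = reduce_mean([2.0, 4.0], world_size=1, all_reduce_sum=fail_if_called)

# With world_size == 2 the values are summed across ranks and averaged.
multi = reduce_mean([2.0, 4.0], world_size=2, all_reduce_sum=fake_sum_two_ranks)
```

Passing the collective in as a parameter is only a convenience for the sketch. The point is the early return, which keeps single-process runs away from the distributed call entirely.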

RedHandLM commented 2 years ago

Traceback (most recent call last):
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/workspace/mqbench_main/United-Perception/up/__main__.py", line 27, in <module>
    main()
  File "/data/workspace/mqbench_main/United-Perception/up/__main__.py", line 21, in main
    args.run(args)
  File "/data/workspace/mqbench_main/United-Perception/up/commands/quant_deploy.py", line 61, in _main
    main(args)
  File "/data/workspace/mqbench_main/United-Perception/up/commands/quant_deploy.py", line 55, in main
    quant_deploy.deploy()
  File "/data/workspace/mqbench_main/United-Perception/up/tasks/quant/deploy/quant_deploy.py", line 53, in deploy
    convert_deploy(model=mod,
  File "/data/workspace/mqbench_main/MQBench/mqbench/convert_deploy.py", line 184, in convert_deploy
    convert_function(deploy_model, **kwargs)
  File "/data/workspace/mqbench_main/MQBench/mqbench/convert_deploy.py", line 73, in convert_onnx
    torch.onnx.export(model, dummy_input, onnx_model_path,
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/onnx/utils.py", line 724, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/onnx/utils.py", line 493, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/onnx/utils.py", line 437, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/onnx/utils.py", line 388, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/lsc/anaconda3/envs/mqbench/lib/python3.8/site-packages/torch/fx/graph_module.py", line 616, in wrapped_call
    raise e.with_traceback(None)
KeyError: 'deploy_output_node'

Another commit fixes this KeyError.