2024-07-10 22:44:07,575 - mmdet - INFO - Saving checkpoint at 1 epochs
[ ] 0/815, elapsed: 0s, ETA:
Traceback (most recent call last):
File "./tools/train.py", line 261, in <module>
main()
File "./tools/train.py", line 250, in main
custom_train_model(
File "/home/hitbuyi/AD_Projects/Pytorch_Project/VoxFormer/projects/mmdet3d_plugin/voxformer/apis/train.py", line 27, in custom_train_model
custom_train_detector(
File "/home/hitbuyi/AD_Projects/Pytorch_Project/VoxFormer/projects/mmdet3d_plugin/voxformer/apis/mmdet_train.py", line 200, in custom_train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
self.call_hook('after_train_epoch')
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
getattr(hook, fn_name)(self)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch
self._do_evaluate(runner)
File "/home/hitbuyi/AD_Projects/Pytorch_Project/VoxFormer/projects/mmdet3d_plugin/core/evaluation/eval_hooks.py", line 77, in _do_evaluate
results = custom_multi_gpu_test(
File "/home/hitbuyi/AD_Projects/Pytorch_Project/VoxFormer/projects/mmdet3d_plugin/voxformer/apis/test.py", line 81, in custom_multi_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hitbuyi/AD_Projects/Pytorch_Project/VoxFormer/projects/mmdet3d_plugin/voxformer/detectors/lmscnet.py", line 249, in forward
return self.foward_test(**kwargs)
File "/home/hitbuyi/AD_Projects/Pytorch_Project/VoxFormer/projects/mmdet3d_plugin/voxformer/detectors/lmscnet.py", line 317, in foward_test
y_pred_bin.tofile(save_query_path)
NameError: name 'save_query_path' is not defined
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 426443) of binary: /home/hitbuyi/.conda/envs/pt110/bin/python
Traceback (most recent call last):
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/hitbuyi/.conda/envs/pt110/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./tools/train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-07-10_22:44:14
host : hitbuyi-Dell-G15-5511
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 426443)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
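
The first traceback shows that the failure is the `NameError` raised at `y_pred_bin.tofile(save_query_path)` in `foward_test` (lmscnet.py, line 317), reached when the evaluation hook runs after epoch 1. The launcher's `ChildFailedError` only reports that failure. The log does not show why `save_query_path` is unbound. One common cause, which is only an assumption here, is that the name is assigned in a branch or block that did not run, or not at all. A minimal sketch of the usual guard follows. `write_query_file`, `save_dir`, `frame_id`, and the `.query` suffix are hypothetical and not taken from VoxFormer, and plain bytes stand in for the NumPy array so the sketch needs only the standard library.

```python
import os


def write_query_file(payload, save_dir=None, frame_id="000000"):
    """Write `payload` (bytes) to <save_dir>/<frame_id>.query and return the path.

    Returns None without writing when no save_dir is given.
    All names here are hypothetical, not VoxFormer's.
    """
    # Bind the name on every code path so a later use cannot raise NameError.
    save_query_path = None
    if save_dir is not None:
        os.makedirs(save_dir, exist_ok=True)
        save_query_path = os.path.join(save_dir, frame_id + ".query")

    if save_query_path is None:
        return None  # nothing to save, e.g. during in-training evaluation

    # Stand-in for `y_pred_bin.tofile(save_query_path)` from the traceback.
    with open(save_query_path, "wb") as f:
        f.write(payload)
    return save_query_path
```

In the real `foward_test`, the same pattern would mean building `save_query_path` from whatever metadata identifies the frame before calling `tofile`, or skipping the save entirely when no output location is configured.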