Open M0L4N opened 2 weeks ago
Please change "load_from = 'ckpt/r101_dcn_fcos3d_pretrain.pth'" to "load_from = 'ckpt/resnet50-0676ba61.pth'" in the config file. You need to manually download resnet50-0676ba61.pth and put it into the ckpt folder.
Thank you for your reply, but the error still occurred after I changed it. Is there any other information I can provide?
Did you encounter this error at the very beginning of the training?
It always occurs during training.
Looks like a numerical stability problem. You can try using mmdet3d's focal loss or turning off the AMP setting in train.py.
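As a rough illustration of the numerical stability point, here is a minimal sketch of sanitizing predictions before they reach a focal loss that expects probabilities in [0, 1]. The helper name safe_probabilities and the eps value are hypothetical, not part of OCCFusion or the focal_loss package, and the sketch assumes the class dimension is dim=1. It only hides NaN/inf values; it does not fix whatever produces them upstream.

```python
import torch


def safe_probabilities(logits: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Turn raw logits into probabilities that are finite and inside [0, 1].

    Hypothetical helper for illustration only.
    """
    # Cast to float32 so the softmax normalization is not done in half precision.
    probs = torch.softmax(logits.float(), dim=1)
    # Replace NaN/inf that came from bad logits (this masks them, it does not fix the cause).
    probs = torch.nan_to_num(probs, nan=0.0, posinf=1.0, neginf=0.0)
    # Keep values strictly inside (0, 1) so log(p) inside the loss stays finite.
    return probs.clamp(min=eps, max=1.0 - eps)


# One row contains a NaN logit, the other is well behaved.
logits = torch.tensor([[1e4, 0.0, float("nan")], [0.5, -0.5, 0.0]])
probs = safe_probabilities(logits)
print(bool(torch.all((probs >= 0.0) & (probs <= 1.0))))
```

If the check still fails without such a guard, logging torch.isnan(logits).any() at each iteration may help locate the step where the values first go bad.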
Thanks for your advice. I'll try it.
rank0: Traceback (most recent call last):
rank0:   File "tools/train.py", line 135, in
rank0:   File "tools/train.py", line 131, in main
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
rank0:     model = self.train_loop.run()  # type: ignore
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/mmengine/runner/loops.py", line 98, in run
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/mmengine/runner/loops.py", line 115, in run_epoch
rank0:     self.run_iter(idx, data_batch)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/mmengine/runner/loops.py", line 131, in run_iter
rank0:     outputs = self.runner.model.train_step(
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
rank0:     losses = self._run_forward(data, mode='loss')
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
rank0:     results = self(data, mode=mode)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
rank0:     return self._call_impl(*args, **kwargs)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
rank0:     return forward_call(*args, **kwargs)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward
rank0:     else self._run_ddp_forward(*inputs, **kwargs)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward
rank0:     return self.module(*inputs, **kwargs)  # type: ignore[index]
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
rank0:     return self._call_impl(*args, **kwargs)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
rank0:     return forward_call(*args, **kwargs)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/mmdet3d/models/segmentors/base.py", line 102, in forward
rank0:     return self.loss(inputs, data_samples)
rank0:   File "/home/zxt/OCCFusion/occfusion/main.py", line 144, in loss
rank0:     loss = dict(level0_loss = torch.nan_to_num(self.loss_fl(vox_fl_predict_lvl0,vox_fl_label_lvl0)) + \
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
rank0:     return self._call_impl(*args, **kwargs)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
rank0:     return forward_call(*args, **kwargs)
rank0:   File "/home/zxt/anaconda3/envs/OCCFusion/lib/python3.8/site-packages/focal_loss/focal_loss.py", line 77, in forward
rank0:     assert torch.all((x >= 0.0) & (x <= 1.0)), ValueError(
rank0: AssertionError: The predictions values should be between 0 and 1, make sure to pass the values to sigmoid for binary classification or softmax for multi-class classification
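A note on reading this traceback: the assertion fires inside the focal loss forward call, so the torch.nan_to_num in main.py (line 144) only wraps the returned loss and never sees the bad input. The range check also fails for NaN, not just for values outside [0, 1], because every comparison with NaN is false. A minimal pure-Python sketch of that behaviour follows; in_unit_interval is a hypothetical stand-in for the check on a single scalar, not the library's code.

```python
import math


def in_unit_interval(x: float) -> bool:
    """Scalar version of the range test (x >= 0.0) & (x <= 1.0) from the assertion."""
    return (x >= 0.0) and (x <= 1.0)


print(in_unit_interval(0.3))       # a valid probability passes the check
print(in_unit_interval(1.7))       # an out-of-range value fails it
print(in_unit_interval(math.nan))  # NaN also fails: comparisons with NaN are False
```

So the error message about sigmoid or softmax may be misleading here: predictions that have already gone through softmax would still trip the assertion if they contain NaN.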
I just changed the backbone to resnet50. Could anyone help me, please?