MCG-NJU / SparseOcc

[ECCV 2024] Fully Sparse 3D Occupancy Prediction & RayIoU Evaluation Metric
https://arxiv.org/abs/2312.17118
Apache License 2.0

The size of "tgt_mask" becomes (0, 32000) during training #10

Closed ZM-Zhou closed 7 months ago

ZM-Zhou commented 7 months ago

Hello, thanks for releasing this excellent work! (of course I gave the star~) When I tried to train the model, I got the following error:

  File "/github/SparseOcc/models/sparseocc.py", line 127, in forward
    return self.forward_train(**kwargs)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
    return old_func(*args, **kwargs)
  File "/github/SparseOcc/models/sparseocc.py", line 134, in forward_train
    return self.forward_pts_train(img_feats, voxel_semantics, voxel_instances, instance_class_ids, mask_camera, img_metas)
  File "/github/SparseOcc/models/sparseocc.py", line 123, in forward_pts_train
    return self.pts_bbox_head.loss(*loss_inputs)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
    return old_func(*args, **kwargs)
  File "/github/SparseOcc/models/sparseocc_head.py", line 55, in loss
    return self.loss_single(voxel_semantics, voxel_instances, instance_class_ids, preds_dicts, mask_camera)
  File "/github/SparseOcc/models/sparseocc_head.py", line 103, in loss_single
    indices = self.matcher(pred, preds_dicts['class_preds'][i], voxel_instances, instance_class_ids, mask_camera)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/github/SparseOcc/models/matcher.py", line 146, in forward
    tgt_mask = tgt_mask.view(tgt_mask.shape[0], -1)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

It seems that the prediction does not match any valid label, so the size of tgt_mask is (0, 32000). BTW, could you release the training log of the model? It would be helpful for checking the model further.
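For readers wondering why an empty tensor triggers this message, here is a minimal pure-Python sketch of how a reshape resolves a single -1 dimension. The helper `infer_unknown_dim` is hypothetical and written only for illustration; it is not part of PyTorch or SparseOcc. It only mirrors the arithmetic behind the error above: when the known dimensions multiply to 0, every value of the unknown dimension satisfies 0 * x == 0, so there is nothing to infer.

```python
from math import prod


def infer_unknown_dim(numel, shape):
    """Resolve the single -1 in `shape` for a tensor holding `numel` elements.

    Hypothetical helper for illustration only; it imitates the rule a
    reshape follows, not any library's actual implementation.
    """
    if shape.count(-1) != 1:
        raise ValueError("expected exactly one -1 in shape")
    # Product of the dimensions that were given explicitly.
    known = prod(d for d in shape if d != -1)
    if known == 0:
        # 0 * x == 0 holds for every x, so the missing dimension is ambiguous.
        raise RuntimeError(
            f"cannot reshape {numel} elements into {list(shape)}: "
            "the unspecified dimension can be any value"
        )
    if numel % known != 0:
        raise RuntimeError(f"shape {list(shape)} is invalid for {numel} elements")
    return numel // known


# Three target masks of 32000 values each: the -1 resolves cleanly.
print(infer_unknown_dim(3 * 32000, (3, -1)))

# Zero target masks: the same call pattern as the failing line in matcher.py.
try:
    infer_unknown_dim(0, (0, -1))
except RuntimeError as err:
    print("RuntimeError:", err)
```

Under this reading, the failure means the first dimension of tgt_mask was 0 for that sample, i.e. the matcher received no target instances at all.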

afterthat97 commented 7 months ago

which config?

ZM-Zhou commented 7 months ago

I used r50_nuimg_704x256_8f.py

afterthat97 commented 7 months ago

did u modify the setting? it's OK on my server.

99er-gao commented 7 months ago

Try modifying line 210 in loaders/pipelines/loading.py:

results['voxel_semantics'] = semantics.astype(np.int)
results['voxel_instances'] = final_instances.astype(np.int)

afterthat97 commented 7 months ago

sorry I cannot reproduce the error, could you try the latest codebase?

ZM-Zhou commented 7 months ago

> Try modifying line 210 in loaders/pipelines/loading.py:
>
> results['voxel_semantics'] = semantics.astype(np.int)
> results['voxel_instances'] = final_instances.astype(np.int)

It works!! Thank you so much. BTW, in the latest codebase, these two lines have moved to lines 219-220.
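A note for anyone applying the cast from this thread on a newer environment: the `np.int` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so `astype(np.int)` raises an AttributeError there. The sketch below shows the same cast with an explicit dtype. The function name `cast_labels`, the sample arrays, and the uint8 input dtype are assumptions made for illustration; the thread does not say what dtype the arrays had before the change or why the cast resolves the error.

```python
import numpy as np


def cast_labels(semantics, instances, dtype=np.int64):
    """Cast both label arrays to an explicit integer dtype.

    Hypothetical helper: it stands in for the two `astype` lines quoted
    in the thread, with np.int64 in place of the removed np.int alias.
    """
    return semantics.astype(dtype), instances.astype(dtype)


# Assumed example inputs (uint8 is a guess, not taken from the repository).
semantics = np.array([0, 17, 255], dtype=np.uint8)
final_instances = np.array([0, 3, 255], dtype=np.uint8)

voxel_semantics, voxel_instances = cast_labels(semantics, final_instances)
print(voxel_semantics.dtype, voxel_instances.dtype)
```

Using `np.int64` rather than the builtin `int` keeps the result the same width on every platform; with `int`, older NumPy versions on Windows would produce 32-bit integers.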