[Bug] ValueError: Plane vertices are not coplanar

yxchng commented 6 months ago

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] I have read the FAQ documentation but cannot get the expected help.
[X] The bug has not been fixed in the latest version (dev-1.x) or latest version (dev-1.0).

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA H100 80GB HBM3
CUDA_HOME: /fs/applications/cuda/12.1.1
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
PyTorch: 2.2.1+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.17.1+cu121
OpenCV: 4.9.0
MMEngine: 0.10.3
MMDetection: 3.3.0
MMDetection3D: 1.4.0+
spconv2.0: False

Reproduces the problem - code sample

-

Reproduces the problem - command or script

python tools/train.py configs/detection/mv-det3d_8xb4_embodiedscan-3d-284class-9dof.py configs/grounding/mv-grounding_8xb12_embodiedscan-vg-9dof.py

Reproduces the problem - error message

Traceback (most recent call last):
  File "/home/user/EmbodiedScan/./tools/train.py", line 157, in <module>
    main()
  File "/home/user/EmbodiedScan/./tools/train.py", line 153, in main
    runner.train()
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmengine/runner/loops.py", line 128, in run_iter
    outputs = self.runner.model.train_step(
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
    losses = self._run_forward(data, mode='loss')
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
    results = self(**data, mode=mode)
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_grounder.py", line 729, in forward
    return self.loss(inputs, data_samples, **kwargs)
  File "/home/user/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_grounder.py", line 572, in loss
    losses = self.bbox_head.loss(**head_inputs_dict,
  File "/home/user/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 643, in loss
    losses = self.loss_by_feat(*loss_inputs)
  File "/home/user/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 674, in loss_by_feat
    losses_cls, losses_bbox = multi_apply(
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmdet/models/utils/misc.py", line 219, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/user/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 717, in loss_by_feat_single
    cls_reg_targets = self.get_targets(cls_scores_list,
  File "/home/user/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 258, in get_targets
    pos_inds_list, neg_inds_list) = multi_apply(self._get_targets_single,
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/mmdet/models/utils/misc.py", line 219, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/user/EmbodiedScan/embodiedscan/models/dense_heads/grounding_head.py", line 398, in _get_targets_single
    assign_result = self.assigner.assign(
  File "/home/user/EmbodiedScan/embodiedscan/models/task_modules/assigners/hungarian_assigner.py", line 113, in assign
    cost = match_cost(pred_instances=pred_instances_3d,
  File "/home/user/EmbodiedScan/embodiedscan/models/losses/match_cost.py", line 108, in __call__
    overlaps = pred_bboxes.overlaps(pred_bboxes, gt_bboxes)
  File "/home/user/EmbodiedScan/embodiedscan/structures/bbox_3d/euler_box3d.py", line 134, in overlaps
    _, iou3d = box3d_overlap(corners1, corners2, eps=eps)
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/pytorch3d/ops/iou_box3d.py", line 160, in box3d_overlap
    if not all((8, 3) == box.shape[1:] for box in [boxes1, boxes2]):
  File "/home/user/cache/conda/envs/embodiedscan/lib/python3.10/site-packages/pytorch3d/ops/iou_box3d.py", line 67, in _check_coplanar
ValueError: Plane vertices are not coplanar

Additional information

I keep running into ValueError: Plane vertices are not coplanar. Is this expected? How can I avoid this problem?

Tai-Wang commented 6 months ago

Errors like this can occur occasionally because the iou3d computation in pytorch3d is numerically unstable for corner-case predictions, i.e. boxes with very short sides. We have already done some optimization for this problem, but it can still happen. You can simply resume training by adding the --resume argument when launching the training script.

We also encourage you to improve the model and training settings to make training more stable.
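To check whether short-sided predictions are really what triggers the error, something like the sketch below could be run on the predicted boxes just before the IoU call. It is only an illustration, not code from EmbodiedScan: the function name `find_thin_boxes`, the `min_side` value, and the assumed 9-DoF layout [x, y, z, dx, dy, dz, alpha, beta, gamma] are all assumptions, and NumPy is used here in place of the torch tensors the real pipeline works with.

```python
import numpy as np


def find_thin_boxes(boxes, min_side=1e-2):
    """Return indices of boxes whose shortest side is below min_side.

    Assumption: boxes has shape (N, 9) with columns
    [x, y, z, dx, dy, dz, alpha, beta, gamma], so columns 3:6 hold the sizes.
    The min_side default is an illustrative value, not a project setting.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    sizes = boxes[:, 3:6]            # side lengths dx, dy, dz per box
    shortest = sizes.min(axis=1)     # shortest side of each box
    return np.flatnonzero(shortest < min_side)


# Two hypothetical predictions: the second one has a side of length 1e-4.
preds = np.array([
    [0.0, 0.0, 0.0, 1.0, 2.0, 0.5, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1e-4, 0.5, 0.1, 0.0, 0.0],
])
thin = find_thin_boxes(preds)
print(thin.tolist())
```

Logging these indices when the exception is raised would show whether the failing batches always contain a box like the second one.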

mxh1999 commented 6 months ago

This error indicates that a predicted box has a very short edge, for example 1e-3 or 1e-4. To address this, consider enforcing a minimum threshold on the side lengths the model predicts.
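A minimal sketch of such a threshold is shown below. It is not the project's implementation: the helper name `clamp_box_sizes`, the `min_side` value, and the assumed 9-DoF layout [x, y, z, dx, dy, dz, alpha, beta, gamma] are assumptions for illustration. NumPy is used to keep the example self-contained; on torch tensors the same idea would typically be expressed with `torch.clamp(sizes, min=min_side)`.

```python
import numpy as np


def clamp_box_sizes(boxes, min_side=1e-2):
    """Return a copy of boxes with every side length raised to at least min_side.

    Assumption: boxes has shape (N, 9) with the sizes in columns 3:6.
    The min_side default is illustrative and would need tuning.
    """
    boxes = np.array(boxes, dtype=np.float64)   # np.array copies the input
    boxes[:, 3:6] = np.maximum(boxes[:, 3:6], min_side)
    return boxes


# Hypothetical prediction with one near-zero side (dy = 1e-4).
pred = [[1.0, 1.0, 1.0, 1.0, 1e-4, 0.5, 0.1, 0.0, 0.0]]
safe = clamp_box_sizes(pred)
print(safe[0, 3:6].tolist())
```

Only the size columns are touched, so centers and Euler angles pass through unchanged. Where to apply the clamp (in the head's output, or only before computing the matching cost) is a design choice that depends on whether the loss should see the clamped values.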

OpenRobotLab / EmbodiedScan