Closed mrsempress closed 5 months ago
But when I increase the value of eps, the error "Planes have zero areas" is reported; that is, _check_coplanar() and _check_nonzero() conflict.
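For anyone following along, here is a minimal sketch of why a single eps can pull in opposite directions. It is a simplified, pure-Python, single-face stand-in, not pytorch3d's actual implementation. The helper names `coplanar_ok` and `nonzero_ok` and the example face are hypothetical. The assumption is that the coplanarity check passes only when the fourth vertex deviates from the face plane by less than eps, while the area check passes only when the face area is at least eps.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)


def coplanar_ok(face, eps):
    """True if the 4th vertex lies within eps of the plane through the first three."""
    v0, v1, v2, v3 = face
    normal = unit(cross(unit(sub(v1, v0)), unit(sub(v2, v0))))
    return abs(dot(sub(v3, v0), normal)) < eps


def nonzero_ok(face, eps):
    """True if the triangle (v0, v1, v2) has an area of at least eps."""
    v0, v1, v2, _ = face
    area = norm(cross(sub(v1, v0), sub(v2, v0))) / 2
    return area >= eps


# Hypothetical degenerate face: very thin (area about 0.005) and with the
# fourth vertex about 0.01 out of plane, as a badly predicted box might be.
face = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.01, 0.0), (0.0, 0.01, 0.01))

for eps in (1e-4, 1e-3, 7e-3, 5e-2):
    print(eps, coplanar_ok(face, eps), nonzero_ok(face, eps))
```

Under these assumptions, raising eps loosens the coplanarity test but tightens the area test. For a face whose out-of-plane deviation exceeds its area, no eps satisfies both.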
Yes, that is the case. The two checks you mention place conflicting requirements on EPS. You may need to tune the optimizer settings to make training more stable. I notice you are using only two GPUs, so you may need to reduce the learning rate by a factor of 2 or 4 accordingly.
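The learning-rate suggestion can be sketched with the common linear scaling rule, where the learning rate is scaled in proportion to the total batch size. This is only an illustration: the function name `scale_lr`, the reference setup of 8 GPUs, the per-GPU batch size, and the base learning rate are all hypothetical values, not taken from the EmbodiedScan configs.

```python
def scale_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling rule: keep lr / total_batch_size constant."""
    return base_lr * batch_size / base_batch_size


# Hypothetical reference setup: 8 GPUs x 2 samples per GPU, lr = 4e-4.
# Actual setup in this thread: 2 GPUs (per-GPU batch size assumed unchanged).
base_lr = 4e-4
new_lr = scale_lr(base_lr, base_batch_size=8 * 2, batch_size=2 * 2)
print(new_lr)
```

With these assumed numbers the total batch size shrinks by a factor of 4, so the learning rate is reduced by the same factor.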
Please see #22 for more explanations about this bug.
Thanks, I will try it again.
Hi @mrsempress, did you solve this issue?
@EricLee0224 No, I haven't solved this issue. I just kept the original value and retrained from the beginning. Sometimes training runs to completion.
Hi all, we have just sorted out the occupancy prediction baseline recently. While open-sourcing those parts, we will have a closer look at this problem, particularly for the visual grounding baseline. We will try to address it in two weeks.
Hi, I solved this issue in https://github.com/OpenRobotLab/EmbodiedScan/issues/40#issuecomment-2058598322. Please take a look. I am now using this strategy to train the model, and it has run successfully for 100 iterations.
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
System environment:
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1551893665
GPU 0,1: NVIDIA A100-SXM4-80GB
CUDA_HOME: /mnt/lustre/share/cuda-11.0
NVCC: Cuda compilation tools, release 11.0, V11.0.221
GCC: gcc (GCC) 5.4.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSEKINETO -DUSE$ BGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unuse$ -parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostic$ -color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.$ , USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.13.1
OpenCV: 4.9.0
MMEngine: 0.10.3
Reproduces the problem - code sample
In embodiedscan/structures/bbox_3d/euler_box3d.py#L134
Reproduces the problem - command or script
Reproduces the problem - error message
Additional information
In https://github.com/facebookresearch/pytorch3d/issues/992, they suggest increasing EPS. Does this problem occur under your default setting of 1e-4? If so, how should I adjust the EPS value? The error appeared in my 5th epoch and does not occur on every run. What is the reason for this?