You need to reduce the learning rate by a factor of 2 or 4 accordingly, because your actual batch size is only 1/4 of the one used in our experiments. It should yield a comparable result once you adjust the optimizer settings, although we have not tried this ourselves.
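For reference, a minimal sketch of such an override in an MMEngine-style config. The base config name and the numbers below are placeholders, not the actual settings shipped with the repo:

```python
# Hypothetical override config; file name and values are illustrative only.
_base_ = ['./mv-grounding.py']  # assumed base grounding config

# Linear scaling rule: with 2 GPUs instead of 8 the effective batch size
# is 1/4, so scale an assumed base lr of 1e-4 down to 2.5e-5.
optim_wrapper = dict(
    optimizer=dict(type='AdamW', lr=2.5e-5, weight_decay=5e-4))
```

If the config already records a `base_batch_size` via `auto_scale_lr`, MMEngine can also rescale the rate automatically when auto LR scaling is enabled at launch.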
When I reproduced it, I still used 8 GPUs as written in your mv_grounding.sh, but the result was not good. When the GPU count changes, an error message appears and training does not run successfully.
I removed the pretrained checkpoint from the config because I did not know that pretrained weights were necessary, and I did not see what role the detection branch plays for the visual grounding branch in the pipeline. I will obtain the pre-training weights and redo the visual grounding task. Thank you for your reply.
OK. We found loading the pretrained detection checkpoint to be a helpful trick, as mentioned in BUTD-DETR. We look forward to your further feedback.
Since the feature extraction pipeline can be shared by the detection and visual grounding tasks, we can use the 3D detection pre-trained checkpoint for weight initialization. It helps achieve better grounding performance and accelerates training convergence to some extent.
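As a sketch, the initialization can be pointed at the detection weights through MMEngine's `load_from` field; the config and checkpoint paths below are placeholders:

```python
# Hypothetical override config; paths are illustrative only.
_base_ = ['./mv-grounding.py']  # assumed base grounding config

# Initialize from a 3D detection pre-trained checkpoint. Checkpoint
# loading is non-strict, so the shared feature-extraction weights are
# picked up while grounding-specific heads keep their fresh init.
load_from = 'work_dirs/mv-det/epoch_12.pth'  # hypothetical path
```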
Closing due to inactivity. Please feel free to reopen this issue if you have any further questions.
After loading your checkpoint, the performance exceeded what your paper reported (+7.95%).

The results in the paper are:

AP25:

| Type | Easy | Hard | View-Dep | View-Indep | Overall |
| --- | --- | --- | --- | --- | --- |
| results | 0.2711 | 0.2012 | 0.2342 | 0.2637 | 0.2572 |

The reproduced results (loading your checkpoint) are:

AP25:

| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.3489 | 0.3018 | 0.3567 | 0.3277 | 0.0000 | 0.3377 | 0.3377 |

AP50:

| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.1168 | 0.0925 | 0.1127 | 0.1159 | 0.0000 | 0.1148 | 0.1148 |
- Another question is why the result of `overall` is the same as the result of `multi`.
- In addition, you mentioned that using the detection checkpoint is important. In my experiment, it improved results by 13.04%. If the grounding checkpoint is also used to initialize detection, would there be an improvement? If we keep looping this initialization, can we get better results?
Since all evaluated samples belong to the `multiple` type, the `overall` performance is exactly the same as `multiple`.
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
System environment:
  sys.platform: linux
  Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
  CUDA available: True
  MUSA available: False
  numpy_random_seed: 1551893665
  GPU 0,1: NVIDIA A100-SXM4-80GB
  CUDA_HOME: /mnt/lustre/share/cuda-11.0
  NVCC: Cuda compilation tools, release 11.0, V11.0.221
  GCC: gcc (GCC) 5.4.0
  PyTorch: 1.12.1
  PyTorch compiling details: PyTorch built with:
GCC 9.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.3.2 (built against CUDA 11.5)
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.13.1
OpenCV: 4.9.0
MMEngine: 0.10.3
Reproduces the problem - code sample
N/A
Reproduces the problem - command or script
sh tools/mv-grounding.sh
Reproduces the problem - error message
The reproduced results are:

AP25:

| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.2093 | 0.1840 | 0.1966 | 0.2129 | 0.0000 | 0.2073 | 0.2073 |

AP50:

| Type | Easy | Hard | View-Dep | View-Indep | Unique | Multi | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| results | 0.0535 | 0.0452 | 0.0581 | 0.0501 | 0.0000 | 0.0528 | 0.0528 |

However, the results in the paper are:

AP25:

| Type | Easy | Hard | View-Dep | View-Indep | Overall |
| --- | --- | --- | --- | --- | --- |
| results | 0.2711 | 0.2012 | 0.2342 | 0.2637 | 0.2572 |
In addition, training only completes successfully when the number of GPUs is 8. With 2 or 4 GPUs, the error from issue #30 sometimes occurs and the error from issue #26 sometimes occurs.
Additional information