jacksonsc007 opened this issue 1 year ago
NOW!
Terrific! Thanks a lot. Could you specify the version of mmdetection, btw?
And the PyTorch version? It seems like you have pytorch>=2.0.0.
System environment:

```
sys.platform: linux
Python: 3.8.16 (default, Jun 12 2023, 18:09:05) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 1123624972
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.0.1+cu117
PyTorch compiling details: PyTorch built with:
  Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF
TorchVision: 0.15.2+cu117
OpenCV: 4.8.0
MMEngine: 0.8.0
```
> And the PyTorch version? It seems like you have pytorch>=2.0.0.
Yes, I try to keep up with the fast-moving world.
Hi, while going through your code I ran into some questions; I hope you can help me out :)

1. What does `self.grad_accumulation` mean [here](https://github.com/MCG-NJU/DEQDet/blob/fa72a62b2340a04300424041e9ebd0087a700eba/projects/deqdet/deq_det_roi_head.py#L219C12-L219C35)? And what does "stash gradient" mean?
2. For the refinement-aware gradient formulation you proposed in equation (11), it seems that you did not use this technique in your code implementation to speed up training and save memory, but instead used naive iteration and **autograd of PyTorch** to handle backward gradient propagation. Am I right?
The refinement-aware gradient is equivalent to truncated BPTT to some extent, cutting off the higher-order terms of the RNN-style iterations. Because each supervision is then independent, we can use gradient accumulation between supervisions to avoid the extra memory consumption. However, PyTorch autograd would push the gradient computed for a single supervision all the way to every parameter, resulting in several backward passes through the backbone. So I use this hook to stash the gradient at the multi-level features; the last backward among the supervisions restores the stashed gradient and carries it to the backbone weights.
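For concreteness, a minimal sketch of this stashing idea (not the DEQDet source; `backbone`, `head`, and `head.supervision_loss` are hypothetical names): each supervision's backward stops at detached multi-level features, autograd accumulates the per-level feature gradients, and a single final backward carries the summed gradient through the backbone.

```python
import torch

def train_step(backbone, head, images, targets, num_supervisions):
    mlvl_feats = backbone(images)  # one forward pass through the backbone
    # detach the features so each per-supervision backward stops here
    feats = [f.detach().requires_grad_(True) for f in mlvl_feats]

    for k in range(num_supervisions):
        loss = head.supervision_loss(feats, targets, step=k)  # hypothetical API
        # backward only through the head; the gradient w.r.t. each feature
        # level is accumulated ("stashed") in feats[i].grad by autograd
        loss.backward()

    # restore the stashed gradients: one backward through the backbone
    # with the sum of all per-supervision feature gradients
    torch.autograd.backward(mlvl_feats, [f.grad for f in feats])
```

With this pattern the backbone sees a single backward pass regardless of the number of supervisions, while the head still accumulates gradients from every supervision.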
> Hi, while going through your code I ran into some questions; I hope you can help me out :)
> 1. What does `self.grad_accumulation` mean [here](https://github.com/MCG-NJU/DEQDet/blob/fa72a62b2340a04300424041e9ebd0087a700eba/projects/deqdet/deq_det_roi_head.py#L219C12-L219C35)? And what does "stash gradient" mean?
> 2. For the refinement-aware gradient formulation you proposed in equation (11), it seems that you did not use this technique in your code implementation to speed up training and save memory, but instead used naive iteration and **autograd of PyTorch** to handle backward gradient propagation. Am I right?
For question 2: yes, the RAG formulation is derived from the 2-step unrolled fixed-point formulation in the paper, and the implementation in the codebase is exactly that 2-step unrolled fixed point. The equation mainly helps to analyze why two steps work better than the simple estimation method used in DEQ-Flow. You can find the pseudo-code in the appendix.
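As a rough illustration of the difference (not the DEQDet source), here is a sketch assuming the decoder acts as a fixed-point map `z* = f(z*, x)`; `f`, `z0`, and `num_solver_iters` are placeholder names:

```python
import torch

def deq_forward_2step(f, x, z0, num_solver_iters=20):
    # run the solver without tracking gradients to reach a fixed point
    with torch.no_grad():
        z = z0
        for _ in range(num_solver_iters):
            z = f(z, x)

    # unroll the map twice with gradients enabled; backprop through these
    # two steps approximates the implicit gradient, i.e. truncated BPTT
    # with horizon 2 (a DEQ-Flow-style simple estimate would unroll once)
    z = z.detach()
    z = f(z, x)
    z = f(z, x)
    return z
```

Only the last two applications of `f` enter the autograd graph, so memory stays constant in the number of solver iterations while the gradient keeps the first-order refinement term.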
Thanks for the excellent work. I wonder when the source code will be released. Looking forward to your reply. :)