facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Not enough memory on RTX 3090 to train ViTDet? #4496

Open fschvart opened 2 years ago

fschvart commented 2 years ago


Instructions To Reproduce the Issue:

I'm trying to train an instance segmentation ViTDet model on a custom, relatively small dataset (6,000 images of 640x480). I'm using Windows 10 and an RTX 3090. I'm training with the basic ViTDet configuration (ViT-B), which in theory should take 12.3 GB. My RTX 3090 has 24 GB, yet I get a CUDA out-of-memory error. I reduced the batch size from 64 to 2, reduced the number of workers to 2, and used FP16 compression; none of it solved the issue. I checked the config.yaml in the output directory and confirmed that these are my actual settings.
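For readers hitting the same wall, the changes described above can be expressed as LazyConfig overrides rather than edits to config files. A minimal sketch, assuming a detectron2 source checkout; the keys `dataloader.train.total_batch_size`, `dataloader.train.num_workers`, and `train.ddp.fp16_compression` match current upstream configs but may differ in other versions:

```python
# Sketch: apply memory-related overrides programmatically. The same
# "key=value" strings can be passed as trailing arguments to
# tools/lazyconfig_train_net.py.
from detectron2.config import LazyConfig

cfg = LazyConfig.load("projects/ViTDet/configs/COCO/mask_rcnn_vitdet_b_100ep.py")
cfg = LazyConfig.apply_overrides(cfg, [
    "dataloader.train.total_batch_size=2",  # down from the default 64
    "dataloader.train.num_workers=2",
    "train.ddp.fp16_compression=True",      # FP16 gradient compression for DDP
])
```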

here's my environment:

sys.platform              win32
Python                    3.10.5 packaged by conda-forge (main, Jun 14 2022, 06:57:19) [MSC v.1929 64 bit (AMD64)]
numpy                     1.23.1
detectron2                0.6 @C:\detectron2\detectron2
Compiler                  MSVC 193231332
CUDA compiler             CUDA 11.6
detectron2 arch flags     C:\detectron2\detectron2\_C.cp310-win_amd64.pyd; cannot find cuobjdump
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.12.0 @C:\cuda\miniconda\envs\mmopenlab\lib\site-packages\torch
PyTorch debug build       False
GPU available             Yes
GPU 0,1                   NVIDIA GeForce RTX 3090 (arch=8.6)
Driver version            516.59
CUDA_HOME                 C:\cuda\116
Pillow                    9.2.0
torchvision               0.13.0 @C:\cuda\miniconda\envs\mmopenlab\lib\site-packages\torchvision
torchvision arch flags    C:\cuda\miniconda\envs\mmopenlab\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
fvcore                    0.1.5.post20220512
iopath                    0.1.9
cv2                       4.6.0

PyTorch built with:

wudizuixiaosa commented 2 years ago

Hi, I have the same problem. Have you found the cause and fixed it? Also, how did you change the batch size? I couldn't find the relevant setting.

fschvart commented 2 years ago

> Hi, I have the same problem. Have you found the cause and fixed it? Also, how did you change the batch size? I couldn't find the relevant setting.

I've left it for now; I don't know how to solve it. I found the batch-size setting in the config file (or one of its parent config files) and changed it there.
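For anyone else hunting for it: rather than tracing the parent-config chain by hand, a short sketch that prints the value actually in effect (paths assume a detectron2 source checkout; upstream, the default of 64 is set in the shared COCO loader config):

```python
# Sketch: resolve the full config and print the effective batch size.
from detectron2.config import LazyConfig

cfg = LazyConfig.load("projects/ViTDet/configs/COCO/mask_rcnn_vitdet_b_100ep.py")
print(cfg.dataloader.train.total_batch_size)  # 64 in the upstream config
```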

wudizuixiaosa commented 2 years ago

Thank you for the quick reply. I plan to use Mask R-CNN; the paper reports 10.3 GB, but my 3060 with 12 GB is not enough. I also built a ~6k-image dataset myself. I wonder whether some of the hyperparameters were tuned for the COCO dataset; we couldn't figure it out, and there are no official instructions on how to adapt them. Sorry, but I looked carefully again and still couldn't find the batch size. If it's convenient for you, could you tell me which file it's in?

RuiMingGao commented 2 years ago

> Hi, I have the same problem. Have you found the cause and fixed it? Also, how did you change the batch size? I couldn't find the relevant setting.

> I've left it for now; I don't know how to solve it. I found the batch-size setting in the config file (or one of its parent config files) and changed it there.

Just changing the value in config/defaults.py doesn't solve it, because the package has already been installed; to change a default value you would have to reinstall detectron2 with pip. That makes a value like batch_size awkward to change.
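A caveat on the reinstall point: the ViTDet configs are LazyConfig Python files, read from disk when training starts, so no reinstall is needed. A hypothetical local config (the file name `my_vitdet_b.py` is made up here) can reuse the upstream file and override only what you need:

```python
# my_vitdet_b.py -- hypothetical local config; pass it to
# tools/lazyconfig_train_net.py via --config-file my_vitdet_b.py.
from detectron2.config import LazyConfig

_base = LazyConfig.load("projects/ViTDet/configs/COCO/mask_rcnn_vitdet_b_100ep.py")

# Re-export the names the training script expects.
model = _base.model
dataloader = _base.dataloader
optimizer = _base.optimizer
lr_multiplier = _base.lr_multiplier
train = _base.train

dataloader.train.total_batch_size = 2  # was 64 upstream
```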

kretes commented 2 years ago

You should configure the batch size for your own experiment, not globally. To do it for a single run, just append the override as the last argument, like this:

../../tools/lazyconfig_train_net.py --config-file configs/COCO/mask_rcnn_vitdet_b_100ep.py "dataloader.train.total_batch_size=1"

wudizuixiaosa commented 2 years ago

> You should configure the batch size for your own experiment, not globally. To do it for a single run, just append the override as the last argument, like this:

Thank you for your reply.

wudizuixiaosa commented 2 years ago

> Hi, I have the same problem. Have you found the cause and fixed it? Also, how did you change the batch size? I couldn't find the relevant setting.

> I've left it for now; I don't know how to solve it. I found the batch-size setting in the config file (or one of its parent config files) and changed it there.

> Just changing the value in config/defaults.py doesn't solve it, because the package has already been installed; to change a default value you would have to reinstall detectron2 with pip. That makes a value like batch_size awkward to change.

Actually, I don't want to change the batch_size; it's just that the memory usage in the paper's table doesn't match my experience. In particular, at batch size 64 the table reports 10.9 GB, while at batch size 2 my 12 GB 3060 is still insufficient.

VGrondin commented 2 years ago

As kretes said, a batch size of 1 should fit on your GPU. From the information in their paper, the batch size they use for COCO fine-tuning is 64, distributed across 64 GPUs (1 image per GPU), and they use A100s! I tried a batch size of 2 on a 3090, but it takes 20 GB, so it's borderline.
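If even batch size 1 or 2 is borderline, two further knobs may help. A sketch below, hedged because the key names (`train.amp.enabled` for mixed precision, `model.backbone.net.use_act_checkpoint` for activation checkpointing, the latter requiring fairscale) are taken from current upstream configs and may not exist in older checkouts:

```python
# Sketch: trade compute for memory with AMP and activation checkpointing.
from detectron2.config import LazyConfig

cfg = LazyConfig.load("projects/ViTDet/configs/COCO/mask_rcnn_vitdet_b_100ep.py")
cfg = LazyConfig.apply_overrides(cfg, [
    "dataloader.train.total_batch_size=1",
    "train.amp.enabled=True",                      # mixed-precision forward/backward
    "model.backbone.net.use_act_checkpoint=True",  # recompute ViT activations in backward
])
```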

Leiyi-Hu commented 11 months ago

Hi, I have a similar problem, but I found that ViTDet (with window partition) occupies more GPU memory than the plain ViT without it. I am confused.
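For context on why that observation is surprising, a back-of-envelope count of attention-map entries (assuming the default 1024x1024 LSJ input, 16x16 patches, and ViTDet's 14x14 windows) suggests windowed attention should be far cheaper than global attention:

```python
import math

tokens_side = 1024 // 16              # 64x64 = 4096 tokens at the default input size
n = tokens_side ** 2

global_entries = n ** 2               # one global attention map: ~16.8M entries per head

win = 14                              # ViTDet's default window size
padded = math.ceil(tokens_side / win) * win    # token grid padded to 70 for partitioning
n_windows = (padded // win) ** 2               # 25 windows
window_entries = n_windows * (win * win) ** 2  # ~0.96M entries per head

print(global_entries / window_entries)  # windowed attention maps are ~17x smaller
# Note: ViT-B ViTDet still keeps a few global-attention blocks, so the real
# end-to-end saving is smaller than this per-layer ratio.
```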