IDEA-Research / detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
https://detrex.readthedocs.io/en/latest/
Apache License 2.0

Memory leak during DINO training. #322

Open lolikonloli opened 10 months ago

lolikonloli commented 10 months ago

Device info


sys.platform             linux
Python                   3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0]
numpy                    1.22.4
detectron2               0.6 @/home/lolikonloli/code/detection/package/detrex/detectron2/detectron2
Compiler                 GCC 11.4
CUDA compiler            CUDA 11.8
detectron2 arch flags    7.5
DETECTRON2_ENV_MODULE
PyTorch                  2.0.1+cu118 @/home/lolikonloli/anaconda3/envs/pl_det/lib/python3.10/site-packages/torch
PyTorch debug build      False
GPU available            Yes
GPU 0,1                  NVIDIA GeForce RTX 2080 Ti (arch=7.5)
Driver version           535.104.05
CUDA_HOME                /usr/local/cuda-11.8
Pillow                   9.3.0
torchvision              0.15.2+cu118 @/home/lolikonloli/anaconda3/envs/pl_det/lib/python3.10/site-packages/torchvision
torchvision arch flags   3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                   0.1.5.post20221221
iopath                   0.1.9
cv2                      4.8.0

PyTorch built with:

rentainhe commented 9 months ago

Hello, this is normal. Because of multi-scale training and the denoising queries, the model's memory usage fluctuates from iteration to iteration, and it may need more than 12 GB, which is more than the 11 GB a 2080 Ti provides. To work around this, you can try fp16 (mixed-precision) training, lower total_batch_size, or add activation checkpointing to reduce the overall memory usage of the model.
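To tell a genuine leak apart from the fluctuation described above, one rough heuristic is to look at the low-water mark of peak memory over windows of iterations: with multi-scale inputs the peaks spike but the floor stays flat, while a leak makes the floor keep rising. The sketch below is a hypothetical helper, not part of detrex or detectron2; it assumes you have already collected one peak-memory reading (in MB) per iteration, for example by calling `torch.cuda.max_memory_allocated()` and then `torch.cuda.reset_peak_memory_stats()` each step. The `window` and `tolerance_mb` defaults are arbitrary assumptions.

```python
def looks_like_leak(peak_mem_mb, window=50, tolerance_mb=100.0):
    """Hypothetical heuristic: does the per-window memory floor keep rising?

    peak_mem_mb: list of per-iteration peak memory readings in MB
    (assumed to be collected by the user; not a detrex API).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Minimum reading in each full, non-overlapping window of iterations.
    floors = [
        min(peak_mem_mb[i:i + window])
        for i in range(0, len(peak_mem_mb) - window + 1, window)
    ]
    if len(floors) < 2:
        return False  # too few readings to judge a trend
    # A floor that never drops and grows by more than the tolerance
    # suggests memory is being retained across iterations.
    never_drops = all(b >= a for a, b in zip(floors, floors[1:]))
    return never_drops and floors[-1] - floors[0] > tolerance_mb
```

For example, readings that spike between 6000 and 10200 MB but always return to 6000 MB are reported as not leaking, while readings that climb steadily by a few MB per iteration are flagged.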