IDEA-Research / MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
Apache License 2.0

Memory allocation problem #94

Open YeSho-cpp opened 10 months ago

YeSho-cpp commented 10 months ago

Hello, sorry to bother you. I am training MaskDINO on a cell nuclei dataset, but I am running out of memory. With the batch size set to 2 and num_workers set to 0, training starts but is very slow; setting num_workers to even 1 causes a memory allocation failure. I have two A6000 GPUs, but I cannot use them together for distributed training, because memory allocation then fails. Which parameters should I change to reduce memory usage?

YeSho-cpp commented 10 months ago

This is my data set information

[10/18 11:09:21] d2.data.datasets.coco INFO: Loading /share/home/ncu10/Code/AI/Point_label/PointWSSIS/cell_data_root/coco/annotations/instances_train2017.json takes 2.70 seconds.
[10/18 11:09:21] d2.data.datasets.coco INFO: Loaded 432 images in COCO format from /share/home/ncu10/Code/AI/Point_label/PointWSSIS/cell_data_root/coco/annotations/instances_train2017.json
[10/18 11:09:21] d2.data.build INFO: Removed 0 images with no usable annotations. 432 images left.
[10/18 11:09:21] d2.data.build INFO: Distribution of instances among all 80 categories:
category #instances category #instances category #instances
person 17073 bicycle 0 car 0
motorcycle 0 airplane 0 bus 0
train 0 truck 0 boat 0
traffic light 0 fire hydrant 0 stop sign 0
parking meter 0 bench 0 bird 0
cat 0 dog 0 horse 0
sheep 0 cow 0 elephant 0
bear 0 zebra 0 giraffe 0
backpack 0 umbrella 0 handbag 0
tie 0 suitcase 0 frisbee 0
skis 0 snowboard 0 sports ball 0
kite 0 baseball bat 0 baseball gl.. 0
skateboard 0 surfboard 0 tennis racket 0
bottle 0 wine glass 0 cup 0
fork 0 knife 0 spoon 0
bowl 0 banana 0 apple 0
sandwich 0 orange 0 broccoli 0
carrot 0 hot dog 0 pizza 0
donut 0 cake 0 chair 0
couch 0 potted plant 0 bed 0
dining table 0 toilet 0 tv 0
laptop 0 mouse 0 remote 0
keyboard 0 cell phone 0 microwave 0
oven 0 toaster 0 sink 0
refrigerator 0 book 0 clock 0
vase 0 scissors 0 teddy bear 0
hair drier 0 toothbrush 0
total 17073 

[10/18 11:09:21] d2.data.build INFO: Using training sampler TrainingSampler
[10/18 11:09:21] d2.data.common INFO: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[10/18 11:09:21] d2.data.common INFO: Serializing 432 elements to byte tensors and concatenating them all ...
[10/18 11:09:22] d2.data.common INFO: Serialized dataset takes 28.01 MiB

YeSho-cpp commented 10 months ago

Using the resnet50 model

[10/18 11:09:13] detectron2 INFO: Rank of current process: 0. World size: 1
[10/18 11:09:14] detectron2 INFO: Environment info:


sys.platform                      linux
Python                            3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy                             1.24.4
detectron2                        0.6 @/share/home/ncu10/Code/AI/Point_label/MaskDINO/detectron2/detectron2
Compiler                          GCC 9.4
CUDA compiler                     CUDA 11.4
detectron2 arch flags             8.6
DETECTRON2_ENV_MODULE
PyTorch                           1.13.1 @/share/home/ncu10/miniconda3/envs/py38/lib/python3.8/site-packages/torch
PyTorch debug build               False
torch._C._GLIBCXX_USE_CXX11_ABI   False
GPU available                     Yes
GPU 0                             NVIDIA RTX A6000 (arch=8.6)
Driver version                    470.86
CUDA_HOME                         /share/home/ncu10/CUDA/CUDA11.4
Pillow                            9.5.0
torchvision                       0.14.1 @/share/home/ncu10/miniconda3/envs/py38/lib/python3.8/site-packages/torchvision
torchvision arch flags            3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                            0.1.5.post20221221
iopath                            0.1.9
cv2                               4.8.0

CUDA_VISIBLE_DEVICES=1 python train_net.py --num-gpus 1 --config-file /share/home/ncu10/Code/AI/Point_label/MaskDINO/configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s.yaml MODEL.WEIGHTS /share/home/ncu10/Code/AI/Point_label/MaskDINO/model_file/maskdino_r50_50ep_300q_hid1024_3sd1_instance_maskenhanced_mask46.1ap_box51.5ap.pth
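
The command above passes config overrides as trailing KEY VALUE pairs (here only MODEL.WEIGHTS). A minimal sketch of assembling the same kind of command with lower-memory overrides follows. SOLVER.IMS_PER_BATCH and DATALOADER.NUM_WORKERS are standard detectron2 config keys; INPUT.IMAGE_SIZE is assumed to be the crop-size key used by the MaskDINO configs and should be checked against the repository's config file. The helper name, the weights path, and the chosen values are hypothetical, and whether these values are enough for this dataset is untested.

```python
import shlex


def build_train_command(config_file, weights, overrides, num_gpus=1):
    """Assemble a train_net.py command with trailing KEY VALUE config overrides.

    Hypothetical helper for illustration; it only builds the argument list
    and does not launch training.
    """
    cmd = [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
        "MODEL.WEIGHTS", weights,
    ]
    for key, value in overrides.items():
        cmd += [key, str(value)]
    return cmd


# Assumed lower-memory settings; the values are illustrative, not recommendations.
low_memory_overrides = {
    "SOLVER.IMS_PER_BATCH": 2,    # total batch size across all GPUs
    "DATALOADER.NUM_WORKERS": 0,  # load data in the main process
    "INPUT.IMAGE_SIZE": 512,      # assumed MaskDINO key for the training crop size
}

command = build_train_command(
    "configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s.yaml",
    "path/to/weights.pth",  # placeholder path
    low_memory_overrides,
)
print(shlex.join(command))
```

A smaller training crop usually reduces activation memory, but it can also affect accuracy on small objects, so it is worth changing one setting at a time.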

sym330 commented 9 months ago

same error

FengLi-ust commented 2 months ago

Sorry for the late reply. How much memory do you need in your case? We use about 30 GB for ResNet-50 with a batch size of 4.
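
As a rough check of that figure against one RTX A6000, here is a back-of-the-envelope sketch. It assumes that GPU memory scales linearly with batch size and that the 30 GB figure refers to a single GPU, neither of which is stated in the thread. Fixed costs such as model weights and optimizer state do not scale this way, and the number of instances per image also matters, so treat the output as an order-of-magnitude guide only. The function name is hypothetical.

```python
A6000_MEMORY_GB = 48  # nominal memory of one NVIDIA RTX A6000


def estimate_gpu_memory_gb(batch_size, ref_memory_gb=30.0, ref_batch_size=4):
    """Linearly extrapolate memory use from a reference (batch size, memory) point.

    Assumption: memory grows in proportion to batch size, which ignores
    fixed costs such as weights and optimizer state.
    """
    return ref_memory_gb * batch_size / ref_batch_size


for batch_size in (1, 2, 4, 6):
    estimate = estimate_gpu_memory_gb(batch_size)
    fits = estimate <= A6000_MEMORY_GB
    print(f"batch size {batch_size}: ~{estimate:.1f} GB, within 48 GB: {fits}")
```

Under these assumptions a batch size of 2 would need roughly 15 GB, well within a single A6000. Together with the report that raising num_workers triggers the failure, this may indicate that the limit being hit is host RAM or shared memory in the data loader rather than GPU memory.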