xziyh opened this issue 2 years ago (Open)
hello author, I am using a 3080 Ti (12 GB) to train Conditional DETR on the full COCO 2017 dataset, but the program reports CUDA out of memory. I used MSI Afterburner to monitor memory usage, and it shows a peak usage of only 2520 MB. I set the batch size to 1.

Hi,
Can you give us the exact training script, including all the training arguments? Then we can check whether memory is really insufficient or there is some other reason.
Hi, thanks for your reply. Here are the arguments:
lr=0.0001, lr_backbone=1e-05, batch_size=1, weight_decay=0.0001, epochs=1, lr_drop=40, clip_max_norm=0.1, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', enc_layers=6, dec_layers=6, dim_feedforward=2048, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, pre_norm=False, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='coco', coco_panoptic_path=None, remove_difficult=False, output_dir='results', device='cuda', seed=42, resume='', start_epoch=0, eval=False, num_workers=2, world_size=1, dist_url='env://', distributed=False
That is strange. From your arguments, you are using the ResNet-50 backbone without dilation; in this setting, 12 GB of memory should be more than enough for batch size 1. I do not have a clue. Maybe restart your computer to make sure all background programs that might consume GPU memory are killed.
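A note on the 2520 MB reading: external monitors like MSI Afterburner sample at intervals and can miss a short allocation spike right before the OOM, and PyTorch's caching allocator reserves more memory than it has allocated at any moment. Below is a minimal sketch for logging memory from inside the training process, using standard `torch.cuda` utilities; `log_gpu_memory` and its call sites are hypothetical helpers, not part of the Conditional DETR code:

```python
import torch

def log_gpu_memory(tag: str, device: int = 0) -> None:
    """Print PyTorch allocator stats plus device-wide free memory."""
    mib = 2 ** 20
    allocated = torch.cuda.memory_allocated(device) / mib  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(device) / mib    # bytes held by the caching allocator
    peak = torch.cuda.max_memory_allocated(device) / mib   # high-water mark since last reset
    free, total = (v / mib for v in torch.cuda.mem_get_info(device))  # device-wide, all processes
    print(f"[{tag}] allocated={allocated:.0f} MiB  reserved={reserved:.0f} MiB  "
          f"peak={peak:.0f} MiB  free={free:.0f}/{total:.0f} MiB")

if __name__ == "__main__":
    # Hypothetical call site; also call inside the training loop, e.g. every N iterations.
    log_gpu_memory("before training")
```

Calling this once before training starts shows whether another process is already holding part of the 12 GB; calling it every few iterations (with `torch.cuda.reset_peak_memory_stats()` at the start of each epoch) catches the transient peak that sampling tools can miss.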