Why is my GPU memory usage so high? Is this normal?

Sense-X / Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training

MIT License

Why is my GPU memory usage so high? Is this normal? #98

Open tyloocifer opened 11 months ago

tyloocifer commented 11 months ago

When I train on a 3090 with images resized to 1920*1080 and batch size = 1, GPU memory runs out immediately. How can I fix this, and which step causes such a large memory cost?

tyloocifer commented 11 months ago

I used co_dino_5scale_lsj_r50_1x_coco.py in the MMDetection project.

tyloocifer commented 11 months ago

When I switch the model to DINO it works, but Co-DINO doesn't. I tried reducing num_co_head and using only Faster R-CNN or ATSS as the auxiliary head, but it still requires a lot of memory.

TempleX98 commented 11 months ago

LSJ augmentation requires more memory than DETR augmentation. If you use a resolution of 1920x1080, the config co_dino_5scale_r50_1x_coco.py is a better choice. You can also enable checkpointing by adding with_cp=True to the backbone config and changing 'with_cp' in the encoder config from 4 to 6:

backbone=dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    norm_cfg=dict(type='BN', requires_grad=False),
    norm_eval=True,
    style='pytorch',
    with_cp=True,  # gradient checkpointing: recompute activations in backward to save memory
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),

tyloocifer commented 11 months ago

It still doesn't work. I only adopted your network; I didn't use your dataloader or the other config settings. When I use DINO it allocates just 10 GB at batch size 1, but co_dino_r50_1x can't run: it fails with CUDA out of memory.

TempleX98 commented 11 months ago

Are you using DINO-4scale?

tyloocifer commented 11 months ago

yep

tyloocifer commented 11 months ago

Perhaps I need to change it to 4-scale?

TempleX98 commented 11 months ago

Yes, the 5-scale model consumes much more memory than the 4-scale one.
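A rough count of encoder tokens shows why the extra scale is so expensive at 1920x1080. This sketch assumes 4-scale features at strides 8, 16, 32 and 64, and 5-scale features that add a stride-4 level (consistent with out_indices=(0, 1, 2, 3) in the backbone config above). It ignores padding added by the data pipeline, and the token count is only a proxy for activation memory, not an exact memory model.

```python
import math
from typing import Tuple

def level_size(h: int, w: int, stride: int) -> Tuple[int, int]:
    """Approximate feature-map size at a power-of-two stride.

    Each stride-2 conv/pool with 'same'-style padding maps n -> ceil(n / 2),
    so we halve repeatedly, rounding up. Pipeline padding is ignored.
    """
    for _ in range(int(math.log2(stride))):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return h, w

def num_tokens(h: int, w: int, strides: Tuple[int, ...]) -> int:
    """Total number of encoder tokens summed over all feature levels."""
    return sum(hh * ww for hh, ww in (level_size(h, w, s) for s in strides))

# Assumed level layouts: 4-scale uses strides 8..64, 5-scale adds stride 4.
FOUR_SCALE = (8, 16, 32, 64)
FIVE_SCALE = (4, 8, 16, 32, 64)

if __name__ == "__main__":
    for h, w in [(800, 1333), (1080, 1920)]:
        t4 = num_tokens(h, w, FOUR_SCALE)
        t5 = num_tokens(h, w, FIVE_SCALE)
        print(f"{w}x{h}: 4-scale={t4}, 5-scale={t5}, ratio={t5 / t4:.2f}")
```

At 1920x1080 the stride-4 level alone contributes 129,600 of the 172,710 tokens, so the 5-scale encoder processes about 4x as many tokens as the 4-scale one (43,110).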

Feobi1999 commented 11 months ago

I use projects/configs/co_dino/co_dino_5scale_swin_large_16e_o365tococo.py, and it seems that if I freeze the backbone and set checkpointing to False, it runs out of memory (OOM) on a 24 GB A30.

TempleX98 commented 11 months ago

Co-DETR with a frozen Swin-L backbone at image size 1333x800 requires more than 15 GB of memory. The config you are using enlarges the resolution by 1.5x, so 24 GB may be insufficient. AMP and FSDP can help reduce training memory.

tyloocifer commented 11 months ago

If I want a 4-scale model, what do I need to change besides the config file?

tyloocifer commented 11 months ago

The total loss keeps oscillating around 20. Is this normal?