SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023
https://praeclarumjj3.github.io/oneformer
MIT License

GPU Memory requirements #20

Closed by tomhog 1 year ago

tomhog commented 1 year ago

Hi

Firstly, thank you for releasing this amazing work. Not only is the model amazing, but the code quality is excellent and very easy to follow.

I have a question regarding GPU memory requirements for training. The README contains some conflicting information:

We train all our models using 8 A6000 (48 GB each) GPUs. We use 8 A100 (80 GB each) for training Swin-L† OneFormer and DiNAT-L† OneFormer on COCO and all

Is it 8xA6000 (384GB) or 8xA100 (640GB)? Additionally would it be possible to achieve good results with less, say 2xA6000 (96GB), with it just taking longer?

Many thanks, Tom

praeclarumjj3 commented 1 year ago

Hi, thanks for pointing out the potential for confusion. We will make it clearer in a future commit.

Additionally would it be possible to achieve good results with less, say 2xA6000 (96GB), with it just taking longer?

I think so, if the memory is enough. You might need to tune some hyperparameters though, as the values we provide are for training with 8 GPUs.
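To make the trade-off concrete, here is a rough back-of-the-envelope sketch of the three setups mentioned in this thread. It assumes a total batch size of 16 (as the `bs16` config name further down suggests) and that the total batch is split evenly across GPUs. The helper names are made up for illustration, and the numbers say nothing about how much memory training actually uses.

```python
# Rough arithmetic only: this does not measure or predict real memory usage.
# Helper names are hypothetical and not part of OneFormer or Detectron2.

def total_memory_gb(num_gpus: int, gb_per_gpu: int) -> int:
    """Aggregate GPU memory across all devices."""
    return num_gpus * gb_per_gpu


def images_per_gpu(total_batch: int, num_gpus: int) -> int:
    """Images each GPU processes per step, assuming an even split."""
    if total_batch % num_gpus != 0:
        raise ValueError("total batch must be divisible by the number of GPUs")
    return total_batch // num_gpus


TOTAL_BATCH = 16  # assumption, taken from the "bs16" config name

# (number of GPUs, memory per GPU in GB) for the setups discussed above
setups = {
    "8 x A6000": (8, 48),
    "8 x A100": (8, 80),
    "2 x A6000": (2, 48),
}

for name, (num_gpus, gb_per_gpu) in setups.items():
    print(
        f"{name}: {total_memory_gb(num_gpus, gb_per_gpu)} GB total, "
        f"{images_per_gpu(TOTAL_BATCH, num_gpus)} images per GPU"
    )
```

Under these assumptions, going from 8 to 2 GPUs at the same total batch size means each GPU holds four times as many images per step, so per-GPU memory becomes the limiting factor rather than the aggregate figure.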

tomhog commented 1 year ago

Thanks for the response,

For anyone interested, I've been able to get ConvNeXt Large on ADE20K to train on two A6000 GPUs by lowering the crop size to 512x512 while keeping the batch size at 16. I used the following config:

_BASE_: ./oneformer_convnext_large_bs16_160k.yaml
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.000025 # 8 gpu version used 0.0001
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 512) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 512
  MAX_SIZE_TRAIN: 2048
  MAX_SIZE_TEST: 2048
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (512, 512)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 512 # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [128, 256, 512, 1024, 2048, 4096]
    MAX_SIZE: 4096
    FLIP: True
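For readers unfamiliar with the `MIN_SIZE_TRAIN` entry in this config: it is a Python list comprehension that the YAML loader evaluates. The sketch below simply evaluates the same expression on its own to show which shortest-edge sizes it produces, and compares the two learning rates quoted in the config comment. The variable names are only for illustration, and the thread does not say how the lower learning rate was chosen.

```python
import math

# Same expression as in MIN_SIZE_TRAIN, with 512 pulled out as a named constant.
BASE_SIZE = 512
train_sizes = [int(x * 0.1 * BASE_SIZE) for x in range(5, 21)]
print(train_sizes)

# Learning rates quoted in the config: the 2-GPU run vs. the 8-GPU default.
lr_two_gpus = 0.000025
lr_eight_gpus = 0.0001
lr_ratio = lr_two_gpus / lr_eight_gpus

# Observation only: the ratio is about 1/4, the same as 2 GPUs / 8 GPUs.
# Whether that was the reasoning behind the value is not stated in the thread.
print(math.isclose(lr_ratio, 2 / 8))
```

The expression yields 16 candidate sizes from 256 to 1024, that is, 0.5x to 2.0x of 512 in steps of 0.1. With `MIN_SIZE_TRAIN_SAMPLING: "choice"`, one of these is picked per image before the 512x512 crop is applied.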