Closed tomhog closed 1 year ago
Hi, thanks for pointing out the plausibility of confusion. We will make it more clear in a future commit.
Additionally would it be possible to achieve good results with less, say 2xA6000 (96GB), with it just taking longer?
I think so, if the memory is enough. You might need to tune some hyperparameters, though, as we provide values for training with 8 GPUs.
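One common heuristic when re-tuning for a smaller effective batch size is the linear learning-rate scaling rule. A minimal sketch (the `scale_lr` helper is hypothetical, not part of the OneFormer codebase):

```python
# Hypothetical helper illustrating the linear LR scaling rule
# (scale the learning rate in proportion to the effective batch size).
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    return base_lr * new_batch / base_batch

# e.g. adapting the 8-GPU BASE_LR of 0.0001 to a batch of 4 instead of 16
print(scale_lr(0.0001, base_batch=16, new_batch=4))  # 2.5e-05
```

Whether this rule transfers cleanly to OneFormer's AdamW schedule is an open question; treat it as a starting point for tuning, not a guarantee.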
Thanks for the response.
For anyone interested: I've been able to get ConvNeXt-Large ADE20K to train on two A6000 GPUs by lowering the crop size to 512x512 while keeping the batch size at 16. I used the following config:
```yaml
_BASE_: ./oneformer_convnext_large_bs16_160k.yaml
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.000025  # 8 gpu version used 0.0001
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 512) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 512
  MAX_SIZE_TRAIN: 2048
  MAX_SIZE_TEST: 2048
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (512, 512)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 512  # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [128, 256, 512, 1024, 2048, 4096]
    MAX_SIZE: 4096
    FLIP: True
```
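For reference, the `MIN_SIZE_TRAIN` eval expression in that config expands to a multi-scale list spanning roughly 0.5x to 2x of the 512 test size; a quick check:

```python
# Expand the MIN_SIZE_TRAIN eval expression from the config above.
sizes = [int(x * 0.1 * 512) for x in range(5, 21)]
print(sizes[0], sizes[-1], len(sizes))  # 256 1024 16
```

So each training image's shorter side is sampled ("choice") from 16 sizes between 256 and 1024 before the 512x512 crop is taken.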
Hi
Firstly, thank you for releasing this amazing work. Not only is the model impressive, but the code quality is excellent and very easy to follow.
I have a question regarding GPU memory requirements for training. The readme contains some conflicting information:
> We train all our models using 8 A6000 (48 GB each) GPUs. We use 8 A100 (80 GB each) for training Swin-L† OneFormer and DiNAT-L† OneFormer on COCO and all
Is it 8xA6000 (384GB) or 8xA100 (640GB)? Additionally would it be possible to achieve good results with less, say 2xA6000 (96GB), with it just taking longer?
Many thanks,
Tom