facebookresearch / MetaCLIP

ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

[CUDA OOM] reproduce ViT-B-16-quickgelu on V100 32G #47

Closed hbchen121 closed 5 months ago

hbchen121 commented 5 months ago

I tried to reproduce ViT-B-16-quickgelu on a 32 GB V100 with the same configuration. Why do I get CUDA OOM with batch_size=512? With batch_size=256, memory usage is 26/32 GB.

Do you know why that is?

hbchen121 commented 5 months ago

I fixed this error by enabling "grad_checkpointing".
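For reference, with an OpenCLIP-style model (MetaCLIP's training code builds on OpenCLIP) gradient checkpointing can be switched on via the model's `set_grad_checkpointing` method. A minimal sketch, assuming the standard `open_clip` API; the model name matches this issue, and `pretrained=None` is illustrative:

```python
import open_clip

# Build the same architecture discussed in this issue.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-quickgelu", pretrained=None
)

# Trade compute for memory: transformer activations are recomputed
# during the backward pass instead of being stored, so a larger
# batch size fits on a 32 GB V100.
model.set_grad_checkpointing(True)
```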

howardhsu commented 5 months ago

Thanks, yes, we use gradient checkpointing to train it on 64 V100 GPUs; with better/more GPUs you can turn off gradient checkpointing to speed up training. https://github.com/facebookresearch/MetaCLIP/blob/ea88021fff8881523fd24183ac1a42ae0ccc40a6/run_configs_fullcc.py#L39
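For anyone curious what the flag actually does, here is a minimal, self-contained sketch of the underlying mechanism using `torch.utils.checkpoint` directly (not MetaCLIP code; the shapes and block count are illustrative). Checkpointed blocks store only their inputs and recompute intermediate activations in the backward pass, cutting memory at the cost of extra compute:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A toy stack of transformer-ish blocks (dimensions are illustrative).
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(512, 512), nn.GELU()) for _ in range(12)
)

def forward(x, use_checkpointing=True):
    for block in blocks:
        if use_checkpointing:
            # Only the block input is kept; intermediate activations
            # are recomputed during backward, reducing peak memory.
            x = checkpoint(block, x, use_reentrant=False)
        else:
            x = block(x)
    return x

x = torch.randn(256, 512, requires_grad=True)
forward(x).sum().backward()
```

The recomputation is also why turning checkpointing off speeds up training once you have GPUs with enough memory, as noted above.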