czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

Gradient accumulation implementation #165

Open King4819 opened 8 months ago

King4819 commented 8 months ago

I want to ask how to implement gradient accumulation with your work. My compute resource is a single RTX 4090 (24 GB), so I'm not able to set the batch size to 16. Thanks!

duanduanduanyuchen commented 8 months ago

Hi, you can use GradientCumulativeOptimizerHook. Just set these dicts in the config file like this:

data = dict(samples_per_gpu=1)
optimizer_config = dict(type='GradientCumulativeOptimizerHook', cumulative_iters=2)

The effective total batch size will be samples_per_gpu * cumulative_iters * num_gpus. So on a single GPU, samples_per_gpu=1 with cumulative_iters=16 gives an effective batch size of 16.
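For intuition, gradient accumulation just averages the gradients of several micro-batches before applying one optimizer step, which reproduces the full-batch update while only holding one micro-batch in memory at a time. A minimal framework-free sketch (plain Python with a toy quadratic loss; all function names here are made up for illustration, not part of mmcv):

```python
def sgd_step(w, g, lr=0.1):
    # one plain SGD update
    return w - lr * g

def mean_grad(w, batch):
    # gradient of the mean of 0.5 * (w - x)^2 over the batch
    return sum(w - x for x in batch) / len(batch)

def full_batch_update(w, batch):
    # reference: one step on the whole batch at once
    return sgd_step(w, mean_grad(w, batch))

def accumulated_update(w, batch, cumulative_iters=2):
    # split the batch into micro-batches, accumulate the (scaled)
    # micro-batch gradients, then apply a single optimizer step
    size = len(batch) // cumulative_iters
    acc = 0.0
    for i in range(cumulative_iters):
        micro = batch[i * size:(i + 1) * size]
        acc += mean_grad(w, micro) / cumulative_iters
    return sgd_step(w, acc)

batch = [1.0, 2.0, 3.0, 4.0]
print(full_batch_update(0.0, batch))          # -> 0.25
print(accumulated_update(0.0, batch, 2))      # same update, smaller memory footprint
```

Because the loss is a mean over samples and the micro-batches are equal-sized, the accumulated gradient equals the full-batch gradient exactly; this is the same bookkeeping the hook does for you across `cumulative_iters` iterations.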

King4819 commented 8 months ago

@duanduanduanyuchen Thanks for your reply !