Lightning-AI / pytorch-lightning


Dynamic/variable batch size support #16914

Open HsunGong opened 1 year ago

HsunGong commented 1 year ago

Description & Motivation

Support per-device batch sizes, for example:

GPU1: batch-size=24
GPU2: batch-size=12
GPU3: batch-size=16
GPU4: batch-size=24

This would help because different batches have different sequence lengths (see the links below and the sketch after them).

See

https://github.com/microsoft/DeepSpeed/issues/1051

https://github.com/facebookresearch/fairseq/blob/b5a039c292facba9c73f59ff34621ec131d82341/fairseq/data/data_utils.py#L282
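For context, the linked fairseq code implements length-based batching: sample indices are packed into batches under a max-token budget, so the number of samples per batch varies with sequence length. Below is a minimal, self-contained sketch of that idea; the function name, budget, and lengths are illustrative and not part of fairseq's or Lightning's API.

```python
# Minimal sketch (not the Lightning API): pack sample indices into batches
# under a max-token budget, so the per-batch sample count varies with length.
from typing import List


def batch_by_token_budget(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Greedily pack sample indices into batches whose padded token count
    (batch size * longest sequence in the batch) stays under ``max_tokens``."""
    batches, current, longest = [], [], 0
    for idx, length in enumerate(lengths):
        new_longest = max(longest, length)
        # Cost of the batch if we add this sample (padding to the longest item).
        if current and new_longest * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, longest = [], 0
            new_longest = length
        current.append(idx)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


# Example: short sequences yield large batches, long sequences small ones.
lengths = [10, 12, 11, 50, 48, 100]
print(batch_by_token_budget(lengths, max_tokens=128))
# -> [[0, 1, 2], [3, 4], [5]]
```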

Pitch

No response

Alternatives

No response

Additional context

No response

cc @borda

HsunGong commented 1 year ago

Hello

FarzanT commented 1 year ago

Related to #15573

Borda commented 1 year ago

@HsunGong could you please share the use case where these varying batch sizes would be needed? Presumably when you have a heterogeneous set of GPU cards?

HsunGong commented 1 year ago

Sure. We have 2080ti, 3090, A10, and A40 GPU cards, and we want to set the batch size dynamically according to the GPU. We've already done this with PyTorch DDP by customizing the sampler for each GPU.
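For readers landing here, a rough sketch of that kind of workaround, assuming `torch.distributed` is already initialized; the rank-to-batch-size mapping is purely illustrative and this is not a Lightning API.

```python
# Rough sketch: per-rank batch size on top of the standard DistributedSampler.
import torch.distributed as dist
from torch.utils.data import DataLoader, Dataset, DistributedSampler


def make_loader(dataset: Dataset) -> DataLoader:
    # Hypothetical mapping from rank to batch size (e.g. per GPU model).
    per_rank_batch_size = {0: 24, 1: 12, 2: 16, 3: 24}
    rank = dist.get_rank()
    sampler = DistributedSampler(dataset, shuffle=True)
    # Caveat: with equal data shards but unequal batch sizes, ranks produce a
    # different number of batches per epoch, so the training loop must
    # truncate to the shortest rank (or pad) to keep collectives in sync.
    return DataLoader(
        dataset,
        batch_size=per_rank_batch_size.get(rank, 16),
        sampler=sampler,
        num_workers=4,
        pin_memory=True,
    )
```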

bm-synth commented 8 months ago

If it's not too late: I started developing dynamic batch sizes and the corresponding LR scaling in DeepSpeed PR 5237. Hopefully it will be enabled with a simple config file change and work out of the box in Lightning. Stay tuned; it should be done this week.
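For reference, "the corresponding LR scaling" generally means scaling the learning rate with the effective batch size. A minimal sketch of the common linear scaling rule follows; the exact rule used in that DeepSpeed PR may differ.

```python
# Linear scaling rule sketch: scale the LR in proportion to how the current
# batch size deviates from the reference batch size the base LR was tuned for.
def scaled_lr(base_lr: float, batch_size: int, reference_batch_size: int) -> float:
    return base_lr * batch_size / reference_batch_size


# e.g. base LR 1e-4 tuned for batch size 16, current dynamic batch of 24:
print(scaled_lr(1e-4, 24, 16))  # 1.5e-04
```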