Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0
28.11k stars 3.36k forks

Support for AdaptDL #15119

Open pietrolesci opened 1 year ago

pietrolesci commented 1 year ago

Reposting from the idea-pool channel on Slack, as discussed with @carmocca.


Hi there,

While trying to solve an OOM problem with dynamic batch sizes based on sequence length, I just discovered AdaptDL. It might be an interesting library to support.

Some core features offered by AdaptDL are:

carmocca commented 1 year ago

Hi! Thanks for the request.

Note, however, that we would only be interested in integrating the training tricks they provide, such as the batch size and learning rate scaling features.
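
For context, here is a minimal sketch of what learning rate scaling tied to batch size can look like. It uses the common linear and square-root heuristics, not AdaptDL's own algorithm, and `scale_learning_rate` is a hypothetical helper that exists in neither AdaptDL nor Lightning.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Adjust a learning rate for a new global batch size.

    Hypothetical helper for illustration only; it is not part of
    AdaptDL or PyTorch Lightning.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")

    # How much larger (or smaller) the new batch is than the reference one.
    ratio = new_batch_size / base_batch_size

    if rule == "linear":
        # Linear heuristic: learning rate grows in proportion to batch size.
        return base_lr * ratio
    if rule == "sqrt":
        # Square-root heuristic: a more conservative adjustment.
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule!r}")


# Example: a run tuned at batch size 256 is moved to batch size 1024.
linear_lr = scale_learning_rate(0.1, 256, 1024)
sqrt_lr = scale_learning_rate(0.1, 256, 1024, rule="sqrt")
```

Which heuristic works best depends on the optimizer and model, so the scaled value is a starting point for tuning rather than a guarantee.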

We already support easy and elastic scaling of cloud jobs via our Lightning.ai platform. You can find ready-to-use apps such as the PL app to launch your PL scripts, or the demo app to get started.