Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Deepspeed activation Partitioning #18732

Open LogicBaron opened 11 months ago

LogicBaron commented 11 months ago

📚 Documentation

Hello,

> partition_activations (bool) – Enables partition activation when used with ZeRO stage 3 and model parallelism. Still requires you to wrap your forward functions in deepspeed.checkpointing.checkpoint. See deepspeed tutorial.
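For reference, the quoted requirement looks roughly like the sketch below. This is illustrative only: it assumes DeepSpeed and Lightning are installed, the module shapes are made up, and it is not runnable standalone since activation partitioning additionally needs an mpu object registered via `deepspeed.checkpointing.configure`.

```python
# Sketch only: assumes a DeepSpeed + Lightning environment; the model
# and sizes are hypothetical. Activation partitioning also requires an
# mpu object to be configured (deepspeed.checkpointing.configure),
# which is the crux of this issue.
import torch
import deepspeed
from lightning.pytorch.strategies import DeepSpeedStrategy


class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 128)

    def forward(self, x):
        return torch.relu(self.linear(x))


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.block = Block()

    def forward(self, x):
        # Per the docstring above, the forward pass must be wrapped in
        # deepspeed.checkpointing.checkpoint for partition_activations
        # to take effect.
        return deepspeed.checkpointing.checkpoint(self.block, x)


# partition_activations is the documented DeepSpeedStrategy flag.
strategy = DeepSpeedStrategy(stage=3, partition_activations=True)
```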

After running into problems with activation partitioning and investigating, I found that DeepSpeed activation partitioning is not meaningfully tied to ZeRO stage 3; rather, the decisive requirement is that model parallelism is set up and an mpu object is provided.

Also, DeepSpeed explicitly states that pipeline parallelism, the model-parallelism method it provides, cannot be used in conjunction with ZeRO stage 2 or stage 3 in the first place.

Additionally, in the GitHub issue referenced by the official documentation, ZeRO stage 3 and activation partitioning happen to be used together, but that pairing carries no particular significance.

Therefore, the documentation should state the actual preconditions for activation partitioning more clearly, rather than simply saying it should be used with ZeRO-3 plus model parallelism.
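For context, DeepSpeed's own activation-checkpointing settings live under the `activation_checkpointing` key of the DeepSpeed JSON config, independently of the ZeRO stage settings. A minimal fragment (keys from the DeepSpeed config documentation; values illustrative):

```json
{
  "activation_checkpointing": {
    "partition_activations": true,
    "cpu_checkpointing": false,
    "contiguous_memory_optimization": false
  }
}
```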

cc @borda @awaelchli

CrypticRevenger commented 11 months ago

Please assign me, I want to do it.

LogicBaron commented 10 months ago

Thank you. Is there anything I need to do?