Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Fully Sharded Training clip_grad_norm_ #13339

Closed wangleiofficial closed 1 year ago

wangleiofficial commented 2 years ago

🚀 Feature

FSDP does not support the gradient_clip_val setting in Trainer.

Motivation

Pitch

Alternatives

Additional context


If you enjoy Lightning, check out our other projects! ⚡

cc @SeanNaren @awaelchli @rohitgr7 @akihironitta

rohitgr7 commented 2 years ago

hey!

are you getting any errors when you are specifying it?

wangleiofficial commented 2 years ago

When specifying it, it raises the error: pytorch_lightning.utilities.exceptions.MisconfigurationException: gradient_clip_algorithm='norm' is currently not supported for FullyShardedNativeMixedPrecisionPlugin
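
For reference, a minimal sketch of the kind of Trainer setup that triggers this exception (the model, device counts, and the strategy/precision flags are assumptions and may differ between Lightning versions):

import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="fsdp",        # fully sharded strategy (assumed flag)
    precision=16,           # 16-bit mixed precision
    gradient_clip_val=0.5,  # gradient_clip_algorithm defaults to "norm" -> raises the error
)
trainer.fit(MyLitModule())  # MyLitModule: placeholder LightningModule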

SeanNaren commented 2 years ago

A workaround is to add this to your LightningModule:

from typing import Optional, Union

import pytorch_lightning as pl


class MyLitModule(pl.LightningModule):  # i.e. your own LightningModule subclass
    ...

    def configure_gradient_clipping(
            self,
            optimizer,
            optimizer_idx: int,
            gradient_clip_val: Optional[Union[int, float]] = None,
            gradient_clip_algorithm: Optional[str] = None,
    ):
        # FSDP shards the parameters, so clipping has to go through the wrapped
        # model's clip_grad_norm_ rather than Lightning's default clipping.
        assert gradient_clip_algorithm in ('norm', None), gradient_clip_algorithm
        self.model.clip_grad_norm_(gradient_clip_val)

But we'll need more context about how your code is structured.

Fully Sharded training requires that you wrap the model yourself (Lightning doesn't wrap it for you). As a result, only your code holds the reference on which to call clip_grad_norm_ for your model. You can see an example of this here: https://github.com/SeanNaren/SmallScience/blob/fsdp/train.py#L235-L243
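
As a minimal sketch of that point (placeholder class name and layers, assuming the fairscale wrap helper and the configure_sharded_model hook), keeping the wrapped reference on self.model is what lets the configure_gradient_clipping override above call clip_grad_norm_:

import pytorch_lightning as pl
import torch.nn as nn
from fairscale.nn import wrap


class MyLitModule(pl.LightningModule):
    def configure_sharded_model(self):
        # wrap() shards the module; the returned FSDP wrapper exposes
        # clip_grad_norm_, so keep a reference to it on self.model.
        self.model = wrap(nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2)))

    def forward(self, x):
        return self.model(x)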

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!