Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Fully Sharded Training clip_grad_norm_ #13339

Closed wangleiofficial closed 1 year ago

wangleiofficial commented 2 years ago

🚀 Feature

FSDP does not support the gradient_clip_val setting in Trainer.

Motivation

Pitch

Alternatives

Additional context


If you enjoy Lightning, check out our other projects! ⚡

cc @SeanNaren @awaelchli @rohitgr7 @akihironitta

rohitgr7 commented 2 years ago

hey!

are you getting any errors when you are specifying it?

wangleiofficial commented 2 years ago

When specifying it, it raises the error: pytorch_lightning.utilities.exceptions.MisconfigurationException: gradient_clip_algorithm='norm' is currently not supported for FullyShardedNativeMixedPrecisionPlugin
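
For reference, a minimal sketch of the kind of Trainer setup that triggers this exception (the model, device counts, and the strategy/precision flags are assumptions and may differ between Lightning versions):

import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="fsdp",        # fully sharded strategy (assumed flag)
    precision=16,           # 16-bit mixed precision
    gradient_clip_val=0.5,  # gradient_clip_algorithm defaults to "norm" -> raises the error
)
trainer.fit(MyLitModule())  # MyLitModule: placeholder LightningModule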

SeanNaren commented 2 years ago

A workaround is to add this to your LightningModule:

from typing import Optional, Union

import pytorch_lightning as pl


class MyLitModule(pl.LightningModule):  # i.e. your own LightningModule subclass
    ...

    def configure_gradient_clipping(
            self,
            optimizer,
            optimizer_idx: int,
            gradient_clip_val: Optional[Union[int, float]] = None,
            gradient_clip_algorithm: Optional[str] = None,
    ):
        # FSDP shards the parameters, so clipping has to go through the wrapped
        # model's clip_grad_norm_ rather than Lightning's default clipping.
        assert gradient_clip_algorithm in ('norm', None), gradient_clip_algorithm
        self.model.clip_grad_norm_(gradient_clip_val)

But we'll need more context about how your code is structured.

Fully Sharded training requires that you wrap the model yourself (Lightning doesn't wrap it for you). As a result, only your code holds the reference on which to call clip_grad_norm_ for your model. You can see an example of this here: https://github.com/SeanNaren/SmallScience/blob/fsdp/train.py#L235-L243
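
As a minimal sketch of that point (placeholder class name and layers, assuming the fairscale wrap helper and the configure_sharded_model hook), keeping the wrapped reference on self.model is what lets the configure_gradient_clipping override above call clip_grad_norm_:

import pytorch_lightning as pl
import torch.nn as nn
from fairscale.nn import wrap


class MyLitModule(pl.LightningModule):
    def configure_sharded_model(self):
        # wrap() shards the module; the returned FSDP wrapper exposes
        # clip_grad_norm_, so keep a reference to it on self.model.
        self.model = wrap(nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2)))

    def forward(self, x):
        return self.model(x)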

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!