🚀 Feature

Motivation

From the Gradients without Backpropagation paper (code is coming) and the Forward AD support in PyTorch's functorch, it seems a significant optimization could be made without actually performing the backward pass.

From the paper:

> We implement a forward-mode AD system in Python and base this on PyTorch tensors in order to enable a fair comparison with a typical backpropagation pipeline in PyTorch, which is widely used by the ML community. We release our implementation publicly. Our forward-mode AD engine is implemented from scratch using operator overloading and non-differentiable PyTorch tensors (requires_grad=False) as a building block. This means that our forward AD implementation does not use PyTorch's reverse-mode implementation (called "autograd") and computation graph. We produce the backpropagation results in experiments using PyTorch's existing reverse-mode code (requires_grad=True and .backward()) as usual.
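For reference, here is a minimal sketch of the kind of engine the quoted passage describes: forward-mode AD via operator overloading on plain PyTorch tensors with requires_grad=False. This is not the authors' released code; the `Dual` class and `jvp` helper are illustrative names only, and only a couple of ops are overloaded.

```python
# Minimal sketch (not the paper's released implementation) of forward-mode AD
# built from operator overloading on plain PyTorch tensors. No autograd graph
# is involved: every tensor keeps requires_grad=False.
import torch


class Dual:
    """Pairs a primal value with its tangent (derivative along one direction)."""

    def __init__(self, primal: torch.Tensor, tangent: torch.Tensor):
        self.primal = primal    # value of the computation
        self.tangent = tangent  # derivative along the chosen direction

    def __add__(self, other: "Dual") -> "Dual":
        # Sum rule: d(a + b) = da + db
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: d(a * b) = da * b + a * db
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)

    def sin(self) -> "Dual":
        # Chain rule: d(sin a) = cos(a) * da
        return Dual(torch.sin(self.primal), torch.cos(self.primal) * self.tangent)


def jvp(f, x: torch.Tensor, v: torch.Tensor):
    """Evaluate f(x) and the Jacobian-vector product J_f(x) @ v in one forward pass."""
    out = f(Dual(x, v))
    return out.primal, out.tangent


if __name__ == "__main__":
    x = torch.randn(3)
    v = torch.randn(3)              # tangent / perturbation direction
    f = lambda a: (a * a).sin()     # f(x) = sin(x * x), elementwise
    value, directional_derivative = jvp(f, x, v)
    # Matches the analytic directional derivative cos(x^2) * 2x * v.
    print(torch.allclose(directional_derivative, torch.cos(x * x) * 2 * x * v))
```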
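And a hedged sketch of how the paper's "forward gradient" idea could be tried with PyTorch's own forward-mode AD API (`torch.autograd.forward_ad`, still marked beta around the time of this issue, with op coverage varying by version), without any `.backward()` call. The toy linear model, data, and learning rate below are placeholders, not something this issue proposes.

```python
# Sketch of a "forward gradient" update using torch.autograd.forward_ad.
# Assumes a recent PyTorch with the beta forward-mode AD API; the model,
# data, and hyperparameters are arbitrary placeholders for illustration.
import torch
import torch.autograd.forward_ad as fwAD

torch.manual_seed(0)
X = torch.randn(64, 10)
y = torch.randn(64)
w = torch.randn(10)  # requires_grad stays False: no reverse-mode graph


def loss_fn(weight):
    # Mean-squared error of a linear model; works on plain or dual tensors.
    return ((X @ weight - y) ** 2).mean()


def forward_gradient_step(weight, lr=1e-2):
    """One update that never calls .backward().

    Sample a random tangent v, get the directional derivative (grad . v) of
    the loss along v from a single forward pass, and use (grad . v) * v as an
    unbiased estimate of the gradient.
    """
    v = torch.randn_like(weight)
    with fwAD.dual_level():
        dual_w = fwAD.make_dual(weight, v)
        loss, directional_derivative = fwAD.unpack_dual(loss_fn(dual_w))
    with torch.no_grad():
        weight -= lr * directional_derivative * v  # forward-gradient update
    return loss


for step in range(200):
    loss = forward_gradient_step(w)
print(float(loss))  # the loss should trend downward over the steps
```

Because each step uses only a single forward evaluation, no reverse-mode graph or stored activations are needed, which is the optimization the motivation above refers to.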
Pitch
Alternatives
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @borda @rohitgr7 @akihironitta