Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
Apache License 2.0

Support RN50 BatchNorm fusions with cudnn #487

Open vedaanta opened 2 months ago

vedaanta commented 2 months ago

🚀 Feature

cuDNN can support RN50 batchnorm fusions (and their symmetric backward counterparts) like:

The above fusions also support fp8 quantization around them.

Motivation

The RN50 benchmark being added in #443 can benefit from cuDNN fusions at:

# BN-relu fusion
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)

# BN-Add-relu fusion
out = self.bn3(out)
...
out = out + identity
out = self.relu(out)
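
For reference, here is a rough plain-Python sketch of the math these two patterns compute for a single channel, assuming training-mode batchnorm with the biased batch variance. The function names (`batchnorm_train`, `bn_relu`, `bn_add_relu`) are made up for illustration; this is not Thunder or cuDNN API, only the unfused computation that a fused kernel would have to reproduce.

```python
import math

def batchnorm_train(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batchnorm over the values of one channel.

    Hypothetical helper: normalizes with the batch mean and the
    biased batch variance, then applies scale (gamma) and shift (beta).
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in xs]

def bn_relu(xs, **bn_kwargs):
    """BN-relu pattern: relu(bn(x))."""
    return [max(y, 0.0) for y in batchnorm_train(xs, **bn_kwargs)]

def bn_add_relu(xs, identity, **bn_kwargs):
    """BN-Add-relu pattern: relu(bn(x) + identity)."""
    normed = batchnorm_train(xs, **bn_kwargs)
    return [max(y + r, 0.0) for y, r in zip(normed, identity)]

# Example inputs for one channel and a matching residual branch.
x = [1.0, 2.0, 3.0, 4.0]
identity = [2.0, 0.0, 0.0, -2.0]
fused_like_1 = bn_relu(x)
fused_like_2 = bn_add_relu(x, identity)
```

Unfused, each step (normalize, add, relu) is a separate pass over the activation; the point of the fusion is to do them in one kernel.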

Pitch

Add support for BN fusions via the cuDNN executor.

CC @IvanYashchuk @kshitij12345 @Anerudhan

t-vi commented 1 month ago

We now have resnet50 from TorchVision natively, and #451 will add a benchmark.