NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

AssertionError: Device compute capability 8.9 or higher required for FP8 execution. #1159

Open · kamrul-NSL opened this issue 1 week ago

kamrul-NSL commented 1 week ago
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Set dimensions.
in_features = 768
out_features = 3072
hidden_size = 2048
# Initialize model and inputs.
model = te.Linear(in_features, out_features, bias=True)
inp = torch.randn(hidden_size, in_features, device="cuda")
print("Initialization Complete!")

# Create an FP8 recipe. Note: All input args are optional.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# Enable FP8 autocasting for the forward pass.
# With enabled=True this requires compute capability 8.9 or higher (Ada or Hopper).
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

loss = out.sum()
loss.backward()

print("Done!!")

I am trying to use FP8 for an experiment. I installed all the necessary packages following NVIDIA's instructions, but I got this error:

 assert fp8_available, reason_for_no_fp8
AssertionError: Device compute capability 8.9 or higher required for FP8 execution.

My machine has an NVIDIA GeForce RTX 3090, and querying its compute capability reports 8.6:

nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
8.6
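The same check can be done from Python before turning FP8 on. This is a minimal sketch, not part of Transformer Engine: the 8.9 threshold is taken from the error message above, and the helper names `fp8_supported` and `current_device_capability` are hypothetical. Only `torch.cuda.is_available()` and `torch.cuda.get_device_capability()` are real PyTorch calls.

```python
from typing import Optional, Tuple

# Threshold taken from the error message: FP8 needs compute capability >= 8.9.
FP8_MIN_CAPABILITY = (8, 9)


def fp8_supported(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the FP8 threshold."""
    # Tuples compare lexicographically: (8, 6) < (8, 9) < (9, 0).
    return tuple(capability) >= FP8_MIN_CAPABILITY


def current_device_capability(device: int = 0) -> Optional[Tuple[int, int]]:
    """Return the GPU's (major, minor) compute capability via PyTorch, or None."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch not installed
    if not torch.cuda.is_available():
        return None  # no visible CUDA device
    return torch.cuda.get_device_capability(device)
```

For the RTX 3090 above, `fp8_supported((8, 6))` is False, which matches the assertion that was raised.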

Is it possible to use FP8 on an RTX 3090?

ptrendx commented 1 week ago

The RTX 3090 uses the Ampere architecture, which does not support FP8 execution.
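To make the answer concrete, here is an illustrative lookup from compute capability to architecture, limited to the three architectures named in this thread (Ampere, Ada, Hopper). The table and the `describe` helper are hypothetical and not exhaustive; the FP8 column follows the project description, which lists FP8 support on Hopper and Ada GPUs.

```python
from typing import Tuple

# Illustrative, non-exhaustive mapping of compute capability to architecture.
ARCHITECTURES = {
    (8, 0): "Ampere",        # e.g. A100
    (8, 6): "Ampere",        # e.g. GeForce RTX 3090
    (8, 7): "Ampere",        # e.g. Jetson Orin
    (8, 9): "Ada Lovelace",  # FP8-capable
    (9, 0): "Hopper",        # FP8-capable
}

FP8_ARCHITECTURES = {"Ada Lovelace", "Hopper"}


def describe(capability: Tuple[int, int]) -> Tuple[str, bool]:
    """Return (architecture name, FP8 supported) for a known compute capability."""
    arch = ARCHITECTURES.get(tuple(capability), "unknown")
    return arch, arch in FP8_ARCHITECTURES
```

If the same script also has to run on an Ampere card, one option is to pass the FP8 check result as `enabled` to `te.fp8_autocast`. With `enabled=False` the FP8 availability assertion should not be reached, and the layer runs in the input's precision instead of FP8.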