youki-sada opened 7 months ago
@nvpohanh ^ ^
Is there a reason why your application could not run in FP16? We would like to understand whether there are any ConvNet examples where BF16 must be used, to help us decide the priority of BF16 conv perf optimizations. Thanks!
We are working on efficient vision transformer models, which adopt convolutions for the earlier layers and multi-head attention for the later ones. For multi-head attention, we need to use BF16 or FP32 to maintain accuracy. Thus, in our case, the easiest approach would be to quantize all layers with BF16, including the convolutional layers.
I see, so it is a network with convs + transformers, right? I will bring this feedback internally for discussion.
For MHA (multi-head attention) + FP16, is it because it runs into overflow issue? If so, we can also try FP16 MHA with FP32 accumulation by:
```
Q -> Cast(toFP32) -> MatMul -> Cast(toFP16) -> Softmax -> Cast(toFP32) -> MatMul -> Cast(toFP16) -> ...
K -> Cast(toFP32) ----^
V -> Cast(toFP32) ----^
```
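To illustrate the overflow concern this cast pattern addresses, here is a small numpy sketch (illustrative only, not TensorRT code): a Q·Kᵀ dot product accumulated into an FP16 result can exceed FP16's maximum of 65504 and become infinite, while casting the inputs to FP32 first keeps the same value finite.

```python
import numpy as np

# A single dot product between FP16 query and key vectors, as in Q @ K^T.
# 256 * 30 * 30 = 230400, which is far above the FP16 max of 65504.
q = np.full(256, 30.0, dtype=np.float16)
k = np.full(256, 30.0, dtype=np.float16)

# Result dtype is FP16, so the logit overflows to inf
logit_fp16 = np.dot(q, k)

# Cast(toFP32) -> MatMul: same math, but the FP32 result stays finite
logit_fp32 = np.dot(q.astype(np.float32), k.astype(np.float32))

print(logit_fp16)  # inf
print(logit_fp32)  # 230400.0
```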
Yes, it consists of both convs and transformers.
We found that FP16+BF16 mixed precision solved the accuracy degradation, but it would be much simpler if we could use BF16 for all layers via `--bf16`.
> I will bring this feedback internally for discussion.
I appreciate it. In detail, the accuracy degradation is caused by overflow and underflow. Some networks (e.g. EfficientViT) adopt linear attention, which divides the MHA output by its last channel. This division is usually fused into the next pointwise layer in TRT, and division by zero occurs due to FP16 underflow.
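To make this underflow failure mode concrete, here is a small numpy sketch (illustrative, not EfficientViT code): a small positive denominator that is representable in FP32 flushes to zero in FP16, so the fused division produces inf, while keeping the computation in FP32 stays finite.

```python
import numpy as np

# Linear attention divides the MHA output by its last channel.
# 1e-8 is representable in FP32 but below FP16's smallest subnormal (~6e-8).
denom = 1e-8

denom_fp16 = np.float16(denom)  # flushes to 0.0
denom_fp32 = np.float32(denom)  # stays ~1e-8

out = np.float16(0.5)
with np.errstate(divide="ignore"):
    print(out / denom_fp16)          # inf: division by (underflowed) zero
print(np.float32(out) / denom_fp32)  # finite in FP32
```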
Try using the latest version of TRT.
Do you have any plan to fix the `--bf16` option? It does not affect convolutional layers, which remain TF32. We did achieve bfloat16 quantization by setting `precisionConstraints` and `layerPrecisions` with a wildcard; however, the performance is not the same as `--fp16`.

Layer precisions from `trtexec --onnx=tmp.onnx --bf16`:
TREx

Related issue: #3583
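For reference, the wildcard workaround described above can be expressed directly as trtexec flags along these lines (a sketch; `*` as the layer name applies the precision to all layers, and the exact spec syntax may vary between TensorRT versions):

```shell
# Force BF16 on every layer via a wildcard layer-precision spec,
# instead of relying on --bf16 alone (which left convs in TF32).
trtexec --onnx=tmp.onnx \
        --bf16 \
        --precisionConstraints=obey \
        --layerPrecisions=*:bf16
```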
Environment
TensorRT Version: TensorRT OSS v9.3.0
NVIDIA GPU: RTX 4090
NVIDIA Driver Version: 535.154.05
CUDA Version: 12.2
Operating System: Ubuntu 22.04 (Docker)
Relevant Files
tmp.zip