NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[Myelin] Myelin fused Attn but not run at MHA Kernel #3620

Closed DefTruth closed 8 months ago

DefTruth commented 9 months ago

Description

I want to find out whether the attention block fused by Myelin runs on an MHA kernel. The nsys results show that only xmma_gemm kernels are launched. How can I use the MHA/FMHA kernels in TensorRT manually? Are there any docs that could help? Many thanks!

nsys profile results:

ONNX vs Layers after Myelin optimization
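
One way to make this check repeatable is to filter the kernel names from a profile for the substrings mentioned in this thread. The snippet below is only a minimal sketch: `split_kernels` is a hypothetical helper, the marker substrings are an assumption based on the names `xmma_gemm` and `mha_v2` that appear in this issue, and the sample names are made up for illustration rather than copied from a real nsys report.

```python
def split_kernels(kernel_names, markers=("mha", "fmha")):
    """Split kernel names into (fused_attention, other) by substring match.

    The markers are an assumption: real kernel names vary across TensorRT
    versions and GPU architectures.
    """
    fused, other = [], []
    for name in kernel_names:
        lowered = name.lower()
        if any(marker in lowered for marker in markers):
            fused.append(name)
        else:
            other.append(name)
    return fused, other


# Made-up kernel names for illustration only (not from a real profile).
sample = [
    "hypothetical_xmma_gemm_kernel_a",
    "hypothetical_xmma_gemm_kernel_b",
    "hypothetical_mha_v2_kernel",
]
fused, other = split_kernels(sample)
print("fused attention kernels:", fused)
print("other kernels:", other)
```

An empty `fused` list would match the situation described above, where only GEMM kernels show up in the profile.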

Environment

TensorRT Version: 9.2

NVIDIA GPU: A30 / 3080

NVIDIA Driver Version: 525

CUDA Version: 12.2

CUDNN Version: 8.9

Operating System: Linux

Python Version (if applicable): 3.10

Tensorflow Version (if applicable): none

PyTorch Version (if applicable): 2.1.2

Baremetal or Container (if so, version): none

Relevant Files

Model link: none

Steps To Reproduce

To reproduce, please check the blog:

DefTruth commented 8 months ago

Solved. I rewrote the attention to match the pattern below and used TRT 9.2; after that, the mha_v2 kernel was used.

[B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, S, h] -> MatMul -> [B, N, S, S] -> MatMul -> [B, N, S, h] -Transpose-> [B, S, N, h] -Reshape-> [B, S, H] -LayerNorm->...
[B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, h, S] ---^                           ^
[B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, S, h] --------------------------------

Specifically, for Q, K, and V:

Q: [B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, S, h] -> MatMul -> [B, N, S, S] -> MatMul -> [B, N, S, h] -Transpose-> [B, S, N, h] -Reshape-> [B, S, H] -LayerNorm->...
K: [B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, h, S] ---^                           ^
V: [B, S, H] -MatMul-> [B, S, H] -Reshape-> [B, S, N, h] -Transpose-> [B, N, S, h] --------------------------------
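
The shapes in the pattern above can be sanity-checked with a small shape-only trace. This is a sketch, not TensorRT or ONNX code: the helper names (`matmul_shape`, `transpose_shape`, `trace_attention`) are hypothetical, the permutations are the ones implied by the shapes in the diagram, and the example sizes are arbitrary. It follows the diagram exactly, so anything not drawn there (for example scaling or softmax between the two MatMuls) is not modeled.

```python
def matmul_shape(a, b):
    """Shape of a batched MatMul: batch dims must match, inner dims must agree."""
    assert a[:-2] == b[:-2], f"batch dims differ: {a} vs {b}"
    assert a[-1] == b[-2], f"inner dims differ: {a} vs {b}"
    return a[:-1] + (b[-1],)


def transpose_shape(shape, perm):
    """Shape after permuting axes with the given permutation."""
    return tuple(shape[i] for i in perm)


def trace_attention(B, S, N, h):
    """Trace tensor shapes through the Q/K/V pattern, with H = N * h."""
    H = N * h
    projected = (B, S, H)   # [B, S, H] -MatMul-> [B, S, H]
    split = (B, S, N, h)    # -Reshape-> [B, S, N, h]

    q = transpose_shape(split, (0, 2, 1, 3))  # [B, N, S, h]
    k = transpose_shape(split, (0, 2, 3, 1))  # [B, N, h, S]
    v = transpose_shape(split, (0, 2, 1, 3))  # [B, N, S, h]

    scores = matmul_shape(q, k)                      # [B, N, S, S]
    context = matmul_shape(scores, v)                # [B, N, S, h]
    merged = transpose_shape(context, (0, 2, 1, 3))  # [B, S, N, h]
    out = (merged[0], merged[1], merged[2] * merged[3])  # -Reshape-> [B, S, H]

    return {"projected": projected, "q": q, "k": k, "v": v,
            "scores": scores, "context": context, "out": out}


# Arbitrary example sizes, chosen only for illustration.
shapes = trace_attention(B=2, S=128, N=12, h=64)
for name, shape in shapes.items():
    print(f"{name:>9}: {shape}")
```

Note that K is transposed to [B, N, h, S] before the first MatMul, while Q and V both use [B, N, S, h]; the trace asserts that these layouts are compatible at each MatMul.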