NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0
10.55k stars 2.1k forks

How to support FlashAttention2? #4082

Open echosyy opened 1 month ago

echosyy commented 1 month ago

Hi, does a converted TensorRT engine support FlashAttention-2 by default? If it is not supported by default, how can I use FA2 in the generated engine? Thanks.

lix19937 commented 3 weeks ago

TensorRT-LLM should support it.