activezhao closed this issue 4 days ago
Yes, we only enable FP8 FMHA for Hopper (SM90) at the moment. cc @PerkzZheng for visibility.
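For anyone checking their own hardware against the constraint above, here is a minimal sketch of a pre-flight check. It is not TensorRT-LLM code: the names `FP8_FMHA_SUPPORTED_SM`, `sm_version`, and `supports_fp8_fmha` are hypothetical, and the supported set simply encodes the statement in this thread that only Hopper (SM90) is enabled right now. The L40 in this report is an Ada GPU with compute capability 8.9, so it falls outside that set.

```python
# Sketch only: encodes the statement in this thread that FP8 FMHA is
# currently enabled for Hopper (SM90) only. Hypothetical helper names,
# not part of any TensorRT-LLM API.
FP8_FMHA_SUPPORTED_SM = {90}  # assumption: taken from the reply above


def sm_version(major: int, minor: int) -> int:
    """Combine a CUDA compute capability (major, minor) into an SM number."""
    return major * 10 + minor


def supports_fp8_fmha(major: int, minor: int) -> bool:
    """Return True if the compute capability is in the supported set."""
    return sm_version(major, minor) in FP8_FMHA_SUPPORTED_SM


if __name__ == "__main__":
    # L40 is Ada (compute capability 8.9); H100 is Hopper (9.0).
    for name, cc in {"L40 (Ada)": (8, 9), "H100 (Hopper)": (9, 0)}.items():
        print(name, supports_fp8_fmha(*cc))
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple that can be passed to `supports_fp8_fmha`.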
@nv-guomingz OK, got it.
Is there any plan to support the Ada architecture?
We really want to use the KV-Cache-Reuse feature.
Thanks so much!
@activezhao Yes, this is on our roadmap, but there is no concrete date. We will update here if we make any progress. Note that there are potential accuracy concerns with FP8 FMHA, so I would suggest trying it on Hopper first.
Thanks, @PerkzZheng. @activezhao, could we close this ticket now?
@PerkzZheng OK, thanks.
@nv-guomingz Of course, please close it.
Thanks.
System Info
CPU x86_64
GPU NVIDIA L40
TensorRT branch: v0.10.0
CUDA: NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2
Who can help?
@Tracin
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I want to use KV-Cache-Reuse and Chunked Context, so I ran the following commands:
Expected behavior
The commands should run successfully.
Actual behavior
I got the following errors:
I am using an L40. Does this mean FP8 FMHA cannot be enabled on Ada?
Additional notes
I hope there is a way to solve this.
Thanks.