Open vinod-sarvam opened 8 months ago

Hi,
When can we expect TRT-LLM to support SmoothQuant (W8A8) quantisation for MoE models like Mixtral? Is this planned on your roadmap? Clarity on this would be highly beneficial.
Hi @vinod-sarvam, it is not decided yet. We will support W8A8 (in FP8, not INT8) soon and can discuss INT8 support later.
Thanks @Tracin. Is FP8 already supported for Mixtral-type MoE models? If not, when is it expected?
Hi @vinod-sarvam, please try our latest code base; FP8 is now supported for MoE models.
Do you still have any further issues or questions? If not, we'll close this soon.
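For reference, here is a minimal sketch of FP8 quantization for a Mixtral-style MoE checkpoint using the high-level LLM API. This is an assumption-based example, not the canonical workflow from this thread: the `QuantConfig`/`QuantAlgo` names and the `SamplingParams` fields follow recent TensorRT-LLM releases and may differ in your version; check `examples/quantization` in the repo for the supported path.

```python
# Sketch only: API names below (QuantConfig, QuantAlgo, LLM, SamplingParams)
# are assumed from the high-level LLM API in recent TensorRT-LLM releases
# and may vary between versions.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

# FP8 weights and activations (W8A8 in FP8), plus an FP8 KV cache.
quant_config = QuantConfig(
    quant_algo=QuantAlgo.FP8,
    kv_cache_quant_algo=QuantAlgo.FP8,
)

# Quantize and build an engine for a Mixtral-style MoE model.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    quant_config=quant_config,
)

# Quick smoke test of the quantized engine.
outputs = llm.generate(
    ["What is mixture-of-experts routing?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```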