@wxsms thanks for the feedback. w4a8_awq with the BF16 data type is not supported yet; we will add it in a future update.
Hi @wxsms, could we close this ticket now?
It's okay. We could also close this issue once the feature is fully supported, but you may close it whenever you like. Thanks.
Thanks @wxsms. Please feel free to reopen it if needed.
System Info
Ubuntu, with Ada GPUs. TensorRT-LLM version: 0.11.0.dev2024061800
Who can help?
@Tracin
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Use example/quantization/quantize.py to quantize a model like this (I am using Llama):
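Something along these lines (a sketch only: the exact flag names may vary between TensorRT-LLM versions, and the model/output paths are placeholders):

```bash
# Sketch of the reproduction command; paths are placeholders and flag names
# are as I recall them for the 0.11.0.dev builds.
python example/quantization/quantize.py \
    --model_dir /path/to/llama-checkpoint \
    --dtype bfloat16 \
    --qformat w4a8_awq \
    --output_dir /path/to/llama-w4a8-awq
```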
Expected behavior
the quantization should work
actual behavior
Not working; quantization fails with the error: FP8 is unsupported on with BF16 scales and zero-points!
additional notes
I noticed that in tensorrt_llm/cpp/tensorrt_llm/plugins/weightOnlyGroupwiseQuantMatmulPlugin/weightOnlyGroupwiseQuantMatmulPlugin.cpp there is a snippet of code like this:
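Roughly of this shape (a standalone paraphrase of the logic as I read it, not the literal source; the branch layout and the flag names FP8_ALPHA and ZERO are my assumptions):

```cpp
// Standalone illustration (not the real plugin code) of the check ordering I am
// describing: with a BF16 activation type, the FP8 branch throws immediately,
// before the zero-point (ZERO) branch is ever reached, even though the error
// message talks about zero-points.
#include <cstdint>
#include <iostream>
#include <stdexcept>

enum QuantAlgoFlag : uint32_t
{
    ZERO = 1u << 0,      // group-wise zero-points present (flag name assumed)
    FP8_ALPHA = 1u << 1, // FP8 alpha/scales requested (flag name assumed)
};

enum class ActType { kHALF, kBF16 };

void initKernelRunner(ActType type, uint32_t quantAlgo)
{
    if (type == ActType::kBF16)
    {
        if (quantAlgo & FP8_ALPHA)
        {
            // Thrown regardless of whether ZERO is set -- this is the branch
            // that produces the reported message.
            throw std::runtime_error("FP8 is unsupported on with BF16 scales and zero-points!");
        }
        else if (quantAlgo & ZERO)
        {
            std::cout << "select BF16 kernel runner with zero-points\n";
        }
        else
        {
            std::cout << "select BF16 kernel runner without zero-points\n";
        }
    }
    // ... kHALF handling omitted
}

int main()
{
    try
    {
        // w4a8_awq with --dtype bfloat16 ends up here, as far as I understand.
        initKernelRunner(ActType::kBF16, FP8_ALPHA);
    }
    catch (std::exception const& e)
    {
        std::cerr << e.what() << "\n";
    }
    return 0;
}
```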
I am not very sure, but is this a mistake? The error message mentions zero-points, yet the throw happens without any zero-condition check (which is in the next block, I think?).