-
Can we run inference with the PyTorch int8 model? Is there a benchmark report comparing PyTorch int8 vs TensorRT int8?
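For context, here is a minimal sketch of the kind of PyTorch-side int8 timing I have in mind (dynamic post-training quantization on a toy model; the layer sizes are placeholders, and a TensorRT int8 comparison would need a separate run, e.g. via trtexec):

```python
import time
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Toy fp32 model; the real model and layer sizes would go here.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).eval()

# Post-training dynamic quantization: Linear weights stored as int8,
# activations quantized on the fly at inference time (CPU).
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(32, 1024)
with torch.inference_mode():
    for m, name in [(model, "fp32"), (qmodel, "int8-dynamic")]:
        start = time.perf_counter()
        for _ in range(100):
            m(x)
        print(f"{name}: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms/iter")
```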
-
I'm confused by equation (12): what does the outer product of s_w and s_x mean? Is the activation quantized per token?
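My current reading (a small numerical sketch, not the authors' code) is that activations are quantized per token (one scale per row of X) and weights per output channel (one scale per row of W), so the int32 matmul result is rescaled by the outer product of the two scale vectors:

```python
import torch

X = torch.randn(4, 64)      # 4 tokens, hidden size 64
W = torch.randn(128, 64)    # 128 output channels

s_x = X.abs().amax(dim=1) / 127.0   # per-token scales, shape (4,)
s_w = W.abs().amax(dim=1) / 127.0   # per-output-channel scales, shape (128,)

X_q = torch.round(X / s_x[:, None]).clamp(-127, 127)   # int8-range activations
W_q = torch.round(W / s_w[:, None]).clamp(-127, 127)   # int8-range weights

Y_int = X_q @ W_q.T                        # int8 x int8 -> int32 accumulate
Y = (s_x[:, None] * s_w[None, :]) * Y_int  # dequantize by outer(s_x, s_w)

print((Y - X @ W.T).abs().max())  # small quantization error vs fp32
```

Is that what equation (12) means?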
-
Firstly, thanks to all of you for this great project!
Currently, the model does not seem to support int8 quantization. Are there any plans to add it?
-
Hi, thank you for this work. How can I quantize it to int8? Any comments are appreciated.
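In case it helps frame the question, one common route I'm considering is loading the weights in 8-bit with bitsandbytes through transformers (a hedged sketch only; "MODEL_ID" is a placeholder, and this assumes a standard transformers causal-LM checkpoint):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# LLM.int8()-style 8-bit weight loading via bitsandbytes.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "MODEL_ID",                      # placeholder for the actual checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```

Would something like this work here, or is a dedicated PTQ/QAT pipeline needed?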
-
Hey, can you please elaborate on the quantisation method you used here for SD-1.4? I am trying to implement a similar project but am stuck on the quantisation process. I presume you used INT-8 quantisation for d…
-
Could be either a dedicated flag (...or a filter)
-
Hi,
Thanks for the great work!
Has your team tried QAT/PTQ int8 quantization on the star operations? After all, networks are usually quantized before being deployed in production.
Thanks for…
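(For reference, the kind of flow I mean is a generic eager-mode PTQ sketch like the one below, not anything from the paper; the block and calibration data are toy placeholders, with the element-wise "star" multiplication routed through FloatFunctional so it stays in the int8 graph.)

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq
from torch.ao.nn.quantized import FloatFunctional

class StarBlock(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.quant = tq.QuantStub()
        self.f1 = nn.Linear(dim, dim)
        self.f2 = nn.Linear(dim, dim)
        self.star = FloatFunctional()    # quantizable element-wise multiply
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        out = self.star.mul(self.f1(x), self.f2(x))  # the "star" operation
        return self.dequant(out)

model = StarBlock().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(model)
for _ in range(16):                      # calibration with random data (placeholder)
    prepared(torch.randn(8, 32))
quantized = tq.convert(prepared)
print(quantized(torch.randn(8, 32)).shape)
```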
-
Hi all. I'm currently working on an implementation of a quantized version of Mixtral 8x22B. I'm using the weights from the following repo: [MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-GGUF](https://hug…
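(In case it clarifies the setup: I'm loading one of the GGUF files with llama-cpp-python, roughly as in the sketch below; the filename, quant type, and context size are placeholders and depend on which file is downloaded.)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Mixtral-8x22B-Instruct-v0.1.Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,
    n_gpu_layers=-1,   # offload as many layers as fit onto the GPU
)
out = llm("Q: What is 2 + 2?\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```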
-
Hey, thanks for your work. I saw https://huggingface.co/ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16/discussions/2 about how 8-bit KV cache quantization can be enabled on vLLM. I am not too su…
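(For what it's worth, the only 8-bit KV cache option I'm aware of in vLLM is the fp8 `kv_cache_dtype`; whether it composes with this AQLM checkpoint is exactly what I'm unsure about. A hedged sketch of what I would try:)

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16",
    kv_cache_dtype="fp8_e5m2",   # 8-bit floating-point KV cache
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```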
-
Hi
Is there a way to run EETQ without an accelerator?
At least for the quantization process.
Thanks
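(For reference, the path I know of is the transformers `EetqConfig` route below; as far as I understand, the EETQ kernels are CUDA-only, so I'm not sure the quantization step itself can run without a GPU, which is why I'm asking. "MODEL_ID" is a placeholder.)

```python
from transformers import AutoModelForCausalLM, EetqConfig

quant_config = EetqConfig("int8")   # int8 weight-only quantization
model = AutoModelForCausalLM.from_pretrained(
    "MODEL_ID",                     # placeholder for the actual checkpoint
    quantization_config=quant_config,
    device_map="cuda:0",            # EETQ kernels expect a CUDA device
)
```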