-
FP8 or AWQ quant
-
I'd like to raise a concern about how quantization is currently handled in SpeechBrain. While training my own k-means quantizer on the last layer of an ASR model, I noticed that the interface was not …
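For context, here is a rough, self-contained sketch of the workflow in question (training a k-means quantizer on encoder features and mapping frames to discrete units), written against scikit-learn rather than SpeechBrain's own interface; the function names and toy data are illustrative only.

```python
# Illustrative only: fit a k-means quantizer on hidden representations from an
# ASR encoder and map new frames to discrete unit IDs. Uses scikit-learn as a
# stand-in for the SpeechBrain interface being discussed.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def train_kmeans_quantizer(features: np.ndarray, n_units: int = 512) -> MiniBatchKMeans:
    """features: (num_frames, hidden_dim) array of encoder outputs."""
    km = MiniBatchKMeans(n_clusters=n_units, batch_size=1024, random_state=0)
    km.fit(features)
    return km

def quantize(km: MiniBatchKMeans, features: np.ndarray) -> np.ndarray:
    """Return the ID of the nearest centroid for each frame."""
    return km.predict(features)

# Toy stand-in for (num_frames, hidden_dim) last-layer outputs.
feats = np.random.randn(5000, 256).astype(np.float32)
km = train_kmeans_quantizer(feats, n_units=128)
units = quantize(km, feats)
```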
-
Provide an approach that allows fine-tuning LLMs with LoRA more efficiently.
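A rough sketch of what such a LoRA fine-tuning path could look like, assuming the Hugging Face transformers + peft stack; the model name and hyperparameters below are placeholders, not a proposed default.

```python
# Illustrative only: LoRA fine-tuning with the Hugging Face peft library.
# Only the small low-rank adapter matrices are trained; the base weights stay
# frozen, which is where the memory and compute savings come from.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # placeholder model
    torch_dtype=torch.bfloat16,
)

lora_cfg = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which linear layers get adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # only a small fraction is trainable
```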
-
Could you please provide the code for training the quantization-aware accuracy predictor, or for creating its training dataset?
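For reference, a rough, self-contained sketch of one way such a dataset and predictor could be built; everything here (layer count, bit-width choices, the synthetic accuracy function) is a stand-in, not this repo's actual pipeline.

```python
# Hypothetical sketch: collect (quant-config, measured accuracy) pairs, then
# fit a small regressor as the quantization-aware accuracy predictor.
import numpy as np
from sklearn.neural_network import MLPRegressor

N_LAYERS = 10                      # assumed number of quantizable layers
BIT_CHOICES = [2, 4, 8]            # assumed per-layer bit-width options

def sample_quant_config(rng):
    """Stand-in: pick a random per-layer bit-width assignment."""
    return rng.choice(BIT_CHOICES, size=N_LAYERS)

def measure_accuracy(config, rng):
    """Stand-in for quantizing the model with `config` and evaluating it on a
    held-out set. In practice this is the expensive step run per sample."""
    return 0.9 - 0.02 * np.mean(8 / config) + 0.01 * rng.standard_normal()

rng = np.random.default_rng(0)
configs = [sample_quant_config(rng) for _ in range(200)]
accs = [measure_accuracy(c, rng) for c in configs]

# The predictor maps an encoded config (here just the bit-width vector)
# to the accuracy measured above.
predictor = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
predictor.fit(np.stack(configs), accs)
print(predictor.predict(np.stack([sample_quant_config(rng)])))
```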
-
Do you think it's feasible to add [Additive Powers-of-Two Quantization](https://arxiv.org/abs/1909.13144) to Brevitas?
Even though it is known as a non-uniform quantization technique, it is very HW-friendly…
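For concreteness, a rough, self-contained sketch of how the APoT level set could be built and applied, based on my reading of the paper's unsigned case; this is not Brevitas code, and the parameter names are mine.

```python
# Rough illustration of Additive Powers-of-Two quantization: each level is a
# sum of one power-of-two term per group, which keeps dequantization
# shift-and-add friendly in hardware.
import itertools
import numpy as np

def apot_levels(k: int, n: int, alpha: float = 1.0) -> np.ndarray:
    """Build APoT levels for b = k*n bits (unsigned case), scaled so max = alpha."""
    term_sets = []
    for i in range(n):
        # Group i contributes 0 or a power of two: 2^-i, 2^-(i+n), 2^-(i+2n), ...
        terms = [0.0] + [2.0 ** -(i + j * n) for j in range(2 ** k - 1)]
        term_sets.append(terms)
    sums = sorted({sum(c) for c in itertools.product(*term_sets)})
    levels = np.array(sums)
    return alpha * levels / levels.max()

def quantize_apot(x: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Project each value onto its nearest APoT level."""
    idx = np.abs(x[..., None] - levels).argmin(axis=-1)
    return levels[idx]

levels = apot_levels(k=2, n=2)         # 4-bit example (16 levels)
w = np.clip(np.random.randn(8), 0, 1)  # toy non-negative weights in [0, 1]
print(quantize_apot(w, levels))
```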
-
I have migrated the method to the Qwen-VL model and evaluated it with VLMEvalKit on several visual tasks under the int2 setting. The link is as follows: [quip-sharp-qwenvl](https://github.c…
-
Implement this paper: https://arxiv.org/abs/2405.12497 as a new quantization type
-
https://developer.nvidia.com/zh-cn/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/
This post says Gemma supports quantization; does RecurrentGemma support quantization as well?
-
### 🚀 The feature, motivation and pitch
With a single command, quantize the same model under every available quant scheme and configuration, and output a table comparing the results. This will …
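A rough sketch of the core loop such a command might run; the schemes, metrics, and the `quantize_and_eval` stand-in below are all placeholders, not the actual backend.

```python
# Hypothetical sketch of the proposed command: run every quant scheme over the
# same model and print a comparison table. Numbers here are fabricated only to
# show the table layout.
import random

SCHEMES = ["int8-weight-only", "int4-groupwise-g32", "int4-groupwise-g256", "fp8"]

def quantize_and_eval(model_path: str, scheme: str) -> dict:
    """Stand-in: quantize `model_path` with `scheme`, then measure size,
    perplexity, and tokens/sec. Replace with real backend calls."""
    random.seed(scheme)
    return {
        "size_gb": round(random.uniform(2, 8), 2),
        "perplexity": round(random.uniform(6, 9), 2),
        "tok_per_s": round(random.uniform(20, 120), 1),
    }

def compare(model_path: str) -> None:
    header = f"{'scheme':<24}{'size (GB)':>10}{'ppl':>8}{'tok/s':>8}"
    print(header)
    print("-" * len(header))
    for scheme in SCHEMES:
        r = quantize_and_eval(model_path, scheme)
        print(f"{scheme:<24}{r['size_gb']:>10}{r['perplexity']:>8}{r['tok_per_s']:>8}")

compare("model.pt")  # placeholder path
```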