-
Thank you for your efforts.
I'm curious to know whether there is any code or script for quantizing my own Stable Diffusion models to 2-bit, rather than relying on the pre-existing model available on Goog…
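As a starting point, here is a minimal sketch of uniform 2-bit weight quantization with per-group scales in PyTorch. It is illustrative only (real 2-bit Stable Diffusion pipelines add calibration and finer-grained scaling), and every name in it is a placeholder:

```python
import torch

def quantize_2bit(w: torch.Tensor, group_size: int = 64):
    """Uniform 2-bit quantization with a per-group scale and offset (illustrative).

    Assumes w.numel() is divisible by group_size; each group is mapped
    onto the 4 levels {0, 1, 2, 3}.
    """
    flat = w.reshape(-1, group_size)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / 3.0  # 2 bits -> 4 levels
    q = torch.clamp(torch.round((flat - lo) / scale), 0, 3)
    return q.to(torch.uint8), scale, lo

def dequantize_2bit(q, scale, lo, shape):
    return (q.float() * scale + lo).reshape(shape)
```

Applied to every weight tensor (plus bit-packing for storage), this is the core arithmetic behind 2-bit checkpoints; without calibration, expect a substantial quality drop.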
-
Hi @AlexeyAB,
Is there a way to quantize a YOLOv4 weight file to FP16 or INT8 without using TFLite?
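One TFLite-free route is to go through ONNX. The sketch below assumes the Darknet weights have already been exported to a (hypothetical) `yolov4.onnx`, and that the `onnxruntime` and `onnxconverter-common` packages are installed:

```python
import onnx
from onnxconverter_common import float16
from onnxruntime.quantization import quantize_dynamic, QuantType

# INT8: weights are quantized offline; activations are quantized
# dynamically at runtime.
quantize_dynamic("yolov4.onnx", "yolov4_int8.onnx", weight_type=QuantType.QInt8)

# FP16: down-cast the graph's float tensors to half precision.
model_fp16 = float16.convert_float_to_float16(onnx.load("yolov4.onnx"))
onnx.save(model_fp16, "yolov4_fp16.onnx")
```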
-
Can I load QLoRA fine-tuned weights into a Hugging Face model as shown below?
```python
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    …
)
```
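For comparison, a complete version of that pattern might look like the sketch below, which loads the 4-bit base model and then attaches the LoRA adapter with PEFT (the adapter path is hypothetical):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# Load the base model in 4-bit, then attach the QLoRA adapter weights.
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "path/to/qlora-adapter")  # hypothetical path
```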
-
### Describe the feature request
Support for quantizing models to 4-bit, 2-bit, and 1-bit and running them, as well as saving and loading these models in ONNX format for smaller file sizes.
The GPU doesn…
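To illustrate the file-size motivation, here is the storage arithmetic behind sub-byte formats as a small NumPy sketch (not an ONNX Runtime API; purely illustrative):

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack signed 4-bit values in [-8, 7] two per byte: half the size of INT8."""
    u = (q.astype(np.int16) + 8).astype(np.uint8)  # shift to unsigned [0, 15]
    if u.size % 2:
        u = np.append(u, np.uint8(8))              # pad with the zero value
    return (u[0::2] << 4) | u[1::2]

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    hi = (packed >> 4).astype(np.int16) - 8
    lo = (packed & 0x0F).astype(np.int16) - 8
    return np.stack([hi, lo], axis=1).reshape(-1)
```

Two-bit and one-bit formats follow the same idea with four and eight values per byte, giving 4x and 8x smaller weight tensors than INT8.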
-
Hello!
I am new to Intel Caffe!
While reading the Intel document "LOWER NUMERICAL PRECISION DEEP LEARNING INFERENCE AND TRAINING", I saw the statement: "**quantizing the weights is done before inference starts. Qua…
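That statement refers to offline weight quantization: scales and integer weights are computed once ahead of time, so inference only performs integer arithmetic. A generic symmetric INT8 sketch of the idea (not Intel Caffe's exact scheme):

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Offline symmetric INT8 weight quantization: run once, before inference."""
    scale = max(np.abs(w).max() / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # at runtime, dequantize as q * scale (or fold into the GEMM)
```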
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)), and noticed a potentially important issue that arises when using LLMs with 1-…
-
Thank you very much for your work.
Following your code, I modified YOLOv5; with W4A8 quantization there is a loss of nearly 3 points. Have you experimented with YOLOv5?
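For context, W4A8 means 4-bit weights with 8-bit activations. A per-tensor symmetric fake-quantization sketch in PyTorch (purely illustrative of the setting, not this repo's implementation):

```python
import torch

def fake_quant(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Round x onto a symmetric n-bit grid, then dequantize (fake quantization)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-12) / qmax
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

w = fake_quant(torch.randn(128, 128), n_bits=4)  # W4: weights on a 4-bit grid
a = fake_quant(torch.randn(32, 128), n_bits=8)   # A8: activations on an 8-bit grid
y = a @ w.t()
```

Per-tensor weight scales like this are the weakest setting; per-channel scales and quantization-aware finetuning usually recover part of such a drop.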
-
I used auto_gptq 0.7.1 and ran this command:
python quant_with_alpaca.py --pretrained_model_dir Qwen1.5-14B-Chat --quantized_model_dir Qwen1.5-14B-Chat_4bit --use_triton --save_and_reload --trust_remote…
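For reference, the same flow through the AutoGPTQ Python API looks roughly like the sketch below (the calibration text and output directory are placeholders; a real run would use a few hundred Alpaca-style samples):

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

pretrained_dir = "Qwen1.5-14B-Chat"
tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, trust_remote_code=True)

# Placeholder calibration data; use a real calibration set in practice.
examples = [tokenizer("Example calibration text.", return_tensors="pt")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(
    pretrained_dir, quantize_config, trust_remote_code=True
)
model.quantize(examples)
model.save_quantized("Qwen1.5-14B-Chat_4bit")
```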
-
### Your current environment
(venv-vllm-54) (base) root@I1ba088648b009018e4:/hy-tmp# nvidia-smi
Tue Aug 6 10:29:16 2024
(nvidia-smi table truncated)
-
I trained a QKeras model with the kernel and bias quantizers of every QDense layer set to `quantized_bits(8,0)`. After training, I printed out the weights and biases of the QDense layers.
I expect them to h…
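One thing worth noting (an assumption about the cause, based on how QKeras works): `get_weights()` returns the latent float weights, and the quantizers are applied on the fly during the forward pass. A self-contained sketch of inspecting the actually-quantized values, using QKeras's `kernel_quantizer_internal` attribute and its `model_save_quantized_weights` utility:

```python
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from qkeras import QDense, quantized_bits
from qkeras.utils import model_save_quantized_weights

# Tiny stand-in for the trained model described above.
inp = Input((16,))
out = QDense(4, kernel_quantizer=quantized_bits(8, 0),
             bias_quantizer=quantized_bits(8, 0))(inp)
model = Model(inp, out)

# get_weights() returns latent float weights; apply the layer's quantizer
# to see the values actually used in the forward pass.
for layer in model.layers:
    if isinstance(layer, QDense):
        q_kernel = layer.kernel_quantizer_internal(layer.kernel)
        print(layer.name, np.unique(q_kernel.numpy()))

# Alternatively, write the quantized values back into the model's weights.
model_save_quantized_weights(model)
```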