-
I tried to modify your example code to run this model on a low-VRAM card with a BNB 4-bit or 8-bit quantization config.
When using a bnb 4-bit config like the one below:
```python
qnt_config = BitsAndBytesConfig(load…
```
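For reference, here is a minimal, self-contained sketch of a 4-bit setup (the config values and model ID below are my assumptions for illustration, since the original snippet is truncated):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit config for illustration; the reporter's actual values are truncated above.
qnt_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "model-id-goes-here",  # placeholder model ID
    quantization_config=qnt_config,
    device_map="auto",
)
```

The 8-bit counterpart would simply pass `load_in_8bit=True` instead of the 4-bit fields.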
-
### 🐛 Describe the bug
- I'm reporting this issue due to errors related to `capture_pre_autograd_graph` and `torch.compile` in QAT.
- Note: Apologies if there are any misunderstandings.
- Based on th…
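For context, the PT2E QAT flow the report refers to looks roughly like this; the toy model and input shape below are stand-ins, not the reporter's code:

```python
import torch
import torch.nn as nn
from torch._export import capture_pre_autograd_graph  # deprecated in newer PyTorch in favor of torch.export
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Toy stand-in for the real model.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
example_inputs = (torch.randn(1, 3, 32, 32),)

# Capture the pre-autograd graph, then insert fake-quant observers for QAT.
exported = capture_pre_autograd_graph(model, example_inputs)
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config(is_qat=True)
)
prepared = prepare_qat_pt2e(exported, quantizer)

# ... QAT training loop over `prepared` would go here ...

converted = convert_pt2e(prepared)
compiled = torch.compile(converted)  # the step where the errors were reported
out = compiled(*example_inputs)
```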
-
I found a [similar closed issue](https://github.com/microsoft/VPTQ/issues/56) related to this topic. Following your reply in that issue, I successfully configured the `vptq-algo` environment based on …
-
Hi,
I’m using YOLOv9 for segmentation tasks and noticed that quantization is currently supported for object detection models. Since the backbone is the same across all YOLOv9 variants, I wanted to …
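As a rough illustration of the idea, here is a sketch of post-training quantization applied to a shared backbone using PyTorch's generic FX workflow; the tiny `Sequential` below stands in for the real YOLOv9 backbone, and nothing here is this repo's actual API:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Stand-in for the shared backbone (hypothetical; the real backbone would be
# extracted from the segmentation variant).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),
    nn.ReLU(),
)

example_inputs = (torch.randn(1, 3, 640, 640),)
qconfig_mapping = get_default_qconfig_mapping("x86")
prepared = prepare_fx(backbone, qconfig_mapping, example_inputs)

# Calibrate on a few representative batches.
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(1, 3, 640, 640))

quantized_backbone = convert_fx(prepared)
```

Because the backbone is shared, a quantized backbone produced this way could in principle be reused under either a detection or a segmentation head.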
-
Hi,
First of all, congratulations on the nice job you've done with this package😺
Secondly, I was wondering if you would be willing to accept an extension of linear quantization to support sign…
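Assuming "sign…" refers to signed (symmetric) linear quantization, here is a minimal sketch of what such an extension would compute; the function name and interface are hypothetical:

```python
import torch

def quantize_signed_linear(x: torch.Tensor, bits: int = 8):
    """Symmetric (signed) linear quantization: map x onto signed integers
    in [-(2**(bits-1) - 1), 2**(bits-1) - 1] using a single scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-12) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return q.to(torch.int8 if bits <= 8 else torch.int32), scale

x = torch.randn(16)
q, scale = quantize_signed_linear(x)
x_hat = q.float() * scale  # dequantize to verify the round trip
```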
-
I have completed Stable Diffusion quantization for txt2img as the demo shows.
The result is very good.
When I want to transfer SD quantization to the inpainting task, I run into the problem that the quantization r…
-
In the case of very small input numbers around the subnormal range of `torch.float` or `torch.bfloat16`, the scale exponent will take its smallest unbiased value: `-127`. However, you only all…
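For concreteness, a small illustration (my own construction, not the reporter's code): for a subnormal float32 input, the exponent recovered from the value's magnitude falls far below the normal range, which is where a clamped scale exponent would bite:

```python
import torch

# 1e-41 is subnormal for float32 (the smallest normal value is ~1.18e-38).
x = torch.tensor([1e-41], dtype=torch.float32)
mantissa, exponent = torch.frexp(x)  # x == mantissa * 2**exponent
print(exponent.item())  # about -136, far below what a clamp at -127 allows
```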
-
Hello! First of all, great job with this inference engine! Thanks a lot for your work!
Here's my issue: I have run vLLM with both a Mistral instruct model and its AWQ-quantized version. I've quant…
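For reproduction context, a minimal sketch of how such a pair might be run through vLLM's Python API (the model ID below is a hypothetical example, not necessarily the checkpoint used in the report):

```python
from vllm import LLM, SamplingParams

# Hypothetical AWQ checkpoint for illustration.
llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    quantization="awq",
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain AWQ quantization briefly."], params)
print(outputs[0].outputs[0].text)
```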
-
### System Info
Ubuntu 20.04
NVIDIA A100
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 and 24.07
TensorRT-LLM v0.14.0 and v0.11.0
### Who can help?
@Tracin
### Information
- [x] The offici…
-
Hi, I got an anomaly while running inference on Mistral with AWQ. Below is the GPU usage on a 3090: it consumes 20 GB of GPU memory, even though inference on the base model consumes only 19 GB.
Here is the command: python -m vl…