-
### Describe the bug
I installed text-generation-webui and downloaded the model (TheBloke_Yarn-Mistral-7B-128k-AWQ), but I can't run it. I chose Transformers as the model loader. I tried installing autoawq b…
-
### How would you like to use vllm
I want to run inference with [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ), but I don't know how to use it with vLLM.
I try t…
-
Hi,
Great job on the project!
Could you let me know where I can find the BMP code?
Thanks!
-
I have fine-tuned the "meta-llama-3.1-8b-bnb-4bit" model using Unsloth. I downloaded the LoRA weights and am able to run inference with them on a Colab GPU.
But I want to use this fine-tuned model for …
-
Quantization of images that contain >= 50139473 pixels doesn't work in Chrome.
Tested in Chrome x64 on Windows.
Not reproducible in Firefox.
This means e.g. an image with a resolution of 708…
-
## My device information
```
NVIDIA Jetson AGX Orin Developer Kit(base) 64G
Package: nvidia-jetpack
Version: 6.1+b123
Priority: standard
Section: metapackages
Source: nvidia-jetpack (6.1)
Ma…
-
### Describe the issue
When trying to quantize a Yolov8 model (exported with `yolo export model=yolov8x.pt format=onnx`) with `onnxruntime`, I get the following error:
```
$ python quantize.py yo…
```
-
**Describe the bug**
Got an unhelpful error when trying to run the phi-3.5-mini-instruct-onnx model locally on Windows.
```bash
RuntimeError: Error opening \cuda\cuda-int4-awq-block-128\phi-3.5-mini-instruc…
```
-
Scaling of quantization tables assumes that smaller divisors are always better, but that's not always true due to rounding errors.
An especially obvious case is when DC is quantized in a way that pr…
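The claim above can be checked with a toy example: for a coefficient of 24, the smaller divisor 16 produces a larger reconstruction error than the larger divisor 17, because 24/16 = 1.5 rounds away from the true value (these numbers are illustrative, not taken from any real quantization table):

```python
def reconstruct(value, divisor):
    """Quantize by the divisor, then dequantize (round-trip)."""
    return round(value / divisor) * divisor

value = 24
for divisor in (16, 17):
    error = abs(reconstruct(value, divisor) - value)
    print(f"divisor={divisor}: reconstructed={reconstruct(value, divisor)}, error={error}")
# divisor=16 reconstructs 24 as 32 (error 8); divisor=17 reconstructs it as 17 (error 7).
```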
-
Hi,
I am trying to quantize a torchvision model (a slightly modified version of the torchvision RetinaNet model), but when I apply direct quantization I see a large accuracy loss between the origin…
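As background for why direct post-training quantization can lose accuracy: the error depends entirely on how the scale and zero-point map the tensor's range onto int8, so a single outlier can stretch the quantization step for every other value. A pure-Python sketch of affine quantization with illustrative numbers (not the actual RetinaNet workflow):

```python
def affine_quantize(values, num_bits=8):
    """Asymmetric (affine) quantization: map [min, max] onto [0, 2^bits - 1],
    then dequantize to see the round-trip error."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(-lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return [(qi - zero_point) * scale for qi in q]

vals = [-1.0, -0.5, 0.0, 0.5, 1.0, 100.0]  # one outlier stretches the range
deq = affine_quantize(vals)
errs = [abs(a - b) for a, b in zip(vals, deq)]
print(max(errs))  # the outlier inflates the scale, so small values lose precision
```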