-
In Glow, quantization is done in two passes:
- profile generation
- profile loading and quantized-model inference
Our understanding is that this mechanism works with the assumpt…
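The two-pass flow can be illustrated with a minimal pure-Python sketch (this is not Glow's actual API, just the mechanism: pass 1 runs calibration data and records observed tensor ranges into a profile; pass 2 derives int8 scales from that profile and runs quantized inference):

```python
def build_profile(calibration_batches):
    # Pass 1: run representative inputs and record the observed
    # dynamic range (here, a single symmetric abs-max per tensor).
    absmax = max(abs(x) for batch in calibration_batches for x in batch)
    return {"absmax": absmax}

def quantized_inference(x, profile):
    # Pass 2: derive an int8 scale from the saved profile,
    # round values to int8, then dequantize back to float.
    scale = profile["absmax"] / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return [qi * scale for qi in q]

profile = build_profile([[-2.0, 1.0], [0.5, 3.0]])
out = quantized_inference([1.5, -0.25], profile)
```

Values that fall outside the profiled range at inference time get clipped, which is exactly why the assumption about calibration data being representative matters.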
-
### Problem
Flux is a transformer-based image model. It's rather large and fills a whole 24 GB card. People have made GGUF, bitsandbytes, and NF4 loaders for ComfyUI, which all use those LLM quantizations…
-
### Your current environment
The output of `python collect_env.py`:
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
-
Could you include the actual text of the command to run inference with quantization?
I cannot see the image because I'm blind and use a screen reader.
Readme says "With quantization, you can run LLaMA with …
-
### Your current environment
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC ve…
-
Thank you for your work.
I am trying to quantize the MiDaS DPT_Large model to INT8.
I have searched through GitHub and Googled, and asked Bing if there is any one-liner code to q…
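There is no true one-liner for an arbitrary model, but PyTorch's post-training dynamic quantization comes close for the linear layers that dominate DPT's transformer backbone. A minimal sketch on a toy model (the same call can be applied to a loaded DPT_Large, though only Linear and recurrent modules are converted; convolutions stay in float):

```python
import torch
import torch.nn as nn

# Toy stand-in for a model with Linear layers; dynamic quantization
# replaces them with int8-weight equivalents at module-swap time.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(1, 8))  # inference now uses int8 weights
```

For conv-heavy parts of the network, static post-training quantization with calibration (prepare/convert) is the usual route instead.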
-
I want to debug the quantization process, so I downloaded the official Docker image. After entering the container, debugging fails with insufficient permissions. How can I solve this?
![image](https://github.com/Xil…
-
### 🐛 Describe the bug
I use pytorch-quantization to do QAT for a PointPillars model. It works fine during PyTorch training; however, when I export the torch model to ONNX, accuracy degrades badly. …
-
Hello @edgchen1 @wejoncy, I tried to quantize the mars-model used in DeepSORT tracking. Using the example in `image_classification/cpu`, I am able to quantize my mars model. The size of the model has reduc…
-
#### Introduction
Vector databases have gained significant importance due to the rise of AI, machine learning, and deep learning applications. These databases store high-dimensional vectors repre…
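The core operation behind these databases can be sketched in a few lines of plain Python: embed items as vectors, then retrieve the item whose vector is most similar to a query (here by brute-force cosine similarity; real vector databases use approximate indexes such as HNSW to make this fast at scale):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query, index):
    # index: dict mapping item id -> embedding vector (toy example;
    # item names and dimensions here are made up for illustration).
    return max(index, key=lambda k: cosine(query, index[k]))

index = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
result = nearest([1.0, 0.05], index)
```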