-
Even with my model, which is less than 250 KB in size, I still get a separate onnx_data file after quantization.
https://github.com/onnx/neural-compressor/blob/aabbf967cf7ea91c078c28c7b4dab043add5257b/onnx_neural…
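For reference, a minimal sketch (not the neural-compressor API itself) of how the external data could be folded back into a single file with the `onnx` Python package, assuming the quantized model and its onnx_data file sit in the same directory; the file names are illustrative.
```python
# Hypothetical workaround sketch: merge the external onnx_data back into one .onnx file.
# File names below are illustrative.
import onnx

# onnx.load pulls in the external onnx_data found next to the model file.
model = onnx.load("model_quantized.onnx")

# Re-save with all initializers stored inline instead of as external data.
onnx.save_model(model, "model_quantized_inline.onnx", save_as_external_data=False)
```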
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### Description
To optimize disk usage for KNN vector searches in Lucene, I propose adding a new KnnVectorsFormat class to Lucene that handles only quantized vectors, eliminating the …
-
### Model Series
Qwen2.5
### What are the models used?
Qwen/Qwen2.5-1.5B-Instruct
### What is the scenario where the problem happened?
inference with [vllm]
### Is this a known issue?
- …
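For context, a minimal sketch of the reported scenario (offline inference with vLLM on Qwen/Qwen2.5-1.5B-Instruct); the prompt and sampling settings are illustrative.
```python
# Minimal sketch of the reported scenario: vLLM offline inference with Qwen2.5-1.5B-Instruct.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)  # illustrative settings
outputs = llm.generate(["Give me a short introduction to large language models."], params)
print(outputs[0].outputs[0].text)
```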
-
### This is my env version:
```
torch:2.2.1
transformers: 4.39.0.dev0
vllm: custom compile at master@24aecf421a4ad5989697010963074904fead9a1b
```
### I use SqueezeLLM to quantize my llama-7B tr…
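For context, a minimal sketch of loading a SqueezeLLM-quantized Llama-7B in vLLM, assuming the checkpoint directory already contains SqueezeLLM-packed weights; the path is illustrative.
```python
# Minimal sketch: running a SqueezeLLM-quantized Llama-7B with vLLM.
# "./llama-7b-squeezellm" is an illustrative path to a SqueezeLLM-packed checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="./llama-7b-squeezellm", quantization="squeezellm")
params = SamplingParams(temperature=0.0, max_tokens=64)
print(llm.generate(["Hello, my name is"], params)[0].outputs[0].text)
```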
-
Similar to affine quantization, we can implement codebook or lookup-table-based quantization, which is another popular type of quantization, especially for lower bit widths like 4 bits or below (used in ht…
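As a rough illustration of the idea (not a torchao implementation), here is a minimal sketch that fits a 16-entry (4-bit) codebook to a weight tensor with plain k-means and stores per-weight indices into it; all function names are illustrative.
```python
# Minimal codebook (lookup-table) quantization sketch for a single weight tensor.
# A 16-entry codebook corresponds to 4-bit indices. Function names are illustrative.
import torch

def fit_codebook(w: torch.Tensor, num_codes: int = 16, iters: int = 10) -> torch.Tensor:
    """Fit a 1-D codebook to the weight values with plain k-means."""
    flat = w.flatten().float()
    # Initialize codes at evenly spaced quantiles of the weight distribution.
    codebook = torch.quantile(flat, torch.linspace(0, 1, num_codes))
    for _ in range(iters):
        # Assign every weight to its nearest code.
        idx = torch.argmin((flat[:, None] - codebook[None, :]).abs(), dim=1)
        # Move each code to the mean of the weights assigned to it.
        for c in range(num_codes):
            members = flat[idx == c]
            if members.numel() > 0:
                codebook[c] = members.mean()
    return codebook

def codebook_quantize(w: torch.Tensor, codebook: torch.Tensor):
    """Return per-weight indices into the codebook, shaped like the weight tensor."""
    idx = torch.argmin((w.flatten().float()[:, None] - codebook[None, :]).abs(), dim=1)
    return idx.to(torch.uint8).reshape(w.shape)

def codebook_dequantize(idx: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Look dequantized values up in the codebook."""
    return codebook[idx.long()]

w = torch.randn(256, 256)
cb = fit_codebook(w)
idx = codebook_quantize(w, cb)
w_hat = codebook_dequantize(idx, cb)
print((w - w_hat).abs().mean())  # mean reconstruction error
```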
-
**Describe the bug**
When I use llm-compressor to quantize a LLaVA model, it fails right at the beginning. (Unrecognized configuration class: 'transformers.models.llava.configuration_llava.LlavaConfig'…
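A hedged reproduction sketch of that error (not llm-compressor's own code): LlavaConfig is not registered with the causal-LM auto class, so loading it through AutoModelForCausalLM raises the "Unrecognized configuration class" error; the model id is illustrative.
```python
# Reproduction sketch of the "Unrecognized configuration class ... LlavaConfig" error.
# "llava-hf/llava-1.5-7b-hf" is an illustrative model id.
from transformers import AutoModelForCausalLM

# LlavaConfig is not in the AutoModelForCausalLM mapping, so this raises ValueError.
model = AutoModelForCausalLM.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Loading through the multimodal class works instead:
# from transformers import LlavaForConditionalGeneration
# model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
```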
-
# Quantization Impact on Model Accuracy | Slightwind
Mistral-7B’s performance on 5-shot MMLU. If you are not interested in the testing details, just look at the summary table given below.
Overview: the performance of the quantized and non-quantized versions of the Mistral-7B-v0.1 model on 5-shot MMLU:
Quant Type Compute D…
-
### Search before asking
- [X] I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and found no similar bug report.
### YOLOv5 Component
Export
### Bug
Hello
When …
-
### 🚀 The feature, motivation and pitch
I am working with the BitsAndBytes quantization scheme for large models. Quantization is very smooth when using transformers, but the inference speed is sti…
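For reference, a minimal sketch of the "smooth with transformers" path mentioned above: 4-bit BitsAndBytes loading through transformers. The model id and generation settings are illustrative.
```python
# Minimal sketch: 4-bit BitsAndBytes quantization via transformers.
# Model id and settings below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```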