-
### Describe the issue
The preprocess step for quantization does not work with the latest onnxruntime version:
```shell
python -m onnxruntime.quantization.preprocess --input image_resize.onnx --outp…
```
maaft updated 3 weeks ago
-
### Your current environment
vllm==0.6.3.post1
### Model Input Dumps
```bash
ValueError: Weight input_size_per_partition = 10944 is not divisible by min_thread_k = 128. Consider reducing tensor_pa…
```
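For context, a minimal sketch of the constraint behind this error, assuming the quantized kernel requires each tensor-parallel shard of the weight's input dimension to be a multiple of `min_thread_k` (the function and variable names below are illustrative, not vLLM's internals):

```python
# Illustrative check (not vLLM's actual code): the quantized kernel requires
# the per-GPU shard of the weight's input dimension to be a multiple of
# min_thread_k, reported as 128 in the traceback above.

MIN_THREAD_K = 128

def partition_is_valid(input_size: int, tensor_parallel_size: int) -> bool:
    """True if each shard's input size is a multiple of MIN_THREAD_K."""
    if input_size % tensor_parallel_size != 0:
        return False
    return (input_size // tensor_parallel_size) % MIN_THREAD_K == 0

# The failing per-partition size from the error message:
print(10944 % MIN_THREAD_K)            # → 64, i.e. not divisible
print(partition_is_valid(10944, 1))    # → False
```

Changing `tensor_parallel_size` changes the per-partition size, which is why the error suggests reducing it; whether any particular setting works depends on the model's actual weight shapes.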
-
**Describe the bug**
When I run the example from examples/python/awq-quantized-model.md, but swap out phi-3 for llama-3.2-3b, I get an error message stating that `AttributeError: 'NoneType' objec…
-
Even with a model that is less than 250 KB in size, I still get the onnx_data file after quantization.
https://github.com/onnx/neural-compressor/blob/aabbf967cf7ea91c078c28c7b4dab043add5257b/onnx_neural…
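For reference, ONNX normally only needs a separate external-data file once the serialized protobuf would approach its 2 GB message limit, which is why a tiny model producing one is surprising. A sketch of that size gate (the helper name and exact threshold handling are assumptions, not neural-compressor's code):

```python
# Hypothetical helper (not neural-compressor's API): external tensor data is
# only required once the serialized model approaches protobuf's 2 GB limit.

PROTOBUF_LIMIT_BYTES = 2 * 1024**3  # hard cap on a single protobuf message

def needs_external_data(model_size_bytes: int) -> bool:
    """A 250 KB model is far below the limit and can stay in one .onnx file."""
    return model_size_bytes > PROTOBUF_LIMIT_BYTES

print(needs_external_data(250 * 1024))   # → False
```

If the linked code writes the onnx_data file unconditionally, it is skipping a check like this rather than exceeding the limit.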
-
### Your current environment
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debia…
-
### Model Series
Qwen2.5
### What are the models used?
Qwen/Qwen2.5-1.5B-Instruct
### What is the scenario where the problem happened?
[inference] with [vllm]
### Is this a known issue?
- …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### Description
In light of optimizing disk usage for KNN vector searches in Lucene, I propose considering a new KnnVectorsFormat class in Lucene that handles only quantized vectors, eliminating the …
-
Similar to affine quantization, we can implement codebook (lookup-table) based quantization, another popular type of quantization, especially at lower bit widths like 4 bits or below (used in ht…
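A minimal sketch of the idea, assuming a uniform 16-entry codebook (all names here are illustrative, not any library's API): each weight stores only a 4-bit index into a lookup table, and dequantization is a table lookup rather than an affine scale/zero-point computation.

```python
# Illustrative 4-bit codebook / lookup-table quantization.

def build_uniform_codebook(lo: float, hi: float, bits: int = 4) -> list[float]:
    """Evenly spaced codebook; 4 bits gives 16 representable values."""
    n = 1 << bits
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def quantize(weights, codebook):
    """Map each weight to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: abs(w - codebook[i]))
            for w in weights]

def dequantize(indices, codebook):
    """Dequantization is just a table lookup."""
    return [codebook[i] for i in indices]

codebook = build_uniform_codebook(-1.0, 1.0, bits=4)
idx = quantize([-0.97, 0.02, 0.51], codebook)   # 4-bit indices in [0, 15]
restored = dequantize(idx, codebook)
print(idx, restored)
```

Real schemes typically learn the codebook entries (e.g. by clustering the weights) instead of spacing them uniformly, but the storage and lookup mechanics are the same.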
-
What quantization algorithm was used in the unsloth/Llama-3.2-1B-bnb-4bit model (https://huggingface.co/docs/transformers/main/en/quantization/overview)? Is it int4_awq or int4_weightonly?