-
**Describe the bug**
This is a minor issue, but I think the quantization configuration in the file [`examples/quantization_24_sparse_w4a16/2:4_w4a16_group-128_recipe.yaml`](https://github.com/vllm-pr…
-
I pulled the [yolov5 repo](https://github.com/neuralmagic/yolov5) and downloaded the COCO dataset to run the following command.
The recipe was downloaded from [here](https://github.com/neuralmagic/sparseml/tre…
-
**What is the URL, file, or UI containing proposed doc change**
I realize this section asks for a specific set of proposed changes. I've attempted to draft them for your reference, but I don't actually have a…
-
**Describe the bug**
Hi, I ran the llm-compressor quantization script in a Colab notebook, but it fails with an error about the torch dtype.
**Expected behavior**
The quantized model should b…
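Without the full traceback, a common cause of dtype errors like the one above is that the model was loaded in a dtype the quantization path does not handle. As a purely illustrative sketch (the names below are hypothetical, not the llm-compressor API), validating the dtype up front makes that failure explicit instead of surfacing deep in a stack trace:

```python
# Hypothetical sketch: reject unsupported compute dtypes early.
# SUPPORTED_COMPUTE_DTYPES and check_dtype are illustrative names,
# not part of llm-compressor.
SUPPORTED_COMPUTE_DTYPES = {"float16", "bfloat16", "float32"}

def check_dtype(model_dtype: str) -> str:
    """Return the dtype name if supported, else raise a clear TypeError."""
    if model_dtype not in SUPPORTED_COMPUTE_DTYPES:
        raise TypeError(
            f"unsupported torch dtype: {model_dtype}; "
            f"load the model with one of {sorted(SUPPORTED_COMPUTE_DTYPES)}"
        )
    return model_dtype
```

If this is the cause, loading the model with an explicit supported dtype before quantizing is the usual workaround.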
-
### Summary
Sparsity, like quantization, offers increased model performance at the expense of some model quality. However, it is not as widely used / researched as a technique, despite offering sim…
jcaip updated 5 months ago
-
**Describe the bug**
I tried the FP8 quantization example script from the vLLM docs, but it failed.
**Environment**
Include all relevant environment information:
Ubuntu, Python 3.10
LLM Compr…
-
**Describe the bug**
When exporting the YOLOv8s (pruned50-quant, model.pt from sparsezoo) model via the ONNX exporter (sparseml.ultralytics.export_onnx), its performance noticeably decreases compar…
-
**Describe the bug**
Running the example at https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_24_sparse_w4a16
```python
import os
import torch
from llmcompressor.transforme…
```
-
**Is your feature request related to a problem? Please describe.**
I need to reduce the model size of YOLOv10 while maintaining performance.
**Describe the solution you'd like**
Sparse and Quantizatio…
-
**Describe the bug**
Running `python llama3_example.py` in `llm-compressor/examples/quantization_w8a8_fp8` fails while saving safetensors with `KeyError: torch.float8_e4m3fn`.
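For context on this class of failure: tensor serializers typically map each dtype to a string storage key, and a dtype introduced in a newer torch release (such as `torch.float8_e4m3fn`) raises `KeyError` when the installed serializer's table predates it. A minimal sketch of the mechanism (the table and function below are illustrative, not the actual safetensors internals):

```python
# Illustrative dtype -> storage-key table; real serializers keep a similar map.
DTYPE_TO_KEY = {
    "torch.float32": "F32",
    "torch.float16": "F16",
    "torch.bfloat16": "BF16",
}

def storage_key(dtype_name: str) -> str:
    # A dtype missing from the table surfaces as a bare KeyError, which is
    # why upgrading the serializer (so its table includes float8 dtypes)
    # typically resolves this error.
    if dtype_name not in DTYPE_TO_KEY:
        raise KeyError(dtype_name)
    return DTYPE_TO_KEY[dtype_name]
```

If this is the mechanism here, checking the installed safetensors and torch versions against the example's requirements would be a reasonable first step.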
**Expected behavior**
A clear and concise descri…