-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar feature requests.
### Description
1.58-bit quantization i…
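(For context, 1.58 bits per weight corresponds to ternary values {-1, 0, +1}, since log2 3 ≈ 1.58. Below is a minimal sketch of the absmean ternarization described in the BitNet b1.58 paper; the function name is illustrative and not part of YOLOv8.)

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-5):
    """Map a weight tensor to {-1, 0, +1} plus a single absmean
    scale (the BitNet b1.58 recipe); dequantize as w_q * gamma."""
    gamma = w.abs().mean()
    w_q = torch.clamp((w / (gamma + eps)).round(), -1, 1)
    return w_q, gamma
```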
-
### 🚀 The feature, motivation and pitch
# Summary
We would like to support a 4-bit KV cache for the decoding phase. The purpose of this feature is to reduce the GPU memory usage of the KV cache wh…
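(To make the idea concrete, here is a minimal sketch of per-group asymmetric int4 quantization of a cached K/V tensor. The group size, grouping axis, and uint8 storage are illustrative assumptions, not the proposed design.)

```python
import torch

def quantize_kv_int4(x: torch.Tensor, group_size: int = 64):
    """Asymmetric 4-bit quantization along the last dim, one
    (scale, zero-point) pair per group of `group_size` values.
    Assumes the last dim is divisible by `group_size`."""
    g = x.reshape(-1, group_size)
    xmin = g.min(dim=-1, keepdim=True).values
    xmax = g.max(dim=-1, keepdim=True).values
    scale = (xmax - xmin).clamp(min=1e-8) / 15.0  # int4 codes span 0..15
    zero = (-xmin / scale).round()
    q = ((g / scale) + zero).round().clamp(0, 15).to(torch.uint8)
    # q holds one nibble per byte here; packing two values per byte
    # (for the actual 4x memory saving) would be a further step.
    return q, scale, zero

def dequantize_kv_int4(q, scale, zero, shape):
    return ((q.float() - zero) * scale).reshape(shape)
```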
-
### Motivation
In code-llama's deployment tutorial, the quantization chapter remains to be done. When will this feature be finished?
### Related resources
_No response_
### Additional context
_No respon…
-
Please suggest some sources. I have tried several, but nothing works for me.
-
Hi,
I understand that you currently quantize the model weights in a **per-row** fashion. Can you extend QuIP# to **per-group** granularity? Can you elaborate on why or why not?
Thanks
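(To make the granularity distinction concrete, here is a sketch contrasting per-row and per-group absmax scales for a weight matrix; this is illustrative and not QuIP#'s actual code.)

```python
import torch

def scales_per_row(w: torch.Tensor) -> torch.Tensor:
    """One absmax scale per output row: minimal metadata, but a
    single outlier stretches the range of its entire row."""
    return w.abs().amax(dim=1, keepdim=True)

def scales_per_group(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """One absmax scale per `group_size` contiguous weights within a
    row: tighter ranges at the cost of more scale metadata.
    Assumes the column count divides evenly by `group_size`."""
    rows, cols = w.shape
    g = w.reshape(rows, cols // group_size, group_size)
    return g.abs().amax(dim=-1, keepdim=True)
```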
-
TL;DR: can we get a way to bypass calibration/measurement and save a 'calibration.json'? Not so much to produce better models as to patch/hack them.
Does this belong in *issues*? I think at least a…
-
# Feature Description
Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do as an enhancement.
# Motivation
It sounds like it's …
-
Hi, I get an error when I run `vectree.py`.
> ================== Print Info ==================
> Input_feats_shape: torch.Size([1554770, 62])
> VQ_feats_shape: torch.Size([1554770, 27])
> SH_degree: …
-
Is there any method to convert Griffin models to GGUF?
I want to quantize this model to the q4_K type.
Any help is appreciated.
Thanks
-
The Percentile Optimizer is a commonly used calibration method aimed at activations.
We first need to make QModuleMixin support this optimizer for activations.
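For reference, here is a minimal sketch of what a percentile observer typically does during calibration; the class and method names are illustrative, not the QModuleMixin API.

```python
import torch

class PercentileObserver:
    """Accumulates activation magnitudes over calibration batches and
    derives the clipping range from a high percentile rather than the
    absolute max, so rare outliers do not inflate the scale."""

    def __init__(self, percentile: float = 99.99):
        self.percentile = percentile
        self.samples = []

    def observe(self, x: torch.Tensor) -> None:
        self.samples.append(x.detach().abs().flatten())

    def clip_value(self) -> float:
        # For long calibration runs, subsampling here keeps memory bounded.
        values = torch.cat(self.samples).float()
        return torch.quantile(values, self.percentile / 100.0).item()
```

Clipping at, say, the 99.99th percentile trades a small clipping error on rare outliers for a much finer quantization step over the bulk of the distribution.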