-
@wenhuach21 GPTQModel has merged `dynamic` per-layer/module control of quantization, but I don't think auto-round currently supports such per-layer/module control during quantization. I know this is s…
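To illustrate the kind of per-layer/module control being requested, here is a hedged sketch of a GPTQModel-style `dynamic` override table: regex keys are matched against module names and override the base quantization settings. The names and the `resolve_cfg` helper are illustrative only, not AutoRound (or GPTQModel) API.

```python
import re

# Base settings applied to every module unless overridden below.
base_cfg = {"bits": 4, "group_size": 128}

# Per-module overrides, keyed by regex on the module name (illustrative).
dynamic = {
    r".*\.mlp\..*": {"bits": 8},    # quantize MLP projections at 8-bit
    r".*lm_head.*": {"bits": 16},   # effectively keep the head unquantized
}

def resolve_cfg(module_name: str) -> dict:
    """Return the effective quantization config for one module."""
    cfg = dict(base_cfg)
    for pattern, overrides in dynamic.items():
        if re.fullmatch(pattern, module_name):
            cfg.update(overrides)
    return cfg

print(resolve_cfg("model.layers.0.mlp.up_proj"))   # {'bits': 8, 'group_size': 128}
print(resolve_cfg("model.layers.0.self_attn.q_proj"))  # base config, unchanged
```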
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
None
### Reproduction
None
### Expected behavior
None
### Others
Hi, thank you for the fantastic wo…
-
model.cpp: loading model from runtime_outs/ne_qwen2_q_autoround.bin
The number of ne_parameters is wrong.
init: n_vocab = 151936
init: n_embd = 1536
init: n_mult = 8960
init: n_head …
-
This happens when using the example code only:
```
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neur…
```
-
The CUDA kernel only supports FP16, while the maximum values in some layers of Qwen are very large.
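A quick illustration of why this matters: FP16's largest finite value is 65504, so any layer value beyond that overflows to infinity when a kernel computes in FP16.

```python
import numpy as np

# FP16's largest representable finite value.
fp16_max = np.finfo(np.float16).max
print(fp16_max)  # 65504.0

# A value that fits comfortably in FP32 overflows when cast down to FP16.
overflowed = np.float32(1e5).astype(np.float16)
print(overflowed)  # inf
```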
-
... did not work for me right now, whereas it did previously. I cannot check this further at the moment, but you might want to. The environment was Kaggle and Colab; it occurred after `!pip install auto-r…
-
Hi AutoRound Team,
Firstly, thank you for your fantastic work on AutoRound. It has been incredibly useful for model quantization.
I am reaching out to inquire about the possibility of adding sup…
-
When I serialize the model, I would like to save it in all the available formats, e.g., GPTQ, AWQ, and AutoRound. However, this doesn't seem possible: if I first save in the GPTQ format and then try A…
-
commit 5d92b25ccc0c937199d04101e1c4e7531c09e92f
- Reproduce the error
```bash
# auto-round/examples/language-modeling
python main.py
```
- Log
```bash
facebook/opt-125m
2024-09-08 21:50:1…
```
-
GPTQModel has merged AutoRound integration via PR https://github.com/ModelCloud/GPTQModel/pull/166, but we find that the CI tests import auto-gptq (as a dependency of auto-round), which is…