ggerganov / llama.cpp

LLM inference in C/C++
MIT License

llama.cpp + autoawq + gemma model gives wrong answers #6633

Closed dengzheng-cloud closed 5 months ago

dengzheng-cloud commented 5 months ago


1. Download gemma-2b-it from Hugging Face.
2. Use the AutoAWQ quantize script with `export_compatible=True` to get the scaled Gemma model:

   ```python
   from awq import AutoAWQForCausalLM
   from transformers import AutoTokenizer

   model_path = 'lmsys/vicuna-7b-v1.5'
   quant_path = 'vicuna-7b-v1.5-awq'
   quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

   # Load model
   model = AutoAWFForCausalLM.from_pretrained(model_path) if False else AutoAWQForCausalLM.from_pretrained(model_path)
   tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

   # Quantize (export_compatible=True applies the AWQ scales but skips packing to 4-bit)
   model.quantize(tokenizer, quant_config=quant_config, export_compatible=True)

   # Save quantized model
   model.save_quantized(quant_path)
   tokenizer.save_pretrained(quant_path)
   ```

3. `python convert.py {awq_model} --outfile {awq_model}/model.gguf`
4. `./main -m model.gguf -p "build website .... ... " -n 400 -e`

After running this, the output looks like "increa increa increa increa ...", while vLLM with the AWQ model gives the correct output.
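One way to narrow this down (a sketch, not part of the original report; the path and prompt are illustrative): load the `export_compatible` checkpoint with plain transformers and check whether the degenerate output already appears before any GGUF conversion. If generation looks fine here, the problem is introduced by the convert/inference step.

```python
# Minimal sanity check of the scaled checkpoint saved by save_quantized() above.
# With export_compatible=True the weights are still fp16, so it loads as a normal HF model.
from transformers import AutoModelForCausalLM, AutoTokenizer

scaled_path = "vicuna-7b-v1.5-awq"  # quant_path from the snippet above (illustrative)
tokenizer = AutoTokenizer.from_pretrained(scaled_path)
model = AutoModelForCausalLM.from_pretrained(scaled_path)

inputs = tokenizer("Build a website in a few steps:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```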

Does anybody know how to address this? Any help would be appreciated, thanks!

Could this be caused by `general.architecture == llama`?

ggerganov commented 5 months ago

> Could this be caused by `general.architecture == llama`?

If it reports `llama` instead of `gemma`, then something went wrong somewhere.
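For reference, a quick way to check what the converted file actually records (a minimal sketch, assuming the `gguf` Python package that ships with llama.cpp under `gguf-py` is installed, e.g. via `pip install gguf`):

```python
# Print the architecture stored in the GGUF header; "gemma" is expected for a Gemma
# checkpoint, while "llama" would confirm the mismatch discussed above.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")  # path to the converted model (illustrative)
field = reader.fields["general.architecture"]
# For a string field, the last part holds the raw UTF-8 bytes of the value.
print(bytes(field.parts[-1]).decode("utf-8"))
```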

dengzheng-cloud commented 5 months ago

> > Could this be caused by `general.architecture == llama`?
>
> If it reports `llama` instead of `gemma`, then something went wrong somewhere.

(screenshot) Yes, I have checked config.json and the model type is correct. convert.py does not print the architecture during conversion, but when I quantize the converted model, it reports the architecture as `llama`, as in the screenshot. How can I set the architecture during conversion?
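One possibility worth checking (an assumption, not something confirmed in this thread): the legacy `convert.py` script targets LLaMA-family checkpoints, while other architectures such as Gemma are handled by `convert-hf-to-gguf.py`, which picks the GGUF architecture based on the checkpoint's `config.json`. A minimal sketch for double-checking what the checkpoint itself declares (the path is hypothetical):

```python
# Confirm the architecture fields that the HF checkpoint declares; these are what
# convert-hf-to-gguf.py keys its per-architecture handling off.
import json
from pathlib import Path

awq_model = "gemma-2b-it-awq"  # hypothetical path to the scaled checkpoint
cfg = json.loads((Path(awq_model) / "config.json").read_text())
print(cfg.get("model_type"), cfg.get("architectures"))
# expected for Gemma: gemma ['GemmaForCausalLM']
```

If these fields look right but the GGUF still reports `llama`, re-running the conversion with `python convert-hf-to-gguf.py {awq_model} --outfile {awq_model}/model.gguf` may be worth trying.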

dengzheng-cloud commented 5 months ago

Already checked, it is not related to llama.cpp, so I am closing this issue. Thanks for @ggerganov's help, it's very kind of you.