OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

Running quantized models with MLC-LLM error #18

Closed · silvacarl2 closed this issue 6 months ago

silvacarl2 commented 11 months ago

this line:

cm = ChatModule(model="dist/Llama-2-7b-chat-omniquant-w3a16g128asym/params", lib_path="dist/Llama-2-7b-chat-omniquant-w3a16g128asym/Llama-2-7b-chat-omniquant-w3a16g128asym-cuda.so")

produces this error:

JSONDecodeError: Expecting property name enclosed in double quotes: line 18 column 1 (char 497)

ChenMnZ commented 11 months ago

Thanks for your feedback. I found that this bug is caused by mlc-llm, so runing_quantized_models_with_mlc_llm.ipynb cannot be used until the official mlc-llm repo fixes it. I will update this after mlc-llm fixes the bug in their repo.

silvacarl2 commented 11 months ago

ok, cool, will check back on this.

siddu9501 commented 11 months ago

There are rogue characters on this line: https://huggingface.co/ChenMnZ/Llama-2-7b-chat-omniquant-w3a16g128asym/blob/main/params/mlc-chat-config.json#L18. Delete the blank spaces preceding the text on that line and retype them in any text editor.
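A minimal sketch of the failure mode described above, assuming the "rogue characters" are non-breaking spaces (U+00A0) in the config file's indentation: standard JSON only allows space, tab, CR, and LF as whitespace, so a non-breaking space before a key triggers exactly the "Expecting property name enclosed in double quotes" error reported here. The config content below is a stand-in, not the actual mlc-chat-config.json:

```python
import json

# Stand-in config with non-breaking spaces (U+00A0) as indentation,
# mimicking the rogue characters in mlc-chat-config.json.
broken = '{\n\u00a0\u00a0"model_lib": "Llama-2-7b-chat"\n}'

try:
    json.loads(broken)
except json.JSONDecodeError as e:
    # Reports the position of the first character the parser rejected,
    # e.g. "Expecting property name enclosed in double quotes: line 2 column 1"
    print(f"{e.msg}: line {e.lineno} column {e.colno}")
    print(f"offending bytes: {broken[e.pos:e.pos + 6]!r}")

# Replacing the non-breaking spaces with ordinary spaces makes it parse.
fixed = broken.replace("\u00a0", " ")
config = json.loads(fixed)
print(config["model_lib"])
```

The same replace-and-rewrite step applied to the real file (or retyping the indentation by hand, as suggested above) should clear the error.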