InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

How can glm4-9b be quantized and run? #1976

Closed · maxin9966 closed this 1 month ago

maxin9966 commented 1 month ago

Motivation

lmdeploy lite auto_awq THUDM/glm-4-9b-chat --work-dir ~/work/models/glm-4-9b-chat-4bit

The tool reports that the model is unsupported, so it cannot be quantized to AWQ format. There is currently no AWQ-format glm4-9b on Hugging Face either.
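
For reference, the CLI subcommand wraps a Python entry point (lmdeploy/lite/apis/auto_awq.py::auto_awq, visible in the traceback below). A minimal sketch of the equivalent Python call, assuming the positional model path and the work_dir keyword mirror the CLI arguments:

# Minimal sketch of the Python-API equivalent of the CLI call above; the
# import path comes from the traceback below, and the `work_dir` keyword
# (mirroring --work-dir) is an assumption.
from lmdeploy.lite.apis.auto_awq import auto_awq

auto_awq('THUDM/glm-4-9b-chat',
         work_dir='~/work/models/glm-4-9b-chat-4bit')
# At the time of this issue, this raises the same RuntimeError as the CLI,
# since ChatGLMForConditionalGeneration is not in the supported list.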

Related resources

No response

Additional context

No response

zhyncs commented 1 month ago
python3 -m lmdeploy lite auto_awq /workdir/glm-4-9b-chat
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/workdir/lmdeploy/lmdeploy/__main__.py", line 5, in <module>
    run()
  File "/workdir/lmdeploy/lmdeploy/cli/entrypoint.py", line 36, in run
    args.run(args)
  File "/workdir/lmdeploy/lmdeploy/cli/lite.py", line 139, in auto_awq
    auto_awq(**kwargs)
  File "/workdir/lmdeploy/lmdeploy/lite/apis/auto_awq.py", line 103, in auto_awq
    vl_model, model, tokenizer, work_dir = calibrate(model,
  File "/workdir/lmdeploy/lmdeploy/lite/apis/calibrate.py", line 188, in calibrate
    raise RuntimeError(
RuntimeError: Currently, quantification and calibration of ChatGLMForConditionalGeneration are not supported. The supported model types are InternLMForCausalLM, InternLM2ForCausalLM, QWenLMHeadModel, Qwen2ForCausalLM, BaiChuanForCausalLM, BaichuanForCausalLM, LlamaForCausalLM, LlavaLlamaForCausalLM, MGMLlamaForCausalLM, InternLMXComposer2ForCausalLM.

cc @AllentDan
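
The failure comes from an architecture whitelist in the calibration step (lmdeploy/lite/apis/calibrate.py, line 188 in the traceback). A minimal sketch of that check, not lmdeploy's exact code, with the set contents taken verbatim from the error message:

SUPPORTED_ARCHS = {
    'InternLMForCausalLM', 'InternLM2ForCausalLM', 'QWenLMHeadModel',
    'Qwen2ForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM',
    'LlamaForCausalLM', 'LlavaLlamaForCausalLM', 'MGMLlamaForCausalLM',
    'InternLMXComposer2ForCausalLM',
}

def check_model_supported(model) -> None:
    # glm-4-9b-chat loads as ChatGLMForConditionalGeneration, which is
    # not in the whitelist, hence the RuntimeError above.
    name = type(model).__name__
    if name not in SUPPORTED_ARCHS:
        raise RuntimeError(
            f'Currently, quantification and calibration of {name} '
            'are not supported.')

Supporting a new architecture therefore means both extending this list and implementing the corresponding calibration and quantization logic, which is presumably what the PR referenced below addresses.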

zhyncs commented 1 month ago

GLM-4-9B-Chat performs quite well, and it is worth considering adding support for it. ref https://github.com/InternLM/lmdeploy/issues/1974#issue-2398575348

zhyncs commented 1 month ago

ref https://github.com/InternLM/lmdeploy/pull/1993 https://github.com/zhyncs/lmdeploy-build/releases/tag/6367976