OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

Cannot compile with mlc-llm #21

Open 0x1997 opened 11 months ago

0x1997 commented 11 months ago

I quantized a custom fine-tuned Llama 2 70B model like this:

$ python main.py \
  --model /data/finetuned_llama2_70b  \
  --epochs 20 \
  --output_dir /data/finetuned_llama2_70b_output \
  --wbits 4 \
  --abits 16 \
  --group_size 128 \
  --lwc \
  --net Llama-2-70b

$ python main.py \
  --model /data/finetuned_llama2_70b \
  --epochs 0 \
  --output_dir /data/finetuned_llama2_70b_output2 \
  --save_dir /data/finetuned_llama2_70b_omniquant \
  --resume /data/finetuned_llama2_70b_output/omni_parameters.pth \
  --wbits 4 \
  --abits 16 \
  --group_size 128 \
  --lwc \
  --net Llama-2-70b
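
For reference, the learned parameters from the first run (which the second run resumes from) can be peeked at with a few lines of Python. This is only a rough sketch; the exact layout of omni_parameters.pth is an assumption here.

# Sketch: inspect the LWC parameters saved by the first (training) run.
# Assumption: omni_parameters.pth is a plain torch pickle keyed per transformer block.
import torch

omni_params = torch.load(
    "/data/finetuned_llama2_70b_output/omni_parameters.pth", map_location="cpu"
)
print(type(omni_params))
print(len(omni_params))  # expect one entry per transformer block (80 for a 70B model)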

Then I updated mlc_llm/quantization/__init__.py like this:

"w4a16g128asym": QuantizationScheme(
    name="w4a16g128asym",
    linear_weight=GroupQuantizationSpec(
        dtype="float16",
        mode="int4",
        sym=False,
        storage_nbit=16,
        group_size=128,
        transpose=False,
    ),
    embedding_table=None,
    final_fc_weight=None,
)
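
For context, the scheme name encodes 4-bit weights (w4), float16 activations (a16), groups of 128 weights sharing one scale (g128), and an asymmetric zero point (asym), which is why mode="int4", group_size=128, and sym=False above. A minimal illustrative sketch of that weight-quantization scheme (not mlc-llm's actual kernel) would be:

# Illustrative asymmetric 4-bit group quantization with group_size=128.
import torch

def quantize_group_asym_4bit(w: torch.Tensor, group_size: int = 128):
    # w: (out_features, in_features); in_features must be divisible by group_size
    wg = w.reshape(w.shape[0], -1, group_size)
    wmin = wg.amin(dim=-1, keepdim=True)
    wmax = wg.amax(dim=-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-8) / 15  # 16 representable levels
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(wg / scale) + zero, 0, 15)  # 4-bit codes
    w_hat = (q - zero) * scale                               # dequantized weights
    return q.reshape_as(w), w_hat.reshape_as(w)

w = torch.randn(8, 256)
q, w_hat = quantize_group_asym_4bit(w)
print((w - w_hat).abs().max())  # small reconstruction error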

Then I tried to compile the model with mlc-llm:

$ python -m mlc_llm.build \
  --model /data/finetuned_llama2_70b_omniquant \
  --target cuda \
  --quantization w4a16g128asym \
  --artifact-path /data/finetuned_llama2_70b_omniquant_mlc \
  --use-cache 0

I got this error.

Start computing and quantizing weights... This may take a while.
Traceback (most recent call last):
  File "~/mlc-llm/mlc_llm/build.py", line 42, in main
    core.build_model_from_args(parsed_args)
  File "~/mlc-llm/mlc_llm/core.py", line 619, in build_model_from_args
    new_params = utils.convert_weights(param_manager, params, args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/mlc-llm/mlc_llm/utils.py", line 258, in convert_weights
    vm["transform_params"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "~/mambaforge/envs/mlc/lib/python3.11/site-packages/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "~/mlc-llm/mlc_llm/relax_model/param_manager.py", line 558, in get_item
    for torch_binname in [
                         ^
  File "~/mlc-llm/mlc_llm/relax_model/param_manager.py", line 559, in <listcomp>
    self.torch_pname2binname[torch_pname] for torch_pname in torch_pnames
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'model.layers.0.self_attn.q_proj.weight'
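
The KeyError suggests mlc-llm could not map 'model.layers.0.self_attn.q_proj.weight' to any shard of the saved checkpoint when building torch_pname2binname. A quick way to compare the parameter names the checkpoint actually exposes with the one being looked up is the sketch below; the index filename is an assumption and would be model.safetensors.index.json for a safetensors checkpoint.

# Sketch: list the parameter names recorded in the checkpoint's shard index.
# Assumption: the fake-quant model was saved in the standard sharded Hugging Face layout.
import json, os

ckpt_dir = "/data/finetuned_llama2_70b_omniquant"
index_file = os.path.join(ckpt_dir, "pytorch_model.bin.index.json")

with open(index_file) as f:
    weight_map = json.load(f)["weight_map"]

print("model.layers.0.self_attn.q_proj.weight" in weight_map)
print(sorted(weight_map)[:5])  # compare the naming convention with what mlc-llm expects
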
shifeiwen commented 9 months ago

Same error. Is there any progress on this issue so far? @0x1997

shifeiwen commented 9 months ago

@ChenMnZ Do you have any progress or tips on this? How can I successfully load the quantized weights in mlc?