ModelTC / llmc

This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
https://arxiv.org/abs/2405.06001
Apache License 2.0

Mixtral 8x7B fails to compile with tensorrt-llm #22

Open gloritygithub11 opened 1 month ago

gloritygithub11 commented 1 month ago

Config file:

base:
    seed: &seed 42
model:
    type: Mixtral
    path: /models/Mixtral-8x7B-Instruct-v0.1
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /app/llmc/tools/data/calib/wikitext2
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: []
    name: wikitext2
    download: False
    path: /app/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
save:
    save_trans: False
    save_trtllm: True
    trtllm_cfg:
        tp_size: 1
        pp_size: 1
    save_path: ./save
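
For reference, the export was presumably launched through llmc's module entry point on this config, along the lines of the command below (the config file name and the exact flag are assumptions inferred from the llmc/__main__.py frames in the traceback; llmc's run scripts may wrap this differently):

    python -m llmc --config mixtral_awq_w4_trtllm.yml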

I get the following error:

2024-08-05 09:36:22.985 | INFO     | llmc.utils.export_trtllm:cvt_trtllm_engine:93 - Start to export trtllm engine...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 19/19 [00:09<00:00,  2.08it/s]
[08/05/2024-09:36:34] Some parameters are on the meta device device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/llmc/llmc/__main__.py", line 147, in <module>
    main(config)
  File "/app/llmc/llmc/__main__.py", line 88, in main
    cvt_trtllm_engine(
  File "/app/llmc/llmc/utils/export_trtllm.py", line 95, in cvt_trtllm_engine
    convert_and_save_hf(hf_model, output_dir, cfg)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 88, in convert_and_save_hf
    convert_and_save_rank(cfg, rank=0)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 75, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 280, in from_hugging_face
    llama = convert.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1325, in from_hugging_face
    weights = load_weights_from_hf(config=config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1434, in load_weights_from_hf
    weights = convert_hf_llama(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1087, in convert_hf_llama
    convert_layer(l)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 740, in convert_layer
    get_tllm_linear_weight(split_v, tllm_prex + 'attention.qkv.',
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 457, in get_tllm_linear_weight
    v.cpu(), plugin_weight_only_quant_type)
NotImplementedError: Cannot copy out of meta tensor; no data!
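
The warning earlier in the log ("Some parameters are on the meta device ...") suggests the root cause sits upstream of TensorRT-LLM: when the Mixtral checkpoint is loaded with automatic device placement and does not fit in memory, accelerate offloads some weights and leaves meta-device placeholders in the module, so the later v.cpu() call in get_tllm_linear_weight has no data to copy. A minimal sketch of the symptom and a possible workaround (this is not llmc's actual loading code; the device_map choices are assumptions):

    from transformers import AutoModelForCausalLM

    path = "/models/Mixtral-8x7B-Instruct-v0.1"

    # Symptom: with device_map="auto" and too little GPU/CPU memory for
    # all eight experts, accelerate parks some weights on the "meta"
    # device; copying such a tensor later raises
    # "NotImplementedError: Cannot copy out of meta tensor; no data!".
    model = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype="auto", device_map="auto"
    )

    # Possible workaround: force every tensor to be materialized in host
    # RAM before the TensorRT-LLM conversion touches it (Mixtral 8x7B in
    # bf16 needs roughly 90+ GB of CPU memory).
    model = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype="auto", device_map="cpu"
    )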
Harahan commented 4 weeks ago

We will fix this later.

gushiqiao commented 3 weeks ago

@helloyongyang