ModelTC / llmc

This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
https://arxiv.org/abs/2405.06001
Apache License 2.0

Llama3-8B-Instruct fails for TensorRT-LLM #21

Open gloritygithub11 opened 1 month ago

gloritygithub11 commented 1 month ago

Hello,

I'm trying to build with TensorRT-LLM. Here is the config file:

base:
    seed: &seed 42
model:
    type: Llama
    path: /models/Meta-Llama-3-8B-Instruct
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /app/llmc/tools/data/calib/wikitext2
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: []
    name: wikitext2
    download: False
    path: /app/llmc/tools/data/eval/wikitext2
    bs: 1
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
save:
    save_trans: False
    save_trtllm: True
    trtllm_cfg:
        tp_size: 1
        pp_size: 1
    save_path: ./save
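
A side note on the config above: the calib section sets name: pileval (with preproc: pileval_awq) but path points at a wikitext2 directory. If that mismatch is unintended, a consistent variant might look like the fragment below (the path is illustrative and would need to match your actual local data layout):

```yaml
calib:
    name: pileval
    download: False
    path: /app/llmc/tools/data/calib/pileval   # hypothetical path; must match the dataset named above
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
```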

I get the following error:

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024052800
2024-08-05 08:09:33.435 | INFO     | llmc.utils.export_trtllm:cvt_trtllm_engine:93 - Start to export trtllm engine...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.44it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/llmc/llmc/__main__.py", line 147, in <module>
    main(config)
  File "/app/llmc/llmc/__main__.py", line 88, in main
    cvt_trtllm_engine(
  File "/app/llmc/llmc/utils/export_trtllm.py", line 95, in cvt_trtllm_engine
    convert_and_save_hf(hf_model, output_dir, cfg)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 88, in convert_and_save_hf
    convert_and_save_rank(cfg, rank=0)
  File "/app/llmc/llmc/utils/export_trtllm.py", line 75, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 280, in from_hugging_face
    llama = convert.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1333, in from_hugging_face
    llama.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 422, in load
    raise RuntimeError(
RuntimeError: Required but not provided tensors:{'transformer.vocab_embedding.per_token_scale'}
Weights loaded. Total time: 00:01:54
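The RuntimeError is raised by TensorRT-LLM's modeling_utils.load(), which checks the tensors in the converted checkpoint against the names the model definition expects; per_token_scale entries are only expected for certain quantization modes, so the exporter and the model config appear to disagree about the quantization scheme. The check itself amounts to a set difference, sketched below (the helper and name lists are illustrative, not TensorRT-LLM API):

```python
# Hypothetical sketch of the load()-time validation that produced the error:
# compare the tensor names the model expects against those the converted
# checkpoint actually provides, and report any gaps before building engines.

def find_missing_tensors(required, provided):
    """Return expected tensor names that the checkpoint does not provide."""
    return sorted(set(required) - set(provided))

# Illustrative name lists only; the real ones come from the model
# definition and the converted checkpoint's weight dict.
required = [
    "transformer.vocab_embedding.weight",
    "transformer.vocab_embedding.per_token_scale",
]
provided = ["transformer.vocab_embedding.weight"]

missing = find_missing_tensors(required, provided)
print(missing)
```

Here the per_token_scale tensor shows up as missing, matching the error above: either the exporter needs to emit that scale for this quantization mode, or the generated config should not request a mode that requires it.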
Harahan commented 4 weeks ago

We will fix this later.

gushiqiao commented 3 weeks ago

@helloyongyang