ModelTC / llmc

This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
https://arxiv.org/abs/2405.06001
Apache License 2.0

llama3.1-70b awq_w4a4 error #25

Closed lihongqiang closed 3 weeks ago

lihongqiang commented 1 month ago

I got an error when running run_awq_llm.sh. Does llmc support Llama-3.1-70B, or is there a mistake in my config? Please help me solve this problem.

[screenshot of the error traceback]

my yml file:

```yaml
base:
    seed: &seed 42
model:
    type: Llama
    path: /data/root/jupyter/modelscope/fintune/autodl-tmp/LLM-Research/Meta-Llama-3.1-8B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /data/root/jupyter/modelscope/fintune/llmc/data/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: general
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /data/root/jupyter/modelscope/fintune/llmc/data/wikitext2
    bs: 20
    inference_per_block: True
    # For 70B model eval, bs can be set to 20, and inference_per_block can be set to True.
    # For 7B / 13B model eval, bs can be set to 1, and inference_per_block can be set to False.
    seq_len: 2048
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_channel
        group_size: -1
        calib_algo: learnable
    act:
        bit: 4
        symmetric: False
        granularity: per_token
        calib_algo: minmax
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        clip_version: v2
        save_scale: True
        scale_path: ./save/Meta-Llama-3.1-8B-Instruct_awq_w4a4_scale
        save_clip: True
        clip_path: ./save/Meta-Llama-3.1-8B-Instruct_awq_w4a4_clip
save:
    save_trans: True
    save_quant: False
    save_path: ./save/Meta-Llama-3.1-8B-Instruct_awq_w4a4
```
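Since the model, calibration data, and eval data are all read from local paths in this config, a quick sanity check before launching run_awq_llm.sh can rule out path problems. Below is a minimal sketch (the filename `llama31_awq_w4a4.yml` is hypothetical) that just parses the YAML and verifies the paths exist:

```python
# sanity_check_config.py -- minimal sketch; the config filename is hypothetical
import os
import yaml  # pip install pyyaml

with open("llama31_awq_w4a4.yml") as f:
    cfg = yaml.safe_load(f)

# The model, calibration set, and eval set are all loaded from local paths,
# so verify they exist before kicking off the full quantization run.
for key, path in [
    ("model", cfg["model"]["path"]),
    ("calib", cfg["calib"]["path"]),
    ("eval", cfg["eval"]["path"]),
]:
    print(f"{key:>5}: {path} -> {'OK' if os.path.exists(path) else 'MISSING'}")

print("quant method:", cfg["quant"]["method"],
      "| weight bits:", cfg["quant"]["weight"]["bit"],
      "| act bits:", cfg["quant"]["act"]["bit"])
```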

Harahan commented 4 weeks ago

We will fix this later.

gushiqiao commented 3 weeks ago

This seems to be a bug in transformers. It can be fixed by adding the line `inv_freq_expanded = inv_freq_expanded.to(x.device)` before line 153 in the modeling_llama.py file of the transformers you installed. Note that running Llama 3.1 requires a newer version of transformers, which unfortunately may trigger this bug. If you only need Llama 1, Llama 2, or Llama 3, simply downgrading transformers should avoid the issue.
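For orientation, the line in question sits in the rotary-embedding forward pass. The snippet below is an approximate sketch of that context (the exact line number and surrounding code vary across transformers versions), showing where the device move would go:

```python
# transformers/models/llama/modeling_llama.py -- approximate context, varies by version
# Inside LlamaRotaryEmbedding.forward(x, position_ids):
inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
position_ids_expanded = position_ids[:, None, :].float()

# Suggested fix: move inv_freq_expanded onto the same device as the input tensor,
# avoiding a CPU/GPU device mismatch when layers are dispatched across devices
# (e.g. with inference_per_block for large models).
inv_freq_expanded = inv_freq_expanded.to(x.device)

device_type = x.device.type
with torch.autocast(device_type=device_type, enabled=False):
    freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
    emb = torch.cat((freqs, freqs), dim=-1)
```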