efeslab / Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

TypeError: QLlamaDecoderLayer.forward() got an unexpected keyword argument 'cache_position' #19

Closed · galenyu closed this issue 2 months ago

galenyu commented 2 months ago

Hi there, I followed all the steps of this project until I hit an issue while running the following command:

```bash
python model/main.py decapoda-research-llama-7b-hf wikitext2 \
    --wbits 4 --abits 4 --a_sym --w_sym \
    --act_group_size 128 --weight_group_size 128 --weight_channel_group 2 \
    --reorder --act_sort_metric hessian \
    --a_clip_ratio 0.9 --w_clip_ratio 0.85 \
    --keeper 128 --keeper_precision 3 --kv_cache --use_gptq \
    --eval_ppl --eval_common_sense
```

Env

  1. GPU: NVIDIA RTX 4090
  2. Same as in your setup, I used the nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 container.
  3. The Python and dependency versions match your requirements.txt.
  4. For convenience, I temporarily skipped the "Compile kernels benchmarks" step.
  5. By the way, for quick validation, I changed this line so that only the wikitext2 dataset is evaluated (see the sketch below).
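
For reference, the change in item 5 just shrinks the list of evaluation datasets. A minimal sketch, assuming the script keeps the dataset names in a list (the variable name here is illustrative, not the actual one in the repo):

```python
# Before (sketch): evaluate perplexity on several datasets
# datasets = ["wikitext2", "ptb", "c4"]
# After: restrict to wikitext2 for a quick sanity check
datasets = ["wikitext2"]
```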

Describe the issue

When running the loglikelihood requests, the TypeError shown below occurred:

[Screenshot: traceback ending in `TypeError: QLlamaDecoderLayer.forward() got an unexpected keyword argument 'cache_position'`]

I tried the change suggested in this issue, setting `cache_position=None` in transformers/models/llama/modeling_llama.py, but that didn't work either.
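
Concretely, my edit was along these lines (a rough sketch of what I tried; the exact signature differs across Transformers releases):

```python
# Sketch of the edit I tried in transformers/models/llama/modeling_llama.py:
# give the new argument a default so callers are not forced to pass it.
def forward(
    self,
    hidden_states,
    attention_mask=None,
    position_ids=None,
    past_key_value=None,
    output_attentions=False,
    use_cache=False,
    cache_position=None,  # defaulted to None
):
    ...
```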

Any suggestions will be greatly appreciated!

happierpig commented 2 months ago

Hi @galenyu ,

Thanks for pointing out this issue and for the info you provided! This problem was introduced by upgrading the Transformers version in a previous commit (11bae09). We have fixed the error in 7e3618b. Please let me know if the issue still exists.
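
For context, the usual shape of such a fix (an illustrative sketch, not necessarily the exact diff in 7e3618b) is to make the quantized wrapper's forward tolerant of keyword arguments that newer Transformers releases pass down to every decoder layer:

```python
import torch

class QLlamaDecoderLayer(torch.nn.Module):
    """Illustrative wrapper; the real class also quantizes the wrapped layer."""

    def __init__(self, layer: torch.nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, hidden_states, *args, cache_position=None, **kwargs):
        # Newer Transformers versions pass `cache_position` (and other new
        # kwargs) to each decoder layer. Accepting and dropping the ones the
        # wrapped layer does not understand keeps the wrapper working across
        # releases.
        return self.layer(hidden_states, *args, **kwargs)
```

If the error persists after pulling the fix, comparing `transformers.__version__` against the pin in requirements.txt is a quick sanity check.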

galenyu commented 2 months ago

> Hi @galenyu ,
>
> Thanks for pointing out this issue and for the info you provided! This problem was introduced by upgrading the Transformers version in a previous commit (11bae09). We have fixed the error in 7e3618b. Please let me know if the issue still exists.

The Transformers version was indeed the cause, and it works now. Thank you!