abertsch72 / unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
MIT License

IndexError when running inference with Llama-2 model #54

Closed shang-zhu closed 11 months ago

shang-zhu commented 11 months ago

Hi, thanks for this amazing work!

I followed the installation guide in issue #25, but I get the following error when running the inference command below on 2 V100 GPUs, each with 32 GB:

python src/run_generation.py --model_type llama --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --prefix "<s>[INST] <<SYS>>\n You are a helpful assistant. Answer with detailed responses according to the entire instruction or question. \n<</SYS>>\n\n Summarize the following book: " \
    --prompt example_inputs/harry_potter.txt \
    --suffix " [/INST]" --test_unlimiformer --fp16 --length 200 --layer_begin 16 \
    --index_devices 0 --datastore_device 0

Error:

File "/ocean/projects/cts180021p/shang9/foundation_models/openLLM4chem/unlimiformer/src/unlimiformer.py", line 1086, in preprocess_query
    cos = cos[:,:,-1]  # [1, 1, dim]
IndexError: too many indices for tensor of dimension 2
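
For reference, here is a minimal repro of the indexing failure outside Unlimiformer. The tensor shapes are my assumption about how the Llama rotary-embedding cache differs between older and newer transformers releases, not something taken from the repo:

import torch

# Shape the indexing in preprocess_query expects
# (older transformers, e.g. 4.31): [1, 1, seq_len, dim]
cos_old = torch.randn(1, 1, 16, 128)
print(cos_old[:, :, -1].shape)  # torch.Size([1, 1, 128]), i.e. the "[1, 1, dim]" in the comment

# Shape that appears to come back from newer releases: [seq_len, dim]
cos_new = torch.randn(16, 128)
cos_new[:, :, -1]  # raises: IndexError: too many indices for tensor of dimension 2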

Do you know what might be going wrong? Thanks.

urialon commented 11 months ago

Hi @shang-zhu, thank you for your interest in our work!

What are your PyTorch and transformers versions?

Best, Uri

shang-zhu commented 11 months ago

Thank you for your quick reply!

Here are my PyTorch and transformers versions:

torch                     2.1.0                    pypi_0    pypi
transformers              4.36.0.dev0              pypi_0    pypi

shang-zhu commented 11 months ago

I actually made it work with the following software versions:

pytorch                   2.0.1           py3.11_cuda11.7_cudnn8.5.0_0    pytorch
transformers              4.31.0                   pypi_0    pypi
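
For anyone who hits this later: my guess is that the transformers version is what matters, since newer releases seem to change the shape of the Llama rotary-embedding caches that unlimiformer.py indexes into. A purely hypothetical guard (the cutoff is just "anything newer than the 4.31.0 that worked here", not something documented by Unlimiformer):

# Hypothetical sanity check: warn if the installed transformers release is
# newer than the 4.31.0 reported to work in this thread.
from packaging import version
import transformers

if version.parse(transformers.__version__) > version.parse("4.31.0"):
    print("Warning: this transformers version may change the Llama rotary-embedding "
          "cache shape and trigger the IndexError above; transformers==4.31.0 "
          "was reported to work with Unlimiformer.")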

Thanks for the help!