HKUNLP / ChunkLlama

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Apache License 2.0

a confusing issue #8

Closed lilhongxy closed 8 months ago

lilhongxy commented 8 months ago

cos, sin = self.rotary_emb(value_states, seq_len=kv_seqlen)
ValueError: too many values to unpack (expected 2)

I followed the instructions in the Full inference code, but then I encountered this issue. How can I fix it?

ChenxinAn-fdu commented 8 months ago

This error is usually caused by calling replace_with_chunkllama() after model.from_pretrained(). Make sure replace_with_chunkllama() is called before initializing the model. If that does not solve the error, please provide more details.

lilhongxy commented 8 months ago
from transformers import AutoTokenizer, LlamaTokenizer, LlamaForCausalLM, AutoModelForCausalLM
from chunkllama_attn_replace import replace_with_chunkllama
import torch

replace_with_chunkllama(pretraining_length=4096)

tokenizer = LlamaTokenizer.from_pretrained("path_to_Llama-2-7b-hf", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained("path_to_Llama-2-7b-hf", trust_remote_code=True, torch_dtype=torch.bfloat16)
inputs = tokenizer("Long...docs\n Q: How to extend the context window of LLMs? ", return_tensors="pt")

output_ids = model.generate(**inputs, max_length=128)[0]
print(tokenizer.decode(output_ids))

I followed the inference instructions exactly, but the issue remains...

Mooler0410 commented 8 months ago

Could you please check your transformers version? The RoPE API for Llama changed again in 4.38. (Actually, it keeps changing... from 4.35 to 4.36, to 4.37, to 4.38... almost every recent transformers release ships a new RoPE implementation for Llama. 😓)
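
For reference, a minimal sketch (not from the ChunkLlama repo) that guards the patch with a version check, assuming the patch targets the pre-4.38 RoPE API:

import transformers
from packaging import version

# the patch discussed here is reported to work with transformers 4.37.2;
# the Llama RoPE implementation changed again in 4.38, so warn on newer releases
if version.parse(transformers.__version__) >= version.parse("4.38.0"):
    print(f"Warning: transformers {transformers.__version__} may be incompatible "
          f"with replace_with_chunkllama(); 4.37.2 is reported as working in this issue.")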

ChenxinAn-fdu commented 8 months ago

Hey guys, the code works in my environment. My transformers version is 4.37.2.

from transformers import AutoTokenizer, LlamaTokenizer, LlamaForCausalLM, AutoModelForCausalLM
from chunkllama_github import replace_with_chunkllama
import torch

model_path = "path/to/llama2"

replace_with_chunkllama(pretraining_length=4096)

tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.bfloat16).to("cuda:0")
inputs = tokenizer("Long...docs\n Q: How to extend the context window of LLMs? ", return_tensors="pt").to("cuda:0")

output_ids = model.generate(**inputs, max_length=128)[0]
print(tokenizer.decode(output_ids))

ChenxinAn-fdu commented 8 months ago

Please use Flash Attention when processing longer inputs:

model = LlamaForCausalLM.from_pretrained(model_path, attn_implementation="flash_attention_2", trust_remote_code=True, torch_dtype=torch.bfloat16).to(device)
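
If flash-attn may not be installed, a hedged fallback sketch (the attn_implementation values are standard transformers options, not specific to ChunkLlama; model_path is the same placeholder as above):

import importlib.util
import torch
from transformers import LlamaForCausalLM

# use FlashAttention-2 only when the flash-attn package is importable;
# otherwise fall back to the default eager attention implementation
attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "eager"
model = LlamaForCausalLM.from_pretrained(
    model_path,
    attn_implementation=attn_impl,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda:0")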

lilhongxy commented 8 months ago

Thank you all!!! 😄 I finally got it working. It was my torch version that caused the issue; the previous version was 2.2.1+cu118.

Working environment: torch 2.0.1+cu118, transformers 4.37.2
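
For anyone reproducing this, a quick sanity check against the versions reported above:

import torch
import transformers

# versions reported as working in this thread: torch 2.0.1+cu118, transformers 4.37.2
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)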

ChenxinAn-fdu commented 8 months ago

If there are no further questions or follow-up discussions, I will close this issue shortly. Thank you all for your contributions and participation.

MarsMeng1994 commented 7 months ago

Inference is correct, but when fine-tuning, the error comes up again.