This error is usually caused by calling `replace_with_chunkllama()` after `model.from_pretrained()`. Make sure `replace_with_chunkllama()` is called before initializing the model. If that does not solve the error, please provide more details.
```python
from transformers import AutoTokenizer, LlamaTokenizer, LlamaForCausalLM, AutoModelForCausalLM
from chunkllama_attn_replace import replace_with_chunkllama
import torch

# Patch the attention implementation before the model is initialized
replace_with_chunkllama(pretraining_length=4096)

tokenizer = LlamaTokenizer.from_pretrained("path_to_Llama-2-7b-hf", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained("path_to_Llama-2-7b-hf", trust_remote_code=True, torch_dtype=torch.bfloat16)

inputs = tokenizer("Long...docs\n Q: How to extend the context window of LLMs? ", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=128)[0]
print(tokenizer.decode(output_ids))
```
I followed the inference instructions exactly, but the issue remained...
Could you please check your transformers version? The RoPE API for Llama changed again after 4.38. (Actually, it keeps changing... from 4.35 to 4.36, to 4.37, to 4.38... almost every recent transformers release has a new RoPE implementation for Llama. 😓)
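If it helps, here is a minimal diagnostic sketch for checking which transformers version and which Llama RoPE signature are installed (it assumes the standard `transformers.models.llama.modeling_llama` module layout):

```python
# Print the installed transformers version and the signature of
# LlamaRotaryEmbedding.forward, which is the API that keeps changing.
import inspect
import transformers
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

print("transformers:", transformers.__version__)
print("rotary forward signature:", inspect.signature(LlamaRotaryEmbedding.forward))
```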
Hey guys, the code works in my environment. My transformers version is 4.37.2.
```python
from transformers import AutoTokenizer, LlamaTokenizer, LlamaForCausalLM, AutoModelForCausalLM
from chunkllama_github import replace_with_chunkllama
import torch

model_path = "path/to/llama2"

# Apply the ChunkLlama patch first, then load the model
replace_with_chunkllama(pretraining_length=4096)

tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.bfloat16).to("cuda:0")

inputs = tokenizer("Long...docs\n Q: How to extend the context window of LLMs? ", return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_length=128)[0]
print(tokenizer.decode(output_ids))
```
Please use Flash Attention when processing longer inputs:
```python
model = LlamaForCausalLM.from_pretrained(model_path, attn_implementation="flash_attention_2", trust_remote_code=True, torch_dtype=torch.bfloat16).to(device)
```
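In case it is useful, a small sketch for checking that flash-attn is actually installed before requesting `flash_attention_2` (it assumes the PyPI package `flash-attn`, which provides the `flash_attn` module):

```python
# Verify flash-attn is importable; otherwise fall back to the default attention.
try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
    attn_impl = "flash_attention_2"
except ImportError:
    print("flash-attn not installed; using the eager attention implementation")
    attn_impl = "eager"
```

Then pass `attn_implementation=attn_impl` to `from_pretrained` as in the line above.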
Thank you all, guys!!! 😄 I finally got it to work. It was my torch version that caused the issue; the previous version was 2.2.1+cu118.
Working environment:
- torch: 2.0.1+cu118
- transformers: 4.37.2
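For reference, a minimal sketch to confirm the runtime versions match the ones reported above (these are the versions reported to work in this thread, not hard requirements of the library):

```python
# Print the runtime versions of the two packages that mattered here.
import torch
import transformers

print("torch:", torch.__version__)                # reported working: 2.0.1+cu118
print("transformers:", transformers.__version__)  # reported working: 4.37.2
```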
If there are no further questions or follow-up discussions, I will close this issue shortly. Thank you all for your contributions and participation.
Inference is correct, but when fine-tuning, the error comes up again:
```
cos, sin = self.rotary_emb(value_states, seq_len=kv_seqlen)
ValueError: too many values to unpack (expected 2)
```
I followed the instructions in the Full inference code, but then I encountered this issue. How can I fix it?