Hello, I just downloaded the files of dbrx-instruct from Hugging Face, but when I run the example code it just hangs after the message "Setting pad_token_id to eos_token_id:100257 for open-end generation.". Is this a memory problem?
The code is:
"""
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
print(input_ids)
outputs = model.generate(**input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
"""
The basic configuration of my local machine is: GeForce 3090, 32 GB of memory, 24-core CPU.
Please help.