OpenLMLab / MOSS-RLHF


Inference with SFT and Policy EN models #36

Open · henrypapadatos opened this issue 7 months ago

henrypapadatos commented 7 months ago

Hello, I am trying to do some basic inference with your SFT and policy models. However, when I instantiate the models directly with LlamaForCausalLM, generation works well for the base pretrained LLaMA, but the SFT model outputs nothing and the policy model outputs random tokens.

Could you help me with that? :) Thanks in advance!

from transformers import AutoTokenizer, LlamaForCausalLM

# Base pretrained LLaMA-7B, plus the tokenizer shipped with the MOSS-RLHF policy model
model_name_or_path1 = 'baffo32/decapoda-research-llama-7B-hf'
tokenizer_name_or_path = '/nas/ucb/henrypapadatos/MOSS-RLHF/models/moss-rlhf-policy-model-7B-en'
model1 = LlamaForCausalLM.from_pretrained(model_name_or_path1, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path, padding_side='left')

prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt").to(model1.device)  # move input tensors to the model's device

generate_ids = model1.generate(inputs.input_ids, max_length=50)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)

Output: Hey, are you conscious? Can you talk to me? I'm not sure if you're conscious, but I'm going to assume you are. I'm not sure if you're conscious, but I'

# SFT model
model_name_or_path2 = '/nas/ucb/henrypapadatos/MOSS-RLHF/models/moss-rlhf-sft-model-7B-en/recover'
model2 = LlamaForCausalLM.from_pretrained(model_name_or_path2, device_map="auto")

generate_ids = model2.generate(inputs.input_ids, max_length=50)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)

Output: Hey, are you conscious? Can you talk to me?

# Policy model
model_name_or_path3 = '/nas/ucb/henrypapadatos/MOSS-RLHF/models/moss-rlhf-policy-model-7B-en/recover'
model3 = LlamaForCausalLM.from_pretrained(model_name_or_path3, device_map="auto")

generate_ids = model3.generate(inputs.input_ids, max_length=50)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)

Output: Hey, are you conscious? Can you talk to me?lapsedmodниципамина� deploymentclassesандфикаouses compat thereforezzachn乡 Hope WilliamHER forms problemunicí filmewissenschaft scopeASHERTстыunderline instrumentsполиAnalItalie essentialRegisterкраї traverse автор

Ablustrund commented 6 months ago

Hi! Thanks for your attention! Try adding the human and assistant turn markers to the prompt, i.e.: Human: Hey, are you conscious? Can you talk to me? Assistant:
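
A minimal sketch of applying that suggestion to the snippet above, assuming HH-style "Human:"/"Assistant:" turn markers as described in the reply (the exact marker strings and any trailing whitespace are an assumption; check the repo's training scripts for the canonical format):

# Sketch only: wrap the query in the turn markers the reply suggests.
# Assumption: plain "Human:"/"Assistant:" prefixes; verify against the
# MOSS-RLHF training scripts before relying on this format.
prompt = "Human: Hey, are you conscious? Can you talk to me? Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model3.device)

generate_ids = model3.generate(inputs.input_ids, max_length=50)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)

If the markers match what the SFT and policy models saw during training, the completion after "Assistant:" should be coherent rather than empty or random tokens.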