Stability-AI / StableLM


The output is the same as the input. #81

Open yz-qiang opened 1 year ago

yz-qiang commented 1 year ago

I have a strange problem. I want to run batched prediction, so I first use tokenizer.encode() to encode the data, with padding to get a uniform length. The specific code is shown below:

```python
import torch
from transformers import StoppingCriteriaList
# `model`, `tokenizer`, and the `StopOnTokens` stopping criterion are
# already set up as in the repo README.

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.resize_token_embeddings(len(tokenizer))

code = '''Judge whether there is a defect in ["int ff_get_wav_header"]. Respond with 'YES' or 'NO' only.'''
system_prompt = """# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""
prompt = f"<|SYSTEM|>{system_prompt}<|USER|>{code}<|ASSISTANT|>"

# Pad every prompt to the same length so a whole batch can be stacked.
source_ids = tokenizer.encode(prompt, max_length=1500, padding='max_length', truncation=True, return_tensors='pt').to("cuda")
# Mask out the pad positions (pad_token_id == eos_token_id here).
source_mask = source_ids.ne(tokenizer.pad_token_id).long()
print(source_mask.shape, '---', source_ids.shape)

preds = model.generate(
    source_ids,
    attention_mask=source_mask,
    max_new_tokens=512,
    temperature=0.2,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
top_preds = list(preds.cpu().numpy())
pred_nls = [tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False) for ids in top_preds]
print(pred_nls)
```

However, the output is the same as the prompt, as follows:

```
['# StableLM Tuned (Alpha version)\n- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.\n- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.\n- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.\n- StableLM will refuse to participate in anything that could harm a human.\nJudge whether there is a defect in ["int ff_get_wav_header"]. Respond with \'YES\' or \'NO\' only.']
```
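
One thing worth checking here (a minimal diagnostic sketch, assuming `StopOnTokens` is the version from this repo's README, whose stop set includes the EOS id): the tokenizer pads on the right by default, and since `pad_token` was set to `eos_token`, the pad ids land *after* `<|ASSISTANT|>`. Generation is then conditioned on a long run of end-of-text tokens, so the model tends to emit EOS immediately, the stopping criterion fires, and decoding with `skip_special_tokens=True` hides that EOS, leaving exactly the prompt text.

```python
# Diagnostic sketch: inspect the tail of the padded input.
# Assumptions: default padding_side="right", pad_token == eos_token.
print(tokenizer.padding_side)                # "right" unless changed
print(source_ids[0, -5:])                    # expected: a run of EOS ids
print(tokenizer.decode(source_ids[0, -5:]))  # expected: repeated <|endoftext|>
```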

If I skip the padding and use the tokenizer call directly for prediction, I get normal output. The code looks like this:

```python
code = '''Judge whether there is a defect in ["int ff_get_wav_header"]. Respond with 'YES' or 'NO' only.'''
system_prompt = """# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""
prompt = f"<|SYSTEM|>{system_prompt}<|USER|>{code}<|ASSISTANT|>"

# No padding here: the prompt is encoded at its natural length.
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=512,  # fixed: was an undefined `max_new_tokens` variable
    temperature=0.2,
    do_sample=True,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

The output is as follows:

```
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
Judge whether there is a defect in ["int ff_get_wav_header"]. Respond with 'YES' or 'NO' only.**Yes**.
```
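
The contrast with the padded run is visible in the encoding itself: without padding, the prompt ends at the `<|ASSISTANT|>` marker, so the first sampled token is real model output rather than a continuation of pad tokens. A one-line check (reusing `inputs` from the snippet above):

```python
# The unpadded prompt ends at <|ASSISTANT|>, not in a run of pad/EOS ids.
print(tokenizer.decode(inputs["input_ids"][0][-5:]))
```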

Why does this happen? Is there something important I'm missing?

yz-qiang commented 1 year ago

In addition, I just tried the following method, but the output is still the same as the input. The code is as follows:

```python
# Same setup as above (torch, StoppingCriteriaList, model, tokenizer, StopOnTokens).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.resize_token_embeddings(len(tokenizer))

code = '''Judge whether there is a defect in ["int ff_get_wav_header"]. Respond with 'YES' or 'NO' only.'''
system_prompt = """# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"<|SYSTEM|>{system_prompt}<|USER|>{code}<|ASSISTANT|>"

# tokenizer(...) returns both input_ids and attention_mask in one call.
encoded_dict = tokenizer(prompt, max_length=1500, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
print(encoded_dict['input_ids'].shape, '-----', encoded_dict["attention_mask"].shape)

example = {"input_ids": encoded_dict['input_ids'], "attention_mask": encoded_dict["attention_mask"]}
preds = model.generate(
    **example,
    max_new_tokens=512,
    temperature=0.2,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
top_preds = list(preds.cpu().numpy())
pred_nls = [tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False) for ids in top_preds]
print(pred_nls)
```

The output is:

```
['# StableLM Tuned (Alpha version)\n- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.\n- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.\n- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.\n- StableLM will refuse to participate in anything that could harm a human.\nJudge whether there is a defect in ["int ff_get_wav_header"]. Respond with \'YES\' or \'NO\' only.']
```

So how do I solve this problem?
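
For anyone hitting the same behavior, here is a minimal sketch of the usual workaround for batched generation with decoder-only models: pad on the *left*, so the prompt stays flush against the position where generation starts, and decode only the newly generated tokens. This is a sketch under those assumptions, not a confirmed fix from the maintainers; `prompt`, `model`, `tokenizer`, and `StopOnTokens` are as defined above.

```python
# Sketch: left-pad for generation so pad tokens sit BEFORE the prompt,
# not between <|ASSISTANT|> and the first generated token.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    [prompt],                      # a list of prompts batches the same way
    max_length=1500,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
).to("cuda")

preds = model.generate(
    **batch,
    max_new_tokens=512,
    temperature=0.2,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)

# Slice off the prompt so only newly generated tokens are decoded;
# this also avoids the "output equals input" echo when printing.
new_tokens = preds[:, batch["input_ids"].shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
```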