dandelionsllm / pandallm

Panda is an open-source Chinese large language model project launched overseas in May 2023, dedicated to exploring the full technology stack in the era of large models and aiming to promote innovation and collaboration in Chinese natural language processing.
Apache License 2.0

response prefixes and renewable energy #14

Closed stakodiak closed 1 year ago

stakodiak commented 1 year ago

Hello,

I have two issues that I cannot seem to root out.

1) Panda sometimes adds "Human: ... Assistant: ..." dialogue after a response. I thought it was an issue with the tokenizer, so I tried the two different HF base models as well as the original model from Meta; all behave the same.

Here's my code for inferencing:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, **kwargs  # kwargs (dtype/device settings) elided
)

prompt = generate_prompt(messages)
input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')

with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=128,
        do_sample=True,  # without this, temperature/top_k/top_p are ignored
        temperature=1,
        top_k=40,
        top_p=0.9,
        repetition_penalty=1.15
    )  # generate() already returns tensors on the model's device
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)

2) Panda sometimes starts talking about renewable energy out of nowhere. Do you know why this might be? Here are two screenshots showing both issues.

[Screenshots (2023-05-08, 4:34 PM and 4:57 PM) showing both issues]

These were both in response to "你好".

SparkJiao commented 1 year ago

Hi,

  1. I'm not sure why that specific prefix appears in the response; I didn't add it during the instruction-tuning stage. It may be related to the nature of LLaMA, which I believe did not use an <eos> token during pre-training, so the model sometimes cannot stop by itself. We appended an <eos> token during instruction tuning, but the problem seems to persist. When answering multiple-choice questions, the model often repeats the options from the input after predicting the answer, so the Human: ... Assistant: ... prefix may just be an accident of the context. A workaround sketch follows this list.
  2. To be honest, I'm really surprised that it can generate such a long response given only the two characters "你好". In my preliminary experiments it could hardly generate English content when given a Chinese instruction/prompt. I would suggest checking the details of the generate_prompt method to see whether an English system instruction is used (see the second sketch below). As for the topic, I think it's just randomness.
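
As a stopgap for issue 1, here is a minimal sketch (not from the Panda codebase) that halts generation once a stray turn marker appears, using the transformers StoppingCriteria API. The class name StopOnSubstring and the "Human:" marker are assumptions for illustration:

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    """Stop generation once a given substring shows up in the decoded output."""
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated tokens and look for the marker.
        new_text = self.tokenizer.decode(
            input_ids[0, self.prompt_len:], skip_special_tokens=True
        )
        return self.stop_string in new_text

# Usage with the snippet above; "Human:" is the assumed stray marker.
stopping = StoppingCriteriaList(
    [StopOnSubstring(tokenizer, "Human:", input_ids.shape[1])]
)
output_ids = model.generate(
    input_ids=input_ids, max_new_tokens=128, stopping_criteria=stopping
)

The decoded text can then be truncated at the marker before being returned, since the stop fires after the marker has already been generated.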
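
For issue 2, here is a hypothetical illustration of what to look for in generate_prompt; the real helper lives in your code, and the message format here is invented for the example. An English system line like the one below is the sort of thing that could pull the model toward English output:

def generate_prompt(messages):
    # Hypothetical template; an English system instruction like this
    # could bias a bilingual model toward English responses.
    system = "The following is a conversation between a human and an AI assistant."
    turns = "".join(
        f"Human: {m['content']}\n" if m["role"] == "user"
        else f"Assistant: {m['content']}\n"
        for m in messages
    )
    return f"{system}\n{turns}Assistant: "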