Closed by KaiLv69 5 months ago
There are some bugs in the prepare_chatml_messages part. Here is my modified version, for reference:
prepared_messages = []
prepared_messages += [{"content": special_tokens_map['bos_token'], "require_loss": False}]
for message in messages['history']:
    if message['role'] == "assistant":
        prepared_messages += [{"content": '<|im_start|>' + message['role'] + '\n', "require_loss": False}]
        prepared_messages += [{"content": message['content'] + '<|im_end|>', "require_loss": True}]
        prepared_messages += [{"content": '\n', "require_loss": False}]
    else:
        prepared_messages += [
            {"content": f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n", "require_loss": False}]
if add_generation_prompt:
    prepared_messages += [{"content": '<|im_start|>assistant\n', "require_loss": False}]
return prepared_messages
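For anyone who wants to try the fragment above on its own, here is a minimal self-contained sketch. The function signature, the default for add_generation_prompt, and the sample data (including the "<s>" BOS token) are assumptions for illustration and are not taken from the repository; only the body follows the snippet above.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[str, bool]]


def prepare_chatml_messages(
    messages: Dict[str, List[Dict[str, str]]],
    special_tokens_map: Dict[str, str],
    add_generation_prompt: bool = False,  # assumed default, not from the repo
) -> List[Segment]:
    """Split a conversation into ChatML segments flagged for loss or no loss."""
    prepared: List[Segment] = [
        {"content": special_tokens_map["bos_token"], "require_loss": False}
    ]
    for message in messages["history"]:
        if message["role"] == "assistant":
            # The role header is context only, so it carries no loss.
            prepared.append(
                {"content": "<|im_start|>" + message["role"] + "\n", "require_loss": False}
            )
            # The reply and its end marker are what the model should learn.
            prepared.append(
                {"content": message["content"] + "<|im_end|>", "require_loss": True}
            )
            prepared.append({"content": "\n", "require_loss": False})
        else:
            # Non-assistant turns are never trained on.
            prepared.append(
                {
                    "content": f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n",
                    "require_loss": False,
                }
            )
    if add_generation_prompt:
        prepared.append({"content": "<|im_start|>assistant\n", "require_loss": False})
    return prepared


# Hypothetical sample conversation and BOS token, for illustration only.
example = {
    "history": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ]
}
segments = prepare_chatml_messages(example, {"bos_token": "<s>"})
full_text = "".join(str(s["content"]) for s in segments)
loss_text = "".join(str(s["content"]) for s in segments if s["require_loss"])
print(repr(full_text))
print(repr(loss_text))
```

Joining only the segments with require_loss set shows which part of the sequence contributes to the training loss: the assistant reply plus its `<|im_end|>` marker, and nothing else.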
As the title says, this adds a multi-turn dataset with a template for training.