Closed mrmuke closed 12 months ago
Example: "model correct output... <|endoftext|>In this task, you are given a sentence in the English language and your task is to convert it into the Japanese language. In translation, keep numbers as it is and make it sentence case (capitalize only the first word of each sentence and noun). The first hostage was release", I am adding special tokens via: "tokenizer.add_special_tokens({ "eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id), "bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id), "unk_token": tokenizer.convert_ids_to_tokens( model.config.pad_token_id if model.config.pad_token_id != -1 else tokenizer.pad_token_id ), })"
The end of sequence token for yi 34b is not being added to training since the model continues to generate past the EOS token <|endoftext|> after finetuning.