01-ai / Yi-1.5

Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability.
Apache License 2.0

Does Yi-1.5-Chat model use the standard CHATML template? #23

Open songkq opened 4 months ago

songkq commented 4 months ago

@richardllin @panyx0718 @Imccccc Hi all, could you please give some advice on this issue? Does the Yi-1.5-Chat model use the standard ChatML template? Is the bos_token <|im_start|> or <|startoftext|>? Is the eos_token <|im_end|> or <|endoftext|>? Yi-1.5-34B-Chat-16K/config.json is not consistent with Yi-1.5-34B-Chat-16K/tokenizer_config.json. During generation or training, will the bos_token be added at the front of the prompt?

As shown in Yi-1.5-34B-Chat-16K/config.json:

"bos_token_id": 1,
"eos_token_id": 2,

As shown in Yi-1.5-34B-Chat-16K/tokenizer_config.json:

"bos_token": "<|startoftext|>",
"eos_token": "<|im_end|>",

"1": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"7": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
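
The mismatch described above can be checked mechanically. The following is a minimal sketch that uses only the fragments quoted in this issue; the key name "added_tokens" is a hypothetical label for the id-to-token table, and the helper names are made up for illustration. It resolves the ids from config.json to token strings and compares them with the strings in tokenizer_config.json.

```python
import json

# Values copied from the fragments quoted in this issue.
config = json.loads('{"bos_token_id": 1, "eos_token_id": 2}')

# "added_tokens" is a hypothetical key name used here for the id -> token table.
tokenizer_config = json.loads("""
{
  "bos_token": "<|startoftext|>",
  "eos_token": "<|im_end|>",
  "added_tokens": {
    "1": {"content": "<|startoftext|>"},
    "2": {"content": "<|endoftext|>"},
    "7": {"content": "<|im_end|>"}
  }
}
""")


def token_for_id(token_id):
    """Look up the token string for a numeric id in the quoted table."""
    return tokenizer_config["added_tokens"][str(token_id)]["content"]


def compare(name):
    """Return (token implied by config.json, token in tokenizer_config.json, match?)."""
    from_config = token_for_id(config[f"{name}_id"])
    from_tokenizer = tokenizer_config[name]
    return from_config, from_tokenizer, from_config == from_tokenizer


bos_report = compare("bos_token")
eos_report = compare("eos_token")
print("bos:", bos_report)
print("eos:", eos_report)
```

With these fragments, the bos_token agrees in both files, while the eos_token differs: id 2 in config.json resolves to <|endoftext|>, whereas tokenizer_config.json names <|im_end|> (id 7).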

Yimi81 commented 4 months ago

Yes, it uses the standard ChatML template. bos_token and eos_token are determined mainly by the tokenizer_config.json file and are not related to config.json. In addition, during SFT, <|im_start|> is added by default by the template, such as here
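
For readers unfamiliar with the layout, here is a minimal sketch of how a generic ChatML prompt is assembled. It is not the repository's training code: the function name `render_chatml` and its `add_generation_prompt` parameter are hypothetical, and the sketch deliberately does not prepend <|startoftext|>, since this thread does not confirm whether the actual template does so.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the generic ChatML layout.

    Hypothetical helper for illustration; no bos_token is prepended here.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hello"}])
print(prompt)
```

In this layout every turn starts with <|im_start|> and ends with <|im_end|>, which is consistent with <|im_end|> being the eos_token in tokenizer_config.json: generation stops when the assistant turn is closed.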