@richardllin @panyx0718 @Imccccc Hi all, could you please advise on this issue?
Does the Yi-1.5-Chat model use the standard ChatML template? Is the bos_token <|im_start|> or <|startoftext|>? Is the eos_token <|im_end|> or <|endoftext|>?
Yi-1.5-34B-Chat-16K/config.json is not consistent with Yi-1.5-34B-Chat-16K/tokenizer_config.json.
When the model is generating or training, will the bos_token be prepended to the prompt?
As shown in Yi-1.5-34B-Chat-16K/config.json:
"bos_token_id": 1,
"eos_token_id": 2,
As shown in Yi-1.5-34B-Chat-16K/tokenizer_config.json:
With the standard ChatML template, bos_token and eos_token are determined by the tokenizer_config.json file and are not related to config.json. Besides, during SFT, <|im_start|> is added by default by the template, such as here
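To make the point about the template concrete, here is a minimal sketch of how a standard ChatML prompt is laid out. It is an illustration only: `format_chatml` is a hypothetical helper, not code from the Yi repository, and the authoritative template for a given checkpoint is whatever its tokenizer_config.json defines. The sketch assumes the usual ChatML layout, where every turn is wrapped in <|im_start|> and <|im_end|> by the template itself rather than by a separately prepended bos_token.

```python
from typing import Dict, List

# Assumed ChatML delimiters; the real values come from the checkpoint's
# tokenizer_config.json, not from config.json.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(messages: List[Dict[str, str]],
                  add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} messages in ChatML layout.

    Hypothetical helper for illustration; not the Yi implementation.
    """
    parts = []
    for message in messages:
        # Each turn: <|im_start|>role\ncontent<|im_end|>\n
        parts.append(
            f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_chatml([{"role": "user", "content": "Hi"}]))
```

Note that in this sketch the prompt begins with <|im_start|> because the template emits it, and no <|startoftext|> token is prepended. Whether a real tokenizer additionally prepends a bos_token depends on its own settings, so it is worth checking the tokenized output directly.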