deepseek-ai / DeepSeek-Coder-V2

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
MIT License
mismatch between example code and model files #25

Open yucc-leon opened 4 months ago

yucc-leon commented 4 months ago

Both this repo and the Hugging Face model card contain the following line in their example code:

# tokenizer.eos_token_id is the id of <|EOT|> token

But in tokenizer_config.json inside the model repo, the eos_token is set to <|end▁of▁sentence|>:

"eos_token": {
  "__type": "AddedToken",
  "content": "<|end▁of▁sentence|>",
  "lstrip": false,
  "normalized": true,
  "rstrip": false,
  "single_word": false
}

Which one is correct?

guoday commented 4 months ago

<|end▁of▁sentence|>
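
For anyone who wants to check this locally, here is a minimal sketch that reads the EOS token straight out of the config text. It uses only the standard library and the fragment quoted in this issue; the helper name eos_token_from_config and the CONFIG_TEXT constant are made up for illustration and are not part of the repo.

```python
import json

# Fragment of the tokenizer config as quoted in this issue, wrapped in
# braces so it parses as a standalone JSON object (an assumption made
# for this sketch; the real file contains more fields).
CONFIG_TEXT = """
{
  "eos_token": {
    "__type": "AddedToken",
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
"""


def eos_token_from_config(config_text: str) -> str:
    """Return the EOS token string declared in a tokenizer config text.

    Hypothetical helper: handles both a plain string value and an
    AddedToken-style dict with a "content" field.
    """
    eos = json.loads(config_text)["eos_token"]
    return eos["content"] if isinstance(eos, dict) else eos


eos_token = eos_token_from_config(CONFIG_TEXT)
print(eos_token)
```

With the transformers library installed, loading the tokenizer via AutoTokenizer.from_pretrained and printing tokenizer.eos_token should give the same kind of check against the downloaded files, though that is not exercised here.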