HuangLK / transpeeder

Train LLaMA on a single A100 80G node using 🤗 transformers and 🚀 DeepSpeed pipeline parallelism.
Apache License 2.0

Why do we need to add 1 to the vocab_size when constructing the model? #30


forceshorty commented 1 year ago

https://github.com/HuangLK/llama-deepspeed/blob/faedea514b11c18c695e1b2a6adb63b102ef001c/models/llama_pipeline_model.py#LL159C33-L159C33


HuangLK commented 1 year ago

https://github.com/HuangLK/llama-deepspeed/blob/faedea514b11c18c695e1b2a6adb63b102ef001c/scripts/convert2ckpt.py#L65 — the pad_token is hard-coded here, which is why vocab_size is increased by 1 when the model is constructed.
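
For anyone else landing here, this is only a minimal sketch of the usual pattern (not the repo's actual convert2ckpt.py code): LLaMA ships without a pad token, so one is added to the tokenizer and the embedding matrix has to grow by one row, which is the `+1` the pipeline model must account for. `MODEL_PATH` is a placeholder.

```python
# Illustrative sketch only: why the vocabulary grows by one
# when a pad token is hard-coded into the tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/llama"  # placeholder, not a real path from this repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

orig_vocab_size = model.config.vocab_size              # e.g. 32000 for LLaMA
tokenizer.add_special_tokens({"pad_token": "[PAD]"})   # hard-coded pad token

# The new token needs its own embedding row, so the model is resized to
# orig_vocab_size + 1 — the same "+1" the pipeline model uses at construction.
model.resize_token_embeddings(len(tokenizer))
assert model.config.vocab_size == orig_vocab_size + 1
```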

forceshorty commented 1 year ago

Thank you for your answer. I have another question: why is vocab_size not increased by 1 in the convert2hf.py script? The original vocab_size is used there: https://github.com/HuangLK/llama-deepspeed/blob/faedea514b11c18c695e1b2a6adb63b102ef001c/scripts/convert2hf.py#L43
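
To make the question concrete, here is a hedged sketch (not the repo's convert2hf.py logic) of one way a checkpoint trained with the extra pad row could be exported back at the original vocab_size, by dropping the trailing row from the vocabulary-sized tensors. The tensor keys follow the Hugging Face LLaMA naming, and the paths and vocab size are assumptions.

```python
# Hypothetical sketch: shrink a (vocab_size + 1)-row checkpoint back to the
# original LLaMA vocab_size when exporting to Hugging Face format.
import torch

ORIG_VOCAB_SIZE = 32000  # assumed original LLaMA vocab size

state_dict = torch.load("pipeline_ckpt.pt", map_location="cpu")  # placeholder path

for key in ("model.embed_tokens.weight", "lm_head.weight"):
    if key in state_dict and state_dict[key].shape[0] == ORIG_VOCAB_SIZE + 1:
        # Drop the trailing [PAD] row so the exported weights match the
        # original config.vocab_size used when converting back to HF.
        state_dict[key] = state_dict[key][:ORIG_VOCAB_SIZE].clone()

torch.save(state_dict, "hf_ready_ckpt.pt")  # placeholder output path
```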