epfLLM / Megatron-LLM

distributed trainer for LLMs

Make llama2 vocab size divisible by 128 by default #53

Closed AleHD closed 1 year ago

AleHD commented 1 year ago

As suggested by @andreaskoepf, we should set the vocab size to be divisible by 128 by default, unless there is a good reason not to. This commit does exactly that. Moreover, it was verified locally by running tests/test_llama_weights.py, so there should not be any issues when converting the weights meta -> megatron -> shard -> unshard -> huggingface.
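For context, "divisible by 128" here just means padding the tokenizer vocabulary up to the next multiple of 128 before the embedding matrix is built (the behaviour Megatron controls through its vocab-size-divisibility setting). Below is a minimal sketch of that round-up; `pad_vocab_size` is a hypothetical helper for illustration, not the function used in this repo:

```python
def pad_vocab_size(orig_vocab_size: int, divisor: int = 128) -> int:
    """Round orig_vocab_size up to the nearest multiple of divisor."""
    remainder = orig_vocab_size % divisor
    if remainder == 0:
        return orig_vocab_size
    return orig_vocab_size + (divisor - remainder)

# Llama 2's 32,000-token vocab is already a multiple of 128 (250 * 128),
# so padding is a no-op there; an odd-sized vocab is rounded up instead.
assert pad_vocab_size(32000) == 32000
assert pad_vocab_size(32017) == 32128
```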

Are there any other changes I should consider before merging, @andreaskoepf? Maybe something in the megatron -> huggingface conversion for larger models?

andreaskoepf commented 1 year ago

@AleHD if possible, merge the codellama change first, since this will create a conflict ;-)