As suggested by @andreaskoepf, we should pad the vocab size to be divisible by 128 by default, unless there is a good reason not to. This commit does that. I also verified it locally by running tests/test_llama_weights.py, so the weight conversion path meta -> megatron -> shard -> unshard -> huggingface should still work without issues.
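For reference, the padding boils down to something like the sketch below (illustrative only; `pad_vocab_size` is a hypothetical name, not necessarily what the code uses):

```python
def pad_vocab_size(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple of `multiple` (default 128)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

assert pad_vocab_size(32000) == 32000   # already divisible by 128
assert pad_vocab_size(32001) == 32128   # padded up to the next multiple
```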
Any other change I should consider before merging, @andreaskoepf? Maybe something in the megatron -> huggingface conversion for larger models?