bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

add `pad-vocab-size-to` argument and tests #255

Closed SaulLu closed 2 years ago

SaulLu commented 2 years ago

Add a `pad-vocab-size-to` argument so that the user can specify the desired tokenizer vocabulary size explicitly, instead of relying on the automatic feature that computes a vocab size compatible with `make-vocab-size-divisible-by` and the tensor-parallelism value.
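The padding logic described above can be sketched as follows. This is a hypothetical standalone helper for illustration, not the actual Megatron-DeepSpeed code; the function name and signature are assumptions, but the rounding rule (pad up to a multiple of `make-vocab-size-divisible-by` times the tensor-parallel size) follows the behavior the comment describes.

```python
def pad_vocab_size(orig_vocab_size, make_vocab_size_divisible_by,
                   tensor_model_parallel_size, pad_vocab_size_to=None):
    """Return the padded vocabulary size.

    If pad_vocab_size_to is given, use it directly after checking that
    it is large enough and still divisible as required; otherwise round
    orig_vocab_size up to the next multiple of
    make_vocab_size_divisible_by * tensor_model_parallel_size.
    """
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    if pad_vocab_size_to is not None:
        # Explicit size requested via --pad-vocab-size-to: validate it.
        assert pad_vocab_size_to >= orig_vocab_size, (
            "pad-vocab-size-to must be at least the tokenizer vocab size")
        assert pad_vocab_size_to % multiple == 0, (
            "padded vocab size must be divisible by "
            "make-vocab-size-divisible-by * tensor-parallel size")
        return pad_vocab_size_to
    # Automatic behavior: round up to the next valid multiple.
    padded = orig_vocab_size
    while padded % multiple != 0:
        padded += 1
    return padded
```

For example, a 50257-token GPT-2 vocabulary with a divisor of 128 and a tensor-parallel size of 1 pads automatically to 50304, while passing an explicit multiple such as 50432 overrides that computation.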

We also took advantage of this new feature to add tests, along with an assertion in the code verifying that input ids cannot fall outside the admitted range.
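The range check mentioned above could look like the sketch below. This is an illustrative helper, not the PR's actual code; the function name and the use of NumPy are assumptions. The idea is simply that every token id must index a valid row of the (padded) embedding table.

```python
import numpy as np

def check_input_ids(input_ids, padded_vocab_size):
    """Assert that all token ids lie in [0, padded_vocab_size)."""
    ids = np.asarray(input_ids)
    # Ids outside this range would index past the embedding matrix
    # and fail (or silently corrupt results) at training time.
    assert ids.min() >= 0 and ids.max() < padded_vocab_size, (
        f"input ids fall outside the admitted range [0, {padded_vocab_size})")
    return ids
```

Catching this at data-loading time gives a clear error message instead of an opaque indexing failure deep inside the embedding lookup.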

DanielHesslow commented 2 years ago

Looks good to me. At some point we should probably factor out all of the pool/process-launch stuff to a common place, but that's for when things have calmed down a bit.