Prior to this PR, our checkpointing code only supported loading .pth files, but to support Mixtral 8x22B we needed to load a safetensors file (v0.3 of the Instruct checkpoint, published by Mistral). We also noted that, as with .pth files, safetensors checkpoints can be split across multiple files. This PR addresses both cases.
Note that we also introduce a new required command-line parameter, checkpoint-type, which takes the value pth or safetensors.
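As a rough illustration of how the new parameter and multi-file loading could fit together, here is a minimal sketch. It is not the code in this PR: the `--checkpoint-dir` flag, the function names, and the assumption that each shard holds a disjoint set of tensor names are all hypothetical. Tensor-parallel .pth shards that repeat the same keys would need concatenation rather than a merge, so the sketch raises on duplicate keys instead of silently overwriting.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Load a model checkpoint")
    # Hypothetical flag: directory holding one or more checkpoint files.
    parser.add_argument("--checkpoint-dir", type=Path, required=True)
    # Required parameter described above; argparse rejects any other value.
    parser.add_argument(
        "--checkpoint-type",
        choices=["pth", "safetensors"],
        required=True,
    )
    return parser.parse_args(argv)


def find_shards(checkpoint_dir, checkpoint_type):
    """Return every file with the matching extension, sorted by name."""
    shards = sorted(Path(checkpoint_dir).glob(f"*.{checkpoint_type}"))
    if not shards:
        raise FileNotFoundError(
            f"No .{checkpoint_type} files found in {checkpoint_dir}"
        )
    return shards


def load_state_dict(checkpoint_dir, checkpoint_type):
    """Merge all shards into one dict (assumes shards have disjoint keys)."""
    state_dict = {}
    for shard in find_shards(checkpoint_dir, checkpoint_type):
        if checkpoint_type == "safetensors":
            # Imported lazily so the CLI parsing works without the library.
            from safetensors.torch import load_file

            tensors = load_file(str(shard), device="cpu")
        else:
            import torch

            tensors = torch.load(shard, map_location="cpu")
        duplicates = state_dict.keys() & tensors.keys()
        if duplicates:
            raise ValueError(f"{shard} repeats keys: {sorted(duplicates)}")
        state_dict.update(tensors)
    return state_dict
```

Under these assumptions, a caller would run `args = parse_args()` followed by `load_state_dict(args.checkpoint_dir, args.checkpoint_type)`.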
Finally, a couple of minor fixes:
Set the CPU count to 1 to avoid some errors we otherwise see
Check that the vocab size is correct instead of assuming it is and slicing to it
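The vocab size fix can be pictured with a small sketch like the one below. The function name and the idea that the check runs on the embedding weight's first dimension are assumptions made for illustration, not a description of the actual change. The point is to fail loudly on a mismatch rather than slice the tensor to the expected size.

```python
def check_vocab_size(embedding_weight, expected_vocab_size):
    """Raise if the embedding's first dimension is not the expected vocab size.

    `embedding_weight` is any object with a `.shape` tuple, such as a
    torch.Tensor. It is returned unchanged when the check passes.
    """
    actual = embedding_weight.shape[0]
    if actual != expected_vocab_size:
        raise ValueError(
            f"Checkpoint vocab size is {actual}, expected {expected_vocab_size}"
        )
    return embedding_weight
```

Slicing would hide a checkpoint whose vocabulary differs from the configured one, whereas an explicit check surfaces the mismatch at load time.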