epfLLM / meditron

Meditron is a suite of open-source medical Large Language Models (LLMs).
https://huggingface.co/epfl-llm
Apache License 2.0

Mismatch in vocab_size between .bin files and .safetensors files #43

Open noahboegli opened 1 month ago

noahboegli commented 1 month ago

Hey!

I'm sorry if this isn't actually an issue and it's just me misunderstanding something; I'm not an expert in this field, rather a novice.

I'm trying to deploy the project according to your deployment guide. However, since I don't have enough memory for the 70B version of the model, I want to use the --load-8bit parameter to enable model compression. (I should mention that I run the model on the CPU, with the --device cpu flag.)

When I use this, I get the following error:

ValueError: Trying to set a tensor of shape torch.Size([32000, 8192]) in "weight" (which has shape torch.Size([32017, 8192])), this look incorrect
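
For what it's worth, the two shapes in the error can be traced back to the files on disk. Below is a rough sketch (the path is illustrative; it assumes a local download of the model in ./meditron-70b with the standard sharded-checkpoint layout and index filenames) that prints the vocab_size declared in config.json next to the shape of the input embedding stored in the .bin shards:

```python
import json
from pathlib import Path

import torch

model_dir = Path("meditron-70b")   # illustrative local path to the downloaded model
key = "model.embed_tokens.weight"  # input embedding key in LLaMA-style checkpoints

# vocab_size declared by the configuration (presumably the 32017 side of the error)
config = json.loads((model_dir / "config.json").read_text())
print("config.json vocab_size:", config["vocab_size"])

# shape of the embedding actually stored in the .bin shards
# (note: this loads one full shard into RAM)
index = json.loads((model_dir / "pytorch_model.bin.index.json").read_text())
shard = torch.load(model_dir / index["weight_map"][key], map_location="cpu")
print(".bin embed_tokens shape:", tuple(shard[key].shape))
```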

If I look at the HF upload history, I see that there were two main uploads of the model:

My understanding is that the .bin files are needed to enable model compression, but they no longer match the model configuration.
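
One way to check whether the two uploads really diverge is to read the embedding shape straight from the .safetensors shards and compare it with the .bin shape printed by the sketch above (same assumed local layout; model.safetensors.index.json is the standard index filename):

```python
import json
from pathlib import Path

from safetensors import safe_open

model_dir = Path("meditron-70b")   # same illustrative snapshot directory as above
key = "model.embed_tokens.weight"

# embedding shape as stored in the .safetensors shards
st_index = json.loads((model_dir / "model.safetensors.index.json").read_text())
shard_file = model_dir / st_index["weight_map"][key]
with safe_open(str(shard_file), framework="pt", device="cpu") as f:
    print(".safetensors embed_tokens shape:", tuple(f.get_tensor(key).shape))
```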

This is supported by the fact that manually editing the config.json file to set vocab_size back to 32000 allows the model to load properly with --load-8bit.
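
For reference, that manual edit can be scripted as well; a minimal sketch, assuming the model lives in ./meditron-70b (illustrative path) and keeping a backup of the original file so the change is easy to revert:

```python
import json
import shutil
from pathlib import Path

model_dir = Path("meditron-70b")   # illustrative local path to the downloaded model
config_path = model_dir / "config.json"

# keep a copy of the original configuration before touching it
shutil.copy(config_path, model_dir / "config.json.bak")

config = json.loads(config_path.read_text())
print("vocab_size before:", config["vocab_size"])
config["vocab_size"] = 32000  # match the 32000-row embedding reported in the error
config_path.write_text(json.dumps(config, indent=2) + "\n")
print("vocab_size after: ", config["vocab_size"])
```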