dvmazur / mixtral-offloading

Run Mixtral-8x7B models in Colab or consumer desktops
MIT License
2.28k stars · 223 forks

How to split the model parameter safetensors file into multiple small files #34

Open YLSnowy opened 3 months ago

YLSnowy commented 3 months ago

The original Mixtral model checkpoint has 19 safetensors files in total, but the checkpoint you provide has 257 safetensors files. How did you split the model? Could you provide that part of the code?

dvmazur commented 3 months ago

As far as I remember, we have a different checkpoint structure from the original Mixtral model. For instance, we keep every expert in a separate file. This should lead to us having more files than the original checkpoint.
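The per-expert layout described above boils down to regrouping a flat state dict by parameter name. A minimal sketch, assuming the standard Mixtral-8x7B key naming (`model.layers.{i}.block_sparse_moe.experts.{j}.…`); this is an illustration, not necessarily this repo's exact code:

```python
# Sketch: decide which shard (file) each parameter name should live in,
# so that every expert ends up in its own .safetensors file.
import re
from collections import defaultdict

# Assumed Mixtral-8x7B naming convention for expert weights.
EXPERT_RE = re.compile(r"layers\.(\d+)\.block_sparse_moe\.experts\.(\d+)\.")

def shard_name(key):
    """Map a parameter name to the file it should be saved in."""
    m = EXPERT_RE.search(key)
    if m:
        layer, expert = m.groups()
        return f"expert_{layer}_{expert}.safetensors"
    # Attention, norms, router, and embedding weights stay together.
    return "non_experts.safetensors"

def group_keys(state_dict_keys):
    """Group parameter names by target shard."""
    shards = defaultdict(list)
    for k in state_dict_keys:
        shards[shard_name(k)].append(k)
    return dict(shards)
```

In practice one would then load the tensors (e.g. with `safetensors.torch.load_file`) and write each group out with `safetensors.torch.save_file`, one call per shard.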

YLSnowy commented 3 months ago

> As far as I remember, we have a different checkpoint structure from the original Mixtral model. For instance, we keep every expert in a separate file. This should lead to us having more files than the original checkpoint.

Got it! Thank you.

YLSnowy commented 3 months ago

> As far as I remember, we have a different checkpoint structure from the original Mixtral model. For instance, we keep every expert in a separate file. This should lead to us having more files than the original checkpoint.

I stored the parameters of a single expert as a .safetensors file, but `file` does not identify it as data; it reports it as a "lif" file. May I ask how you split the checkpoint into multiple safetensors files? I'm asking about the splitting code, not the code that loads the parameter files.
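One way to check whether a written file is actually valid safetensors, rather than relying on `file`'s guess: the format starts with an 8-byte little-endian header length followed by that many bytes of JSON metadata. A small stdlib-only sketch (the diagnosis that the "lif" report means a malformed header is an assumption):

```python
# Read the JSON header of a .safetensors file by hand.
# A valid file begins with: [8-byte little-endian length][JSON header][raw tensor bytes]
import json
import struct

def read_safetensors_header(path):
    """Return the header dict mapping tensor names to dtype/shape/offsets.

    Raises if the file does not follow the safetensors layout, which is a
    quick way to tell whether a split step produced a valid file.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header
```

If this raises on one of the per-expert files, the bug is in how the file was written, not in how Linux classifies it.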

dvmazur commented 3 months ago

> I stored the parameters of a single expert as a .safetensors file, but `file` does not identify it as data; it reports it as a "lif" file. May I ask how you split the checkpoint into multiple safetensors files?

What's a lif file?

YLSnowy commented 3 months ago

> What's a lif file?

It's a file type that Linux cannot correctly recognize. I think there is a problem with my splitting process: it only occurs for the expert parameters, not for the other parameters. I have been reading through the Mixtral source code and found that the decoder layer is marked in `_no_split_modules`, so I'm curious how you split it into multiple files.
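One point worth noting here: as far as I know, `_no_split_modules` only tells `accelerate` not to place a single decoder layer across two devices when building a `device_map`; it does not constrain how tensors are grouped into files. A state dict is a flat name-to-tensor mapping, so it can be sliced by key prefix regardless. A sketch, with the per-expert prefix being an assumption about the standard Mixtral naming:

```python
# Sketch: pull out only the tensors belonging to one expert from a flat
# state dict. File layout is independent of module-splitting restrictions.
def extract_expert(state_dict, layer, expert):
    """Return the sub-dict of tensors for one expert, keyed by full name."""
    prefix = f"model.layers.{layer}.block_sparse_moe.experts.{expert}."
    return {k: v for k, v in state_dict.items() if k.startswith(prefix)}
```

Each such sub-dict can then be written to its own file (e.g. with `safetensors.torch.save_file`) without touching the module structure at all.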