Open YLSnowy opened 7 months ago
As far as I remember, we have a different checkpoint structure from the original Mixtral model. For instance, we keep every expert in a separate file. This should lead to us having more files than the original checkpoint.
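Concretely, the split is just grouping tensors by their key prefix and writing each group out separately. A minimal sketch, assuming the Hugging Face Mixtral key layout (`model.layers.{i}.block_sparse_moe.experts.{j}.w1/w2/w3.weight`); the function and file names here are illustrative, not our exact code:

```python
# Minimal sketch of a per-expert split, assuming HF Mixtral key names.
import os
from safetensors.torch import save_file

def split_state_dict(state_dict, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    experts = {}  # (layer_id, expert_id) -> {key: tensor}
    shared = {}   # everything that is not an expert weight
    for key, tensor in state_dict.items():
        parts = key.split(".")
        if "experts" in parts:
            # e.g. model.layers.0.block_sparse_moe.experts.3.w1.weight
            i = parts.index("experts")
            layer_id, expert_id = int(parts[2]), int(parts[i + 1])
            experts.setdefault((layer_id, expert_id), {})[key] = tensor
        else:
            shared[key] = tensor
    # One file per (layer, expert) pair, plus one file for the shared
    # weights (attention, router gates, embeddings, norms).
    for (layer_id, expert_id), tensors in experts.items():
        path = os.path.join(out_dir, f"layer{layer_id}-expert{expert_id}.safetensors")
        save_file(tensors, path)
    save_file(shared, os.path.join(out_dir, "non-expert.safetensors"))
```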
Got it! Thank you.
I store the parameters of a single expert as a .safetensors file, but `file` reports its type as a "lif file" rather than data. May I ask how you split the model into multiple safetensors files? Could you share the splitting code, not just the code that loads the parameter files?
What's a lif file?
It means a file type that the Linux `file` command cannot correctly recognize. I think there is a problem with my splitting process: it only affects the expert parameters, not the other parameters. Looking through the Mixtral source code, I found that the decoder layer is listed in `_no_split_modules`, so I am all the more curious how you split it into multiple files.
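If I understand correctly, `_no_split_modules` only tells accelerate not to shard that module across devices when building a `device_map`; it shouldn't affect how tensors are serialized to disk. As for the type label: `file` matches on leading magic bytes and has no rule for safetensors, so it can mislabel a perfectly valid file. A sanity check I can run instead, assuming only the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header):

```python
# Parse the safetensors header directly instead of trusting `file`.
import json
import struct

def inspect_safetensors(path):
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    for name, info in header.items():
        if name == "__metadata__":
            continue
        print(name, info["dtype"], info["shape"])

inspect_safetensors("layer0-expert0.safetensors")  # hypothetical file name
```

If this parses cleanly, the file is fine regardless of what `file` calls it.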
The original Mixtral model has 19 safetensors files in total, while the checkpoint you provide has 257. How do you split the model? Could you provide that part of the code?
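If my arithmetic is right, that is one file per expert: 32 layers × 8 experts = 256 expert files, plus one file for the shared weights, giving 257. So I imagine a driver roughly like the following (the paths and shard pattern are my assumptions, not your actual code), but I would still like to see your real implementation:

```python
# Guessed driver: merge the 19 original shards, then write one file per
# expert. Mixtral-8x7B: 32 layers * 8 experts = 256 expert files + 1 shared.
import glob
from safetensors.torch import load_file

merged = {}
for shard in sorted(glob.glob("Mixtral-8x7B-v0.1/model-*.safetensors")):
    merged.update(load_file(shard, device="cpu"))

split_state_dict(merged, "split-checkpoint")  # from the sketch above
```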