arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

Why are the names of parameters hard-coded? Is it possible to read it from index.json in HF checkpoints? #460

Open zhangzx-uiuc opened 4 days ago

zhangzx-uiuc commented 4 days ago

Hi! Thanks so much for developing this tool for model merging!

It seems that the tensor names are hardcoded in https://github.com/arcee-ai/mergekit/tree/main/mergekit/_data/architectures (for Mixtral it is defined in https://github.com/arcee-ai/mergekit/blob/main/mergekit/architecture.py#L282), and a function get_architecture_info (https://github.com/arcee-ai/mergekit/blob/57e7d14e2a732f532970e2c9dada00e2d8f15a7a/mergekit/architecture.py#L358) is used to look for these parameter names.

Just wondering: could we directly read the parameter metadata from "pytorch_model.bin.index.json" or "model.safetensors.index.json"? Otherwise we cannot merge models with our own customized architectures.

Thanks!
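For context, the index files mentioned above share one layout: a `"metadata"` object plus a `"weight_map"` dict mapping each tensor name to the shard file that contains it. A minimal stdlib-only sketch of pulling the tensor names out (the helper name and the example tensor names are illustrative, not mergekit API):

```python
import json

def tensor_names_from_index(index_path):
    """Return all tensor names listed in an HF sharded-checkpoint index.

    Both pytorch_model.bin.index.json and model.safetensors.index.json use
    the same layout: {"metadata": {...}, "weight_map": {name: shard_file}}.
    """
    with open(index_path) as f:
        index = json.load(f)
    return sorted(index["weight_map"])

# Demonstrate with a small hand-written index (hypothetical tensor names):
example = {
    "metadata": {"total_size": 16},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
}
with open("model.safetensors.index.json", "w") as f:
    json.dump(example, f)

print(tensor_names_from_index("model.safetensors.index.json"))
# ['lm_head.weight', 'model.embed_tokens.weight']
```

Note this only yields the names; shapes and dtypes are not in the index file itself, which is part of why a purely index-driven merge still needs some extra metadata.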

ElliotStein commented 1 day ago

Have a look at the architecture-agnostic branch; it's WIP but should work for you! By the way, it'll run much more efficiently if the model is stored as safetensors rather than a PyTorch bin.
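The efficiency gap comes from the safetensors file format itself: per the published spec, a file begins with an 8-byte little-endian header length followed by a JSON header mapping tensor names to dtype, shape, and byte offsets, so tensor metadata (and individual tensors) can be read without deserializing the whole checkpoint, unlike a pickled `.bin`. A stdlib-only sketch that writes a tiny (assumed single-tensor) file and reads just its header:

```python
import json
import struct

def read_safetensors_header(path):
    """Read only the JSON header of a .safetensors file (no tensor data).

    Format: 8-byte little-endian u64 header size, then that many bytes of JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header

# Build a minimal valid safetensors file by hand to demonstrate (assumption:
# one float32 tensor "w" of shape [2], occupying bytes 0..8 of the data region).
header = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
blob = json.dumps(header).encode()
with open("tiny.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)))
    f.write(blob)
    f.write(struct.pack("<2f", 1.0, 2.0))

print(read_safetensors_header("tiny.safetensors"))
# {'w': {'dtype': 'F32', 'shape': [2], 'data_offsets': [0, 8]}}
```

In practice you'd use the `safetensors` library's lazy loading rather than parsing the format by hand; this sketch just shows why header-only reads are cheap.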