TencentARC / LLaMA-Pro

[ACL 2024] Progressive LLaMA with Block Expansion.
https://tencentarc.github.io/LLaMA-Pro/
Apache License 2.0

Issue with Model Saving After Layer Expansion: Removed Shared Tensors #12

Closed yumingfan-0219 closed 5 months ago

yumingfan-0219 commented 5 months ago

I encountered an issue while saving the model after expanding layers.

The message indicated "Removed shared tensor {'model.layers.88.post_attention_layernorm.weight', 'model.layers.88.self_attn.k_proj.weight', '.......', 'model.layers.83.input_layernorm.weight', 'model.layers.82.input_layernorm.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading."

Is this caused by directly copying layers during expansion?
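For context, a minimal sketch of how copying a layer by reference (rather than deep-copying it) produces tensors that share storage, which is what the safetensors saver is flagging. The layer here is a stand-in `nn.Linear`, not the actual LLaMA block:

```python
import copy
import torch.nn as nn

# Stand-in for a decoder block being duplicated during expansion.
block = nn.Linear(8, 8)

# Copying by reference: both entries point at the SAME weight storage.
layers_shared = [block, block]
shared = layers_shared[0].weight.data_ptr() == layers_shared[1].weight.data_ptr()

# deepcopy gives the new block its own storage, so nothing aliases on save.
layers_copied = [block, copy.deepcopy(block)]
aliased = layers_copied[0].weight.data_ptr() == layers_copied[1].weight.data_ptr()

print(shared, aliased)  # True False
```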

hills-code commented 5 months ago

I did not encounter this issue before.

Here is my expansion pipeline: first, run the script to produce an expanded pytorch_model.bin checkpoint, then move that checkpoint into the folder that contains config.json, the tokenizer files, and so on. That folder can then be loaded directly with Hugging Face Transformers. You may need to update the number of layers in config.json to match the expanded model.
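The config.json update from the last step can be sketched as below. A throwaway config in a temp directory stands in for the real one, and both layer counts are illustrative:

```python
import json
import os
import tempfile

# Stand-in for the checkpoint folder holding config.json and the tokenizer.
folder = tempfile.mkdtemp()
cfg_path = os.path.join(folder, "config.json")
with open(cfg_path, "w") as f:
    json.dump({"num_hidden_layers": 32}, f)  # illustrative original count

# Bump the layer count so transformers builds the expanded number of blocks.
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["num_hidden_layers"] = 40  # e.g. 32 original + 8 interleaved new blocks

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

with open(cfg_path) as f:
    print(json.load(f)["num_hidden_layers"])  # 40
```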

yumingfan-0219 commented 5 months ago

I've figured it out: the issue is caused by saving in the safetensors format, which detects and removes duplicated (shared-storage) tensors. Saving as multiple split .bin files instead avoids it.
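A small reproduction of why the .bin format sidesteps the message: torch's pickle-based format serializes every state_dict key even when two keys alias the same storage, whereas safetensors drops the duplicates it detects. The aliased `nn.ModuleDict` below is an illustrative stand-in for layers copied by reference:

```python
import io
import torch
import torch.nn as nn

# Two module names backed by the same block, so the state_dict contains
# aliased entries (a.weight and b.weight share one storage).
block = nn.Linear(4, 4)
model = nn.ModuleDict({"a": block, "b": block})

# Classic pytorch_model.bin serialization keeps both keys intact.
buf = io.BytesIO()
torch.save(model.state_dict(), buf)
buf.seek(0)
restored = torch.load(buf)

print(sorted(restored.keys()))
# ['a.bias', 'a.weight', 'b.bias', 'b.weight'] -- no keys were removed
```

In transformers terms, passing `safe_serialization=False` to `save_pretrained` writes these .bin shards instead of .safetensors, which matches the fix described above.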