Do the newly added transformer layers share parameters with the previous layer?

TencentARC / LLaMA-Pro

[ACL 2024] Progressive LLaMA with Block Expansion.

https://tencentarc.github.io/LLaMA-Pro/

Apache License 2.0

482 stars, 35 forks

Closed. CharlinChen closed this issue 8 months ago

CharlinChen commented 8 months ago

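A question about this line: https://github.com/TencentARC/LLaMA-Pro/blob/b8fbe8c764edc80124a6c3e5062360ab9f7543d9/scripts/block_expansion.py#L35 It assigns the previous layer's parameters directly to the next layer, so the two layers share the same tensors. When saving as a Hugging Face .safetensors file, a message reports that the layer.3.x and layer.4.x parameters are shared. Should this line instead be: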

output[k.replace(('layers.' + str(i) + '.'), ('layers.' + str(layer_cnt) + '.'))] = ckpt[k].clone()
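To illustrate the difference the proposed `.clone()` makes, here is a minimal sketch on a hypothetical one-tensor checkpoint. The key names and the tiny tensor shape are assumptions for illustration and are not taken from `block_expansion.py`; only the aliasing behavior of PyTorch tensors is being shown.

```python
import torch

# Hypothetical miniature checkpoint; the key name only mimics the LLaMA layout.
src = "model.layers.0.weight"
dst = src.replace("layers.0.", "layers.1.")
ckpt = {src: torch.randn(2, 2)}

# Variant 1: plain assignment, so both keys refer to the same tensor.
shared = {src: ckpt[src], dst: ckpt[src]}

# Variant 2: assignment with .clone(), so the new key gets its own storage.
cloned = {src: ckpt[src], dst: ckpt[src].clone()}

# data_ptr() returns the address of the first element of the tensor's data.
shares_alias = shared[src].data_ptr() == shared[dst].data_ptr()
shares_clone = cloned[src].data_ptr() == cloned[dst].data_ptr()
values_equal = torch.equal(cloned[src], cloned[dst])

print(shares_alias, shares_clone, values_equal)
```

With plain assignment the two entries point at the same memory, whereas `.clone()` yields equal values in separate memory.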

PenutChen commented 8 months ago

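I ran a quick test. torch.save does store that layer's weights through the same pointer, but after loading the checkpoint with Hugging Face Transformers, the weights end up with distinct pointers again.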

hills-code commented 8 months ago

Yes. After we torch.save to a .bin file and load it with Hugging Face Transformers, the parameters of these two layers are not shared. I have not tried saving as a .safetensors file.
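The behavior described in this thread can be reproduced in plain PyTorch. This is only a sketch: the `Toy` module and the key names are hypothetical, and `load_state_dict` is used here as a stand-in for the model loading step, on the assumption that Transformers copies checkpoint values into separately allocated parameters in a similar way.

```python
import io
import torch

w = torch.randn(2, 2)
# Same tensor object stored under two keys, as in the expanded checkpoint.
state = {"layers.0.weight": w, "layers.1.weight": w}

# Round-trip through torch.save / torch.load using an in-memory buffer.
buf = io.BytesIO()
torch.save(state, buf)
buf.seek(0)
reloaded = torch.load(buf)
shared_after_reload = (
    reloaded["layers.0.weight"].data_ptr() == reloaded["layers.1.weight"].data_ptr()
)

# Hypothetical two-layer module; each Linear allocates its own weight.
class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            [torch.nn.Linear(2, 2, bias=False) for _ in range(2)]
        )

model = Toy()
# By default load_state_dict copies values into the existing parameters.
model.load_state_dict(reloaded)
shared_in_model = (
    model.layers[0].weight.data_ptr() == model.layers[1].weight.data_ptr()
)

print(shared_after_reload, shared_in_model)
```

In this sketch the sharing survives the save and reload of the raw state dict, but disappears once the values are copied into a model whose layers own separate parameters.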

PenutChen commented 8 months ago

@hills-code Still, I think adding .clone() would match the paper's description better, even though the behavior in Hugging Face is correct.