Nota-NetsPresso / shortened-llm

Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]

How can this method be applied to other large language models? #8

Closed Franklin-L closed 4 months ago

Franklin-L commented 5 months ago

I would like to apply this pruning method to other large models. How can I do that?

bokyeong1015 commented 4 months ago

Hi, this repo supports Hugging Face Transformers-based LLMs by implementing block pruning as a weight copy (i.e., defining a pruned architecture and copying the unpruned weights into it).
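
For concreteness, here is a minimal sketch of that weight-copy approach, not the repo's actual code: the function name `prune_blocks_by_copy` is hypothetical, and `keep_layers` is assumed to come from whatever block-importance criterion you use to decide which blocks to retain.

```python
import copy
import torch
from transformers import AutoModelForCausalLM

def prune_blocks_by_copy(model_name: str, keep_layers: list[int]):
    """Build a depth-pruned copy of a Hugging Face causal LM (sketch)."""
    # Load the original (unpruned) model.
    src = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

    # Define the pruned architecture: same config, fewer Transformer blocks.
    cfg = copy.deepcopy(src.config)
    cfg.num_hidden_layers = len(keep_layers)
    dst = AutoModelForCausalLM.from_config(cfg)

    # Map kept (old) layer indices to consecutive new indices.
    layer_map = {old: new for new, old in enumerate(sorted(keep_layers))}

    # Copy unpruned weights into the new architecture, renaming
    # per-layer parameters, e.g. "model.layers.10.*" -> "model.layers.3.*".
    new_state = {}
    for name, tensor in src.state_dict().items():
        parts = name.split(".")
        if "layers" in parts:
            i = parts.index("layers")
            old_idx = int(parts[i + 1])
            if old_idx not in layer_map:
                continue  # this block is pruned away
            parts[i + 1] = str(layer_map[old_idx])
            name = ".".join(parts)
        new_state[name] = tensor
    dst.load_state_dict(new_state)
    return dst
```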

We would recommend changing the Hugging Face model name in our example scripts and fixing any bugs that arise.
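
For illustration, with the sketch above you could target a different model simply by passing another Hugging Face model name; the model and the choice of kept blocks below are only examples.

```python
# Hypothetical usage: keep the first 24 of Mistral-7B's 32 blocks.
pruned = prune_blocks_by_copy(
    "mistralai/Mistral-7B-v0.1",  # any HF Transformers-based causal LM
    keep_layers=list(range(24)),
)
pruned.save_pretrained("mistral-7b-pruned-24L")
```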