Franklin-L closed this 4 months ago
Hi, this repo supports Hugging Face Transformers-based LLMs. Block pruning is implemented as a weight copy: we define a pruned architecture with fewer blocks and copy the weights of the unpruned blocks into it.
To try another model, we recommend changing the Hugging Face model name in our example scripts and fixing any model-specific bugs that come up.
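For anyone adapting this to another model, here is a rough sketch of the weight-copy idea, not the repo's actual code. It assumes LLaMA-style parameter names where each block lives under `model.layers.<i>.`; other architectures use a different prefix (GPT-2, for example, uses `transformer.h.<i>.`). The function name `remap_pruned_state_dict` and its parameters are hypothetical and chosen only for illustration.

```python
import re


def remap_pruned_state_dict(state_dict, keep_blocks, layer_prefix="model.layers."):
    """Copy the weights of the kept blocks into a renumbered state dict.

    Hypothetical helper: `layer_prefix` assumes LLaMA-style key names.
    """
    # Map each kept original block index to its position in the pruned model.
    new_index = {old: new for new, old in enumerate(sorted(keep_blocks))}
    pattern = re.compile(r"^" + re.escape(layer_prefix) + r"(\d+)\.(.+)$")

    pruned = {}
    for key, value in state_dict.items():
        match = pattern.match(key)
        if match is None:
            # Non-block weights (embeddings, final norm, LM head) are copied unchanged.
            pruned[key] = value
            continue
        old = int(match.group(1))
        if old in new_index:
            # Kept block: same weight, renumbered key.
            pruned[f"{layer_prefix}{new_index[old]}.{match.group(2)}"] = value
        # Weights of removed blocks are simply dropped.
    return pruned


# Toy example: strings stand in for tensors, and block 1 is pruned.
toy = {
    "model.embed_tokens.weight": "emb",
    "model.layers.0.mlp.weight": "w0",
    "model.layers.1.mlp.weight": "w1",
    "model.layers.2.mlp.weight": "w2",
    "model.norm.weight": "norm",
}
pruned_toy = remap_pruned_state_dict(toy, keep_blocks=[0, 2])
```

With a real model, you would presumably also reduce the layer count in the model config (`num_hidden_layers` for LLaMA-style configs) to `len(keep_blocks)`, build the smaller model from that config, and load the remapped dict with `load_state_dict`. Which blocks to keep still has to come from the pruning criterion itself.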
I would like to apply this pruning method to other large models. How can I do that?