CASE-Lab-UMD / LLM-Drop

The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
Apache License 2.0

Implementing this on other models apart from LLaMa and Mistral #7

Closed: Acedev003 closed this issue 2 weeks ago

Acedev003 commented 3 weeks ago

Is it possible to implement this for other models from huggingface apart from LLaMa and Mistral?

s1ghhh commented 3 weeks ago

Thank you for your interest in our project. We will soon release modeling files that support more models. If you would like to modify the code to support a specific model yourself, the following sections may be helpful:

- L139-L187 of the configuration file: this part reads lists from the config.json file to identify which layers' attention or MLP components should be dropped.
- L505-L518 of the modeling file: we selectively load weights depending on whether certain components are dropped.
- L551-L574 of the modeling file: during inference, we bypass the dropped modules.
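As a rough illustration of the pattern described above (not the repository's actual code), the sketch below shows the general idea: hypothetical `drop_attn_list` / `drop_mlp_list` config fields mark layers whose attention or MLP is removed, the layer only builds (and would only load weights for) the kept sub-modules, and the forward pass bypasses whatever was dropped.

```python
# Minimal sketch of layer-wise attention/MLP dropping.
# The field names `drop_attn_list` and `drop_mlp_list` are hypothetical;
# in the real project, equivalent lists are read from config.json.
import torch
import torch.nn as nn


class DropConfig:
    def __init__(self, num_layers=4, drop_attn_list=(), drop_mlp_list=()):
        self.num_layers = num_layers
        self.drop_attn_list = list(drop_attn_list)
        self.drop_mlp_list = list(drop_mlp_list)


class ToyDecoderLayer(nn.Module):
    def __init__(self, hidden, layer_idx, config):
        super().__init__()
        self.drop_attn = layer_idx in config.drop_attn_list
        self.drop_mlp = layer_idx in config.drop_mlp_list
        # Only build (and, in a real model, only load weights for) kept modules.
        self.attn = None if self.drop_attn else nn.MultiheadAttention(
            hidden, num_heads=4, batch_first=True
        )
        self.mlp = None if self.drop_mlp else nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        self.norm1 = nn.LayerNorm(hidden)
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        if not self.drop_attn:
            # Standard pre-norm residual attention block.
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h)
            x = x + attn_out
        # If attention is dropped, hidden states pass through unchanged.
        if not self.drop_mlp:
            x = x + self.mlp(self.norm2(x))
        return x


config = DropConfig(num_layers=4, drop_attn_list=[1, 3], drop_mlp_list=[2])
layers = nn.ModuleList(ToyDecoderLayer(64, i, config) for i in range(config.num_layers))
h = torch.randn(2, 8, 64)
for layer in layers:
    h = layer(h)
print(h.shape)  # torch.Size([2, 8, 64])
```

For the real models, please follow the line ranges listed above in the released configuration and modeling files rather than this toy example.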

Acedev003 commented 3 weeks ago

Thanks for sharing these :), I'll have a look into it.

Shwai-He commented 2 weeks ago

Hi,

We have updated the code to support additional language models, including Gemma2, Baichuan, DeepSeek, Yi, and Solar. Yi and Solar use the Llama architecture, so they are handled by the Llama modeling file, and we have verified that this works.

We hope these updates are helpful. We will continue to enhance the code to support more models in the future.