FMInference / DejaVu


Does anyone know if this work is implemented on llama? #18

Open wyxscir opened 7 months ago

wyxscir commented 7 months ago

Does anyone know if this work has been implemented for LLaMA? Or is there any similar dynamic-pruning work for LLaMA?

XieWeikai commented 6 months ago

This method doesn't work very well on LLaMA. LLaMA uses the SiLU activation function, whose inherent sparsity is not very high. One work notes that it is possible to replace SiLU with ReLU and retrain LLaMA to improve sparsity.
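For illustration, here is a minimal, hypothetical sketch of what "low inherent sparsity" means in this context: a LLaMA-style gated MLP whose hidden activations are checked against a small threshold. The `GatedMLP` class, the dimensions, and the threshold are assumptions made for this example, not code from DejaVu or LLaMA; with randomly initialized weights it only shows why a ReLU gate zeros out far more entries than a SiLU gate (trained models behave differently, but the SiLU-vs-ReLU gap is the point the comment is making).

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """LLaMA-style gated MLP (gate/up/down projections); the gate activation is configurable."""
    def __init__(self, d_model: int, d_hidden: int, act: nn.Module):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.up_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.down_proj = nn.Linear(d_hidden, d_model, bias=False)
        self.act = act

    def forward(self, x):
        # Hidden activation: act(gate(x)) * up(x), as in LLaMA's feed-forward block.
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))

@torch.no_grad()
def activation_sparsity(mlp: GatedMLP, x: torch.Tensor, threshold: float = 1e-3) -> float:
    """Fraction of hidden activations whose magnitude falls below `threshold`."""
    h = mlp.act(mlp.gate_proj(x)) * mlp.up_proj(x)
    return (h.abs() < threshold).float().mean().item()

# Toy comparison with random weights (illustrative dimensions, not LLaMA's).
x = torch.randn(8, 128, 512)
silu_mlp = GatedMLP(512, 1376, act=nn.SiLU())
relu_mlp = GatedMLP(512, 1376, act=nn.ReLU())
print("SiLU sparsity:", activation_sparsity(silu_mlp, x))  # SiLU is rarely exactly zero
print("ReLU sparsity:", activation_sparsity(relu_mlp, x))  # ReLU zeros roughly half the gate outputs
```

DejaVu-style predictors skip the rows/columns tied to near-zero hidden units, so the lower that sparsity is, the less there is to skip; that is why SiLU-based LLaMA benefits less than ReLU-based OPT.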

wyxscir commented 6 months ago

> This method doesn't work very well on LLaMA. LLaMA uses the SiLU activation function, whose inherent sparsity is not very high. One work notes that it is possible to replace SiLU with ReLU and retrain LLaMA to improve sparsity.

thank you

wyxscir commented 6 months ago

> This method doesn't work very well on LLaMA. LLaMA uses the SiLU activation function, whose inherent sparsity is not very high. One work notes that it is possible to replace SiLU with ReLU and retrain LLaMA to improve sparsity.

The work you mentioned may be “ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models”.