Open wyxscir opened 7 months ago
This method doesn't work very well on Llama. Llama uses the SiLU activation function, whose inherent sparsity is not very high. One work mentioned that it is possible to replace SiLU with ReLU and retrain Llama to improve sparsity.
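A minimal sketch of why this matters: ReLU outputs exact zeros for all negative inputs, while SiLU (x · sigmoid(x)) is almost never exactly zero, so sparsity-exploiting methods find far fewer zero activations to skip. The snippet below uses random Gaussian values as a stand-in for MLP pre-activations (an illustrative assumption, not Llama's actual distribution):

```python
import random
import math

random.seed(0)

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x); output is rarely exactly zero
    return x * (1.0 / (1.0 + math.exp(-x)))

def relu(x):
    # ReLU: exact zero for every negative input
    return max(0.0, x)

def sparsity(values, eps=1e-6):
    # fraction of activations whose magnitude is (near) zero
    return sum(abs(v) <= eps for v in values) / len(values)

# stand-in for pre-activation values inside an MLP block
pre_acts = [random.gauss(0.0, 1.0) for _ in range(10_000)]

print(f"ReLU sparsity: {sparsity([relu(x) for x in pre_acts]):.2f}")
print(f"SiLU sparsity: {sparsity([silu(x) for x in pre_acts]):.2f}")
```

With zero-mean inputs, roughly half of the ReLU activations are exactly zero, while SiLU produces essentially none, which is consistent with the observation that Llama's inherent sparsity is low.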
thank you
The work you mentioned may be “ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models”.
Does anyone know whether this work has been implemented on Llama? Or is there any similar dynamic pruning work for Llama?