NVlabs / MaskLLM

[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
https://vainf.github.io/maskllm-project-page

Question for MaskLLM Paper #2


guanchenl commented 2 weeks ago

Excellent work!

  1. Is it possible to make the mask differentiable using CoFi?
  2. How do MaskLLM's pruning cost and post-pruning accuracy compare to SAT, post-pruning fine-tuning, or sparsity-aware PEFT?

Best

VainF commented 1 week ago

Hi @guanchenl, sorry for the late response.

A1: Ah, I'm not very familiar with CoFi, but from what I understand, it is a regularization-based method. One potential challenge would be controlling the sparsity pattern, particularly achieving configurations like exactly 2 non-zero values in every group of 4 weights (2:4 semi-structured sparsity).
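For readers unfamiliar with the constraint, here is a minimal sketch of what the 2:4 pattern means, assuming PyTorch; the helper name is illustrative and not part of the MaskLLM codebase:

```python
import torch

def is_two_four_sparse(mask: torch.Tensor) -> bool:
    """Return True if a binary mask keeps exactly 2 of every 4 consecutive weights."""
    groups = mask.reshape(-1, 4)                  # view the weights in groups of 4
    return bool((groups.sum(dim=1) == 2).all())   # each group must have exactly 2 ones

# A valid 2:4 mask for 8 weights: two non-zeros in each group of four.
mask = torch.tensor([1., 0., 1., 0.,   0., 1., 1., 0.])
print(is_two_four_sparse(mask))  # True
```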

A2: In MaskLLM, the LLM weights are frozen during the mask-learning process, so the comparison with fine-tuning methods is not entirely apples-to-apples. Still, it is reasonable to compare MaskLLM with methods like SPP: SPP combined with Wanda yields 50.61% accuracy on HellaSwag, while MaskLLM achieves 50.91% without any fine-tuning. If feasible, we will provide more fine-tuning results in the future.
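To make the frozen-weight setup concrete, here is a rough sketch of learning a differentiable 2:4 mask over a frozen linear layer via a Gumbel-softmax over the six candidate masks per group of four weights, which is the formulation described in the paper; all class and variable names below are illustrative, not the actual MaskLLM implementation:

```python
import itertools
import torch
import torch.nn.functional as F

# The 6 candidate 2:4 masks for each group of 4 weights (C(4,2) = 6).
CANDIDATES = torch.tensor(
    [[1. if i in pair else 0. for i in range(4)]
     for pair in itertools.combinations(range(4), 2)]
)  # shape (6, 4)

class MaskedLinearSketch(torch.nn.Module):
    """Frozen linear weights; only the per-group mask logits are learned."""
    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        self.weight = linear.weight.detach()        # frozen LLM weight, no gradients
        n_groups = self.weight.numel() // 4
        self.logits = torch.nn.Parameter(torch.zeros(n_groups, 6))

    def forward(self, x, tau: float = 1.0):
        # Differentiable (soft) selection over the candidate masks via Gumbel-softmax.
        probs = F.gumbel_softmax(self.logits, tau=tau, hard=False)  # (n_groups, 6)
        mask = (probs @ CANDIDATES).reshape(self.weight.shape)      # soft 2:4 mask
        return F.linear(x, self.weight * mask)

# Only the mask logits receive gradients; the weights stay fixed.
layer = MaskedLinearSketch(torch.nn.Linear(8, 4))
opt = torch.optim.AdamW([layer.logits], lr=1e-2)
loss = layer(torch.randn(2, 8)).pow(2).mean()   # placeholder loss for illustration
loss.backward()
opt.step()
```

At the end of training, taking the argmax over each group's logits yields a hard 2:4 mask; since only `layer.logits` ever receives gradients, the mask-learning cost stays well below that of full fine-tuning.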

Thanks!