NVlabs / MaskLLM

[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
https://vainf.github.io/maskllm-project-page

Question for MaskLLM Paper #2


guanchenl commented 2 weeks ago

Excellent work!

  1. Is it possible to make the mask differentiable using CoFi?
  2. How do MaskLLM's pruning cost and post-pruning accuracy compare to SAT, post-pruning fine-tuning, or sparsity-aware PEFT?

Best

VainF commented 1 week ago

Hi @guanchenl, sorry for the late response.

A1: Ah, I'm not very familiar with CoFi, but from what I understand, it is a regularization-based method. One potential challenge would be controlling the sparsity pattern, particularly achieving configurations like exactly 2 non-zero values in every group of 4 weights (2:4 semi-structured sparsity).
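For readers unfamiliar with the constraint, here is a minimal sketch of what the 2:4 pattern means, assuming PyTorch; the helper name is illustrative and not part of the MaskLLM codebase:

```python
import torch

def is_two_four_sparse(mask: torch.Tensor) -> bool:
    """Return True if a binary mask keeps exactly 2 of every 4 consecutive weights."""
    groups = mask.reshape(-1, 4)                  # view the weights in groups of 4
    return bool((groups.sum(dim=1) == 2).all())   # each group must have exactly 2 ones

# A valid 2:4 mask for 8 weights: two non-zeros in each group of four.
mask = torch.tensor([1., 0., 1., 0.,   0., 1., 1., 0.])
print(is_two_four_sparse(mask))  # True
```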

A2: In MaskLLM, the LLM weights are frozen during the mask-learning process, so the comparison with fine-tuning methods is not entirely apples-to-apples. Still, it is reasonable to compare MaskLLM with methods like SPP: SPP combined with Wanda yields 50.61% accuracy on HellaSwag, while MaskLLM achieves 50.91% without any fine-tuning. If feasible, we will provide more fine-tuning results in the future.
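To make the frozen-weight setup concrete, here is a rough sketch of learning a differentiable 2:4 mask over a frozen linear layer via a Gumbel-softmax over the six candidate masks per group of four weights, which is the formulation described in the paper; all class and variable names below are illustrative, not the actual MaskLLM implementation:

```python
import itertools
import torch
import torch.nn.functional as F

# The 6 candidate 2:4 masks for each group of 4 weights (C(4,2) = 6).
CANDIDATES = torch.tensor(
    [[1. if i in pair else 0. for i in range(4)]
     for pair in itertools.combinations(range(4), 2)]
)  # shape (6, 4)

class MaskedLinearSketch(torch.nn.Module):
    """Frozen linear weights; only the per-group mask logits are learned."""
    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        self.weight = linear.weight.detach()        # frozen LLM weight, no gradients
        n_groups = self.weight.numel() // 4
        self.logits = torch.nn.Parameter(torch.zeros(n_groups, 6))

    def forward(self, x, tau: float = 1.0):
        # Differentiable (soft) selection over the candidate masks via Gumbel-softmax.
        probs = F.gumbel_softmax(self.logits, tau=tau, hard=False)  # (n_groups, 6)
        mask = (probs @ CANDIDATES).reshape(self.weight.shape)      # soft 2:4 mask
        return F.linear(x, self.weight * mask)

# Only the mask logits receive gradients; the weights stay fixed.
layer = MaskedLinearSketch(torch.nn.Linear(8, 4))
opt = torch.optim.AdamW([layer.logits], lr=1e-2)
loss = layer(torch.randn(2, 8)).pow(2).mean()   # placeholder loss for illustration
loss.backward()
opt.step()
```

At the end of training, taking the argmax over each group's logits yields a hard 2:4 mask; since only `layer.logits` ever receives gradients, the mask-learning cost stays well below that of full fine-tuning.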

Thanks!