WoosukKwon / retraining-free-pruning

[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
https://arxiv.org/abs/2204.09656

Some confusion about least squares #18

Open TianL123 opened 5 months ago

TianL123 commented 5 months ago

Hello, I'm hoping you can help me understand why the dimension of A is TDN, TDH, since the dimension of hidden_states is TD1 and it becomes TD after dense.
ATA += W * (hidden_states @ hidden_states.t())  # 1956, 1956
Why is ATA calculated this way? Can you recommend materials that would help me understand this? Looking forward to your reply.

liuxiaozhu01 commented 2 months ago

I have the same confusion.
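
One possible reading of the quoted line, not confirmed by the repository authors: if each column of A is the flattened contribution of one neuron to the layer output (the outer product of that neuron's activations over T tokens and its D-dimensional output weight row), then A has T*D rows and N columns, and A^T A factors into an elementwise product of two small N x N Gram matrices. Under that assumption, W would be the Gram matrix of the weight rows and hidden_states @ hidden_states.t() the Gram matrix of the activations, so the large matrix A never has to be built. The sketch below checks this identity numerically in NumPy; the names H, Wd, Y and the sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, N = 5, 4, 3  # tokens, output width, neurons (hypothetical sizes)

H = rng.standard_normal((N, T))   # H[n, t]: activation of neuron n at token t (assumed layout)
Wd = rng.standard_normal((N, D))  # Wd[n]: output weight row of neuron n (assumed layout)
Y = rng.standard_normal((T, D))   # target layer output to reconstruct

# Explicit design matrix: column n is the flattened (T, D) outer product
# of neuron n's activations and its weight row, so A has shape (T*D, N).
A = np.stack([np.outer(H[n], Wd[n]).ravel() for n in range(N)], axis=1)

# Normal-equation terms computed the expensive way.
ata_explicit = A.T @ A
atb_explicit = A.T @ Y.ravel()

# The same terms computed without materializing A.
W = Wd @ Wd.T                              # (N, N) Gram matrix of weight rows
ata_factored = W * (H @ H.T)               # elementwise product, as in the quoted line
atb_factored = ((H @ Y) * Wd).sum(axis=1)  # (N,)

# Accumulating over token batches gives the same result, which is
# consistent with the "+=" in the quoted line.
ata_accumulated = np.zeros((N, N))
for H_batch in (H[:, :2], H[:, 2:]):
    ata_accumulated += W * (H_batch @ H_batch.T)

# Solving the normal equations matches an ordinary least-squares solve on A.
coef_normal = np.linalg.solve(ata_factored, atb_factored)
coef_lstsq = np.linalg.lstsq(A, Y.ravel(), rcond=None)[0]
```

The identity follows from the Frobenius inner product of two rank-one matrices: the inner product of the outer products for neurons i and j equals (h_i . h_j) * (w_i . w_j). Standard references on linear least squares and the normal equations A^T A x = A^T b cover the remaining steps.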