Some confusion about least squares

WoosukKwon / retraining-free-pruning

[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers


Some confusion about least squares #18

Open TianL123 opened 5 months ago

TianL123 commented 5 months ago

Hello, I'm hoping you can help me understand why the dimension of A is TDN, TDH. The dimension of hidden_states is TD1, and it becomes TD after dense.
ATA += W * (hidden_states @ hidden_states.t())  #1956,1956
Why is ATA calculated this way? Can you recommend materials that would help me understand this? Looking forward to your reply.

liuxiaozhu01 commented 2 months ago

I have the same confusion.
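One possible reading of the quoted line, offered as a sketch and not as the authors' explanation. Suppose each column of A is the flattened (T, D) output contribution of one neuron, so A has shape (T*D, N). Column j is then the outer product of that neuron's activations over T tokens and its output weight row of length D. The inner product of two such columns factors into (activation dot product) times (weight dot product), so A^T A equals the elementwise product of two N x N Gram matrices and A never has to be built. The check below uses NumPy in place of PyTorch. The names H, Wout, T, D, N and the small sizes are hypothetical. It also assumes that W in the quoted line is the Gram matrix of the per-neuron output weights and that hidden_states is stored as (N, T) at that point; neither is confirmed in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, N = 5, 4, 3  # tokens, hidden size, neurons (hypothetical small sizes)

H = rng.standard_normal((T, N))     # activation of each neuron for each token
Wout = rng.standard_normal((N, D))  # output weight row of each neuron

# Column j of A is the flattened (T, D) contribution of neuron j.
A = np.stack([np.outer(H[:, j], Wout[j]).ravel() for j in range(N)], axis=1)

# Explicit normal-equation matrix versus the factored form.
ata_explicit = A.T @ A                      # (N, N), needs the full (T*D, N) matrix
ata_factored = (Wout @ Wout.T) * (H.T @ H)  # elementwise product of two Gram matrices

# A @ m reproduces the masked layer output, flattened to length T*D.
m = rng.standard_normal(N)                  # stand-in for a mask vector
masked_output = ((H * m) @ Wout).ravel()

print(A.shape)                                 # (T*D, N)
print(np.allclose(ata_explicit, ata_factored))
print(np.allclose(A @ m, masked_output))
```

Under these assumptions, summing the factored form over batches gives the same A^T A as stacking every batch into one tall matrix, which would explain why the quoted line uses `+=`.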