IST-DASLab / sparsegpt

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
https://arxiv.org/abs/2301.00774
Apache License 2.0

Why transpose the input in the case of nn.Linear or nn.Conv1d? #33

Closed: tada0347 closed this issue 7 months ago

tada0347 commented 8 months ago

In sparsegpt.py, in def add_batch (line 42): inp = inp.t()

This line makes the Hessian matrix X^T X rather than X X^T when pruning nn.Linear or nn.Conv1d.

Why did you transpose these inputs?

Is there something I am missing?
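To make the shapes in the question concrete, here is a minimal NumPy sketch. It is not the code from sparsegpt.py. It assumes the layer input X is laid out as (samples, features), which is how nn.Linear receives its input, and that the Hessian is accumulated as a product of the form inp @ inp.T after the transpose. The names X, n_samples, and d_in are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_samples, d_in = 5, 3

rng = np.random.default_rng(0)
# Assumed layout: one row per sample (token), one column per input feature.
X = rng.standard_normal((n_samples, d_in))

# Mirrors `inp = inp.t()`: the array becomes (d_in, n_samples).
inp = X.T

# Assumed accumulation form: a (d_in, d_in) matrix, one row/column per input feature.
H = inp @ inp.T

# Written in terms of the original X, this product is X^T X.
same_as_XtX = np.allclose(H, X.T @ X)

# X X^T, by contrast, would be (n_samples, n_samples).
XXt_shape = (X @ X.T).shape

print(H.shape, same_as_XtX, XXt_shape)
```

Under these assumptions, the same matrix is written X^T X when X holds samples as rows, and X X^T when X holds samples as columns. In both conventions it has one row and one column per input feature.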