Why is the input transposed for nn.Linear or nn.Conv1d?

IST-DASLab / sparsegpt

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

https://arxiv.org/abs/2301.00774

Apache License 2.0

731 stars, 97 forks

Closed (tada0347 closed this 7 months ago)

tada0347 commented 8 months ago

In sparsegpt.py, inside def add_batch (line 42): inp = inp.t()

This line makes the Hessian matrix X^T X rather than X X^T when pruning nn.Linear or nn.Conv1d.

Why are these inputs transposed?

Is there something I am missing?
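
To make the shapes in the question concrete, here is a minimal sketch. It assumes the activations reach the layer with one token per row and one input feature per column, which is the layout a linear layer receives. The sizes and variable names are made up for illustration, and NumPy's `.T` stands in for `inp.t()`; this is not the repository's code. Whether the result is called X^T X or X X^T depends only on whether X is taken to hold samples in rows or in columns.

```python
import numpy as np

# Hypothetical sizes: 5 tokens (rows), 3 input features (columns).
n_tokens, n_features = 5, 3
rng = np.random.default_rng(0)
inp = rng.standard_normal((n_tokens, n_features))

# Without a transpose, inp @ inp.T has one row and column per token.
gram_tokens = inp @ inp.T            # shape (n_tokens, n_tokens)

# Transposing first (the analogue of inp = inp.t()) and then multiplying
# by its own transpose gives one row and column per input feature.
inp_t = inp.T                        # shape (n_features, n_tokens)
gram_features = inp_t @ inp_t.T      # shape (n_features, n_features)

print(gram_tokens.shape)    # (5, 5)
print(gram_features.shape)  # (3, 3)
```

Only the second product matches the number of input features, which is also the number of columns in an nn.Linear weight of shape (out_features, in_features).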