AIoT-MLSys-Lab / SVD-LLM

Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"
https://arxiv.org/abs/2403.07378
Apache License 2.0

Instead of ratio-compress to specific layer size #4

Closed choprahetarth closed 3 months ago

choprahetarth commented 3 months ago

Hello! I was wondering whether this code can be used to transform a tensor, say (32, 128, 128), into a smaller tensor (8, 64, 64) — basically, to reduce the size of the LLM layer by layer.

tuidan commented 3 months ago

Hi! Yes, SVD-LLM can compress the model (tensor) to a smaller size, but it does not change the shapes of the input and output channels. For example, the algorithm replaces an original weight matrix W of shape 128x128 with the product of two smaller matrices, A (128x16) and B (16x128). In this way, the number of stored values is reduced from 128x128 = 16384 to 128x16x2 = 4096.
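The factorization described above can be sketched with plain truncated SVD in NumPy. Note this is an illustrative simplification, not SVD-LLM's actual algorithm, which adds a truncation-aware data whitening step; the rank 16 is just the example value from the numbers above.

```python
import numpy as np

# Illustrative low-rank factorization via truncated SVD (simplified;
# SVD-LLM itself uses a truncation-aware whitening step, omitted here).
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))   # original weight matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 16                                # target rank
A = U[:, :r] * S[:r]                  # 128 x 16
B = Vt[:r, :]                         # 16 x 128

# Input/output shapes are unchanged: x @ W is approximated by (x @ A) @ B,
# but the stored parameter count drops from 16384 to 4096.
print(W.size)           # 16384
print(A.size + B.size)  # 4096
```

A layer using W can thus be replaced by two stacked linear layers (one with weight A, one with weight B) without touching the layers before or after it.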

Changing the shapes of the input and output channels (e.g., from 128x128 to 64x64) is a different kind of compression, which can be achieved by structured pruning. You can refer to this paper for more details: https://arxiv.org/abs/2305.11627