AIoT-MLSys-Lab / SVD-LLM

Official Code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression"
https://arxiv.org/abs/2403.07378
Apache License 2.0

Instead of ratio-compress to specific layer size #4

Closed choprahetarth closed 3 months ago

choprahetarth commented 3 months ago

Hello! I was wondering whether this code can be used to transform a tensor, say (32, 128, 128), into a smaller tensor (8, 64, 64) — basically, to reduce the size of the LLM layer by layer.

tuidan commented 3 months ago

Hi! Yes, SVD-LLM can compress the model (tensor) to a smaller size, but it does not change the shapes of the input and output channels. For example, the algorithm replaces an original weight matrix W of shape 128x128 with the product of two smaller matrices, A (128x16) and B (16x128). In this way, the number of stored values is reduced from 128x128 = 16384 to 128x16x2 = 4096.
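The factorization described above can be sketched with plain truncated SVD in NumPy. Note this is an illustrative simplification, not SVD-LLM's actual algorithm, which adds a truncation-aware data whitening step; the rank 16 is just the example value from the numbers above.

```python
import numpy as np

# Illustrative low-rank factorization via truncated SVD (simplified;
# SVD-LLM itself uses a truncation-aware whitening step, omitted here).
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))   # original weight matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 16                                # target rank
A = U[:, :r] * S[:r]                  # 128 x 16
B = Vt[:r, :]                         # 16 x 128

# Input/output shapes are unchanged: x @ W is approximated by (x @ A) @ B,
# but the stored parameter count drops from 16384 to 4096.
print(W.size)           # 16384
print(A.size + B.size)  # 4096
```

A layer using W can thus be replaced by two stacked linear layers (one with weight A, one with weight B) without touching the layers before or after it.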

Changing the shapes of the input and output channels (e.g., from 128x128 to 64x64) is a different kind of compression, which can be achieved by structured pruning. You can refer to this paper for more details: https://arxiv.org/abs/2305.11627