How to quantize an inherited linear layer?

AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

MIT License


Issue #704 · Open · opened by nzomi 2 weeks ago

nzomi commented 2 weeks ago

As I understand it, the `make_quant` function replaces every specified linear layer with `QuantLinear`. However, if a layer is a subclass of `nn.Linear` (e.g., its `forward()` calls `super().forward()`, which runs `nn.Linear.forward`), how can we quantize this inherited linear layer?

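```python
import torch.nn as nn


class module(nn.Linear):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias=bias)

    def forward(self, x):
        # Runs nn.Linear.forward, i.e. x @ weight.T + bias
        res = super().forward(x)
        return res
```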
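A minimal sketch of why the subclass is awkward, assuming the quantizer picks layers by name and checks `isinstance(..., nn.Linear)` (an assumption; AutoGPTQ's real `make_quant` may select and build layers differently). `InheritedLinear`, `FakeQuantLinear`, and `replace_linears` are hypothetical names for illustration only. Because a subclass passes the `isinstance` check, the whole module gets swapped out, and any extra logic in its overridden `forward` is silently dropped.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InheritedLinear(nn.Linear):
    """Like the subclass in the question, plus a hypothetical extra step in forward."""

    def forward(self, x):
        return super().forward(x) * 2.0  # hypothetical custom step


class FakeQuantLinear(nn.Module):
    """Hypothetical stand-in for QuantLinear: reproduces only the plain Linear math."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach().clone())
        self.bias = None if linear.bias is None else nn.Parameter(linear.bias.detach().clone())

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)


def replace_linears(model: nn.Module, names: set) -> None:
    """Hypothetical make_quant-style pass: swap the selected Linear modules by name."""
    for full_name, child in list(model.named_modules()):
        if full_name in names and isinstance(child, nn.Linear):  # True for subclasses too
            parent_name, _, attr = full_name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, attr, FakeQuantLinear(child))


torch.manual_seed(0)
model = nn.Sequential(InheritedLinear(4, 4))
x = torch.randn(2, 4)
before = model(x)
replace_linears(model, {"0"})
after = model(x)
# The swap succeeded, but the subclass's extra "* 2.0" step is gone.
print(type(model[0]).__name__, torch.allclose(after * 2.0, before))  # → FakeQuantLinear True
```

In other words, the inherited layer is not skipped; it is replaced wholesale, so the question is really how to keep the subclass's custom behavior around the quantized matmul.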

nzomi commented 2 weeks ago

Currently, I work around this by renaming the weight keys so that they match a new module that wraps a standard `nn.Linear` layer, but I'm looking for a more general solution.
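One way to make that key-renaming workaround automatic is sketched below, assuming the custom logic can be moved into a wrapper that holds a plain `nn.Linear` child (composition instead of inheritance). `WrappedLinear` and `_remap_keys` are hypothetical names. The remapping relies on PyTorch's `_register_load_state_dict_pre_hook`, which is a private (underscore) API, so its behavior may change between versions.

```python
import torch
import torch.nn as nn


class WrappedLinear(nn.Module):
    """Hypothetical refactor: custom logic stays here, the matmul lives in a plain nn.Linear child."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=bias)
        # Private PyTorch hook: runs before this module's keys are loaded.
        self._register_load_state_dict_pre_hook(self._remap_keys)

    @staticmethod
    def _remap_keys(state_dict, prefix, local_metadata, strict,
                    missing_keys, unexpected_keys, error_msgs):
        # Rename old "<prefix>weight"/"<prefix>bias" to "<prefix>linear.weight"/"<prefix>linear.bias".
        for param in ("weight", "bias"):
            old_key = prefix + param
            if old_key in state_dict:
                state_dict[prefix + "linear." + param] = state_dict.pop(old_key)

    def forward(self, x):
        # Any custom pre/post-processing from the old subclass would go around this call.
        return self.linear(x)


torch.manual_seed(0)
old = nn.Linear(4, 3)  # stands in for a checkpoint saved from the subclass: keys "weight", "bias"
new = WrappedLinear(4, 3)
new.load_state_dict(old.state_dict())  # strict=True: raises if any key is left unmatched
x = torch.randn(2, 4)
print(list(new.state_dict().keys()), torch.allclose(new(x), old(x)))
```

With this layout, a name-based replacement pass can target `<name>.linear`, which is an ordinary `nn.Linear`, while the custom `forward` stays in the wrapper. You would still need to point whatever module-name list your quantization config uses at the new `.linear` names.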