ModelCloud / GPTQModel


[FEATURE] Transformers Integration #91

Closed Qubitium closed 2 hours ago

Qubitium commented 5 days ago
  1. Short-term. We need to monkey-patch Transformers so the AutoModelForCausalLM.from_pretrained() hook into AutoGPTQ is routed to GPTQModel instead.

    For the monkey patch there are two paths (a sketch of path 2 follows this list):

    1. Directly monkey-patch the Transformers code.
    2. Monkey-patch the AutoGPTQ.from_quantized() class method so it routes to GPTQModel.from_quantized() when Transformers makes the hook call.
  2. Mid-term. We should also submit a PR to Transformers so the quantization (AutoGPTQ) integration becomes a dynamic hook rather than a static binding to any one package. For this to happen, we need to design a shared, generic API/hook structure so that GPTQModel and AutoGPTQ can co-exist, along with any future quant packages that want to hook into the loader/inference path. A hypothetical sketch of such a hook structure appears below.
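For reference, a minimal sketch of path 2, assuming auto_gptq exposes AutoGPTQForCausalLM.from_quantized() and gptqmodel exposes GPTQModel.from_quantized(); exact signatures may differ across versions:

```python
# Minimal sketch of path 2: patch AutoGPTQ's entry point so that the
# existing Transformers hook into AutoGPTQ lands in GPTQModel instead.
# Assumes both packages expose from_quantized(); signatures may vary.
from auto_gptq import AutoGPTQForCausalLM
from gptqmodel import GPTQModel

def _patched_from_quantized(cls, model_name_or_path, *args, **kwargs):
    # Delegate the load to GPTQModel; the caller is unaware of the swap.
    return GPTQModel.from_quantized(model_name_or_path, *args, **kwargs)

# Replace the classmethod in place so any caller that goes through
# AutoGPTQForCausalLM.from_quantized() is routed to GPTQModel.
AutoGPTQForCausalLM.from_quantized = classmethod(_patched_from_quantized)
```

Patching AutoGPTQ's own entry point this way avoids depending on Transformers internals, which should make it less fragile than patching the Transformers code directly (path 1).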
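And a hypothetical illustration of the mid-term dynamic-hook idea. None of these names (register_quant_loader, load_quantized, _QUANT_LOADERS) exist in Transformers today; they only sketch the shape of a shared API that multiple quant packages could register against:

```python
# Hypothetical sketch: Transformers keeps a registry of quantized-model
# loaders instead of a hard-coded binding to one package.
from typing import Callable, Dict

# Registry mapping a quant-method string (e.g. from quantize_config) to a loader.
_QUANT_LOADERS: Dict[str, Callable] = {}

def register_quant_loader(quant_method: str, loader: Callable) -> None:
    """Let any quant package (GPTQModel, AutoGPTQ, future ones) register itself."""
    _QUANT_LOADERS[quant_method] = loader

def load_quantized(quant_method: str, model_name_or_path: str, **kwargs):
    """Called by the loader when it detects a quantized checkpoint."""
    try:
        loader = _QUANT_LOADERS[quant_method]
    except KeyError:
        raise ValueError(f"no loader registered for quant method {quant_method!r}")
    return loader(model_name_or_path, **kwargs)

# Example registration (assuming the package exposes from_quantized()):
# from gptqmodel import GPTQModel
# register_quant_loader("gptq", GPTQModel.from_quantized)
```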

Target: v0.9.2

Qubitium commented 3 days ago

Milestone target changed to v0.9.3