THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

How to transfer the ChatGLM2-6B int4 model to an NPU device #649

Open woaipichuli opened 9 months ago

woaipichuli commented 9 months ago

Is there an existing issue for this?

Current Behavior

I found the code below in quantize.py, and it looks like the kernels in quantization_code only support running on a GPU. Is there any suggestion for deploying the model on an NPU? Would it be possible to provide the source behind quantization_code so that I can rewrite it to run on an NPU device?

import base64
import bz2
from typing import List

# LazyKernelCModule lazily loads the decompressed kernel binary and
# KernelFunction wraps one named CUDA kernel from it (both from cpm_kernels).
from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction


class Kernel:
    def __init__(self, code: bytes, function_names: List[str]):
        self.code = code
        self._function_names = function_names
        self._cmodule = LazyKernelCModule(self.code)

        # Expose each named kernel as an attribute, e.g. kernels.int4WeightExtractionHalf.
        for name in self._function_names:
            setattr(self, name, KernelFunction(self._cmodule, name))


# Base64-encoded, bz2-compressed CUDA kernel binary (elided here).
quantization_code = "XXXX"

kernels = Kernel(
    bz2.decompress(base64.b64decode(quantization_code)),
    [
        "int4WeightCompression",
        "int4WeightExtractionFloat",
        "int4WeightExtractionHalf",
        "int8WeightExtractionFloat",
        "int8WeightExtractionHalf",
    ],
)
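
For context on what a port would involve: the fused kernels above unpack two signed 4-bit values from every int8 byte and rescale them per output channel before the matmul. Below is a minimal, device-agnostic PyTorch sketch of that extraction step. The function name extract_int4_to_half is hypothetical, and the layout (high nibble first, one scale per row) is an assumption about the packing format, so treat it as a starting point rather than a drop-in replacement for int4WeightExtractionHalf.

import torch


def extract_int4_to_half(packed: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Hypothetical pure-PyTorch stand-in for the int4WeightExtractionHalf kernel.

    Assumes `packed` is int8 with shape [rows, cols // 2], two signed 4-bit
    values per byte (high nibble first), and `scale` holds one scale per row.
    """
    w = packed.to(torch.int16)                      # widen so the arithmetic below cannot overflow
    high = torch.div(w, 16, rounding_mode="floor")  # signed high nibble, range -8..7
    low = w - high * 16                             # low nibble as 0..15
    low = torch.where(low >= 8, low - 16, low)      # reinterpret the low nibble as signed 4-bit
    unpacked = torch.stack((high, low), dim=-1).reshape(packed.shape[0], -1)
    return unpacked.to(torch.float16) * scale.to(torch.float16).unsqueeze(-1)

Since this is only ordinary tensor arithmetic, it should run on any backend PyTorch supports (for example an Ascend NPU through the torch_npu adapter), just far slower than the fused CUDA kernel; the int8 path would need only the per-row rescaling.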