Hi, when running the code in the README, I found that CPU memory usage was much higher than expected. After reading the code, I found that the `__init__` function of the `QuantLinear()` class in `auto_gptq` was not being overridden as it should be. I therefore added the following code to `model_offload.py`, which fixed the problem.