As I understand it, in the 4-bit example demo, a model is quantized by replacing each original linear layer with a quantized layer using the `recurse_setattr` function in `make_quant` provided by AutoGPTQ. However, when I compared the VL-7B model with the 4-bit model, I found that none of the linear layers were replaced; instead, the quantized layers were inserted into the LoRA modules. How did you achieve that?
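For context, here is a minimal sketch of the replacement mechanism the question refers to: walking the module tree and swapping each targeted `nn.Linear` for a quantized layer via a recursive `setattr`. The `quant_cls` constructor signature and the name-selection logic are illustrative assumptions, not AutoGPTQ's exact API.

```python
import torch.nn as nn


def recurse_setattr(module, name, value):
    # Set a nested attribute, e.g. name = "mlp.down_proj",
    # by walking one dotted component at a time.
    if "." not in name:
        setattr(module, name, value)
    else:
        parent, rest = name.split(".", 1)
        recurse_setattr(getattr(module, parent), rest, value)


def make_quant(model, quant_cls, names_to_quantize, bits=4, group_size=128):
    # Replace every nn.Linear whose qualified name is in names_to_quantize
    # with an instance of quant_cls (e.g. a 4-bit QuantLinear).
    # quant_cls's constructor arguments here are a hypothetical example.
    for name, layer in list(model.named_modules()):
        if isinstance(layer, nn.Linear) and name in names_to_quantize:
            new_layer = quant_cls(
                bits,
                group_size,
                layer.in_features,
                layer.out_features,
                bias=layer.bias is not None,
            )
            recurse_setattr(model, name, new_layer)
```

Under this scheme, which linear layers end up quantized depends entirely on the names passed in; if only the LoRA submodules' names are selected, the base model's linear layers stay untouched, which may explain the behavior observed above.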