codefuse-ai / MFTCoder

High-accuracy and high-efficiency multi-task fine-tuning framework for Code LLMs. This work has been accepted by KDD 2024.

Question: can an int4 GPTQ model be fine-tuned with LoRA? #34

Closed wengyuan722 closed 5 months ago

wengyuan722 commented 10 months ago

I want to fine-tune CodeFuse-CodeLlama-34B-4bits. I first switched the model loading to AutoModelForCausalLM.from_pretrained, edited config.json to add a quantization_config, and renamed the model files, but it still hangs at model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, device_map='auto', resume_download=True). How can I resolve this?
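
For context, the path being attempted here relies on transformers' native GPTQ integration, which requires optimum and auto-gptq to be installed and a quantization_config block (quant_method "gptq") in config.json. A minimal sketch of that loading path, with model_dir standing in for the local checkpoint directory:

from transformers import AutoModelForCausalLM

# Sketch: transformers-native GPTQ loading. Needs `optimum` and `auto-gptq` installed,
# and the checkpoint's config.json must contain a "quantization_config" entry with
# quant_method "gptq"; without it, from_pretrained just does a normal full-precision load.
model = AutoModelForCausalLM.from_pretrained(
    model_dir,                 # local directory of the quantized checkpoint
    device_map="auto",
    trust_remote_code=True,
)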

twelveand0 commented 10 months ago

Hi, the int4 model was quantized with auto-gptq, so it cannot be loaded with AutoModelForCausalLM.from_pretrained. You can load it like this:

from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    inject_fused_attention=False,
    inject_fused_mlp=False,
    use_safetensors=True,
    use_cuda_fp16=True,
    disable_exllama=False,
    device_map='auto'   # supports multi-GPU
)

That is how to load the model. We have not tried using the INT4 model as a base for further fine-tuning. If the issue is limited resources, you can instead use QLoRA + 4-bit to fine-tune the unquantized base model, with the following settings in the config.json file:

"peft_type": "qlora",
"quantization": "4bit",
wengyuan722 commented 10 months ago

I see that Qwen's GPTQ models can be loaded with AutoModelForCausalLM.from_pretrained. I changed my config following that example, but it still hangs. Normally, if Qwen's GPTQ model can be fine-tuned, CodeFuse-CodeLlama-34B-4bits should be too.

twelveand0 commented 10 months ago

> I see that Qwen's GPTQ models can be loaded with AutoModelForCausalLM.from_pretrained. I changed my config following that example, but it still hangs. Normally, if Qwen's GPTQ model can be fine-tuned, CodeFuse-CodeLlama-34B-4bits should be too.

The internal implementation of AutoGPTQForCausalLM.from_quantized also calls into AutoModelForCausalLM, and in addition it reads the GPTQ quantization config. You can refer to the AutoGPTQForCausalLM.from_quantized code to implement loading for training; for training, the model.eval() call needs to be removed.
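
To sketch what that looks like in practice: load the quantized checkpoint with from_quantized in trainable mode (exllama kernels are inference-only, so they are disabled, and model.eval() is not called), then attach a LoRA adapter. The get_gptq_peft_model / GPTQLoraConfig helpers are assumed to be available in the installed auto-gptq version, and model_path plus the LoRA hyperparameters are illustrative:

from auto_gptq import AutoGPTQForCausalLM
# Assumption: the installed auto-gptq release ships these peft helper utilities.
from auto_gptq.utils.peft_utils import GPTQLoraConfig, get_gptq_peft_model

model = AutoGPTQForCausalLM.from_quantized(
    model_path,                    # local path of the GPTQ checkpoint (illustrative)
    inject_fused_attention=False,
    inject_fused_mlp=False,
    use_safetensors=True,
    use_cuda_fp16=True,
    disable_exllama=True,          # exllama kernels do not support training
    trainable=True,                # keep the quantized linears usable in the backward pass
    device_map="auto",
)

peft_config = GPTQLoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_gptq_peft_model(model, peft_config=peft_config, train_mode=True)
model.print_trainable_parameters()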

wengyuan722 commented 10 months ago

@twelveand0 I adjusted it based on AutoModelForCausalLM.from_pretrained, as follows:

with ContextManagers(init_contexts):
    model = AutoModelForCausalLM.from_config(
        config,
        trust_remote_code=trust_remote_code,
        torch_dtype=torch_dtype,
    )

# Collect the quantizable Linear layers, skipping the embeddings, final norm and lm_head.
layers = find_layers(model)
ignore_layers = ['lm_head', 'model.embed_tokens', 'model.norm']
for name in list(layers.keys()):
    if any([name.startswith(ignore_layer) for ignore_layer in ignore_layers]):
        del layers[name]

# Replace the collected layers with 4-bit QuantLinear modules (group size 64).
make_quant(
    model,
    layers,
    4,
    64,
    use_triton=False,
    disable_exllama=disable_exllama,
    use_cuda_fp16=True,
    desc_act=False,
    trainable=trainable
)
model.tie_weights()

device_map = "auto"
device = None

# == step3: load checkpoint and dispatch ==

if isinstance(device_map, str) and device_map not in ["auto", "balanced", "balanced_low_0", "sequential"]:
    raise ValueError(
        "If passing a string for device_map, please choose 'auto', 'balanced', 'balanced_low_0' or "
        "'sequential'."
    )

max_memory = None
if isinstance(device_map, dict):
    max_memory = None
else:
    if device is None and not device_map and not max_memory:
        device_map = "auto"
    if device is not None:
        device = torch.device(device)
        if not max_memory and not device_map:
            device_map = {"": device.index if device.type == "cuda" else device.type}
    if not isinstance(device_map, dict) and device_map != "sequential":
        max_memory = accelerate.utils.get_balanced_memory(
            model=model,
            max_memory=max_memory,
            no_split_module_classes=['LlamaDecoderLayer'],
            low_zero=(device_map == "balanced_low_0")
        )
    if not isinstance(device_map, dict):
        device_map = accelerate.infer_auto_device_map(
            model,
            max_memory=max_memory,
            no_split_module_classes=['LlamaDecoderLayer']
        )

accelerate.utils.modeling.load_checkpoint_in_model(
    model,
    checkpoint=model_save_name,
    device_map=device_map,
    offload_state_dict=True,
    offload_buffers=True
)
model = simple_dispatch_model(model, device_map)

# == step4: set seqlen ==

model_config = model.config.to_dict()
seq_len_keys = ["max_position_embeddings", "seq_length", "n_positions"]
if any([k in model_config for k in seq_len_keys]):
    for key in seq_len_keys:
        if key in model_config:
            model.seqlen = model_config[key]
            break
else:
    model.seqlen = 4096

# Any post-initialization that requires device information, for example buffer initialization on device.

model = autogptq_post_init(model, use_act_order=False)

model.eval()

Starting from layers = find_layers(model), do I need to keep all of these subsequent steps when training? Also, with these changes, will LoRA training behave the same as it does with a regular (unquantized) model?