X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl

finetuning: No pytorch_model.bin file after running train_it.sh #220

Closed: vicaranq closed this issue 5 months ago

vicaranq commented 5 months ago

I am trying to finetune the pretrained model with new data. Training now runs successfully, creating MLflow experiments and the files in the output folder. However, I have not found a way to load and use the finetuned model.

This is similar to #87 and #118, but my checkpoint folder does not contain a pytorch_model.bin, so I hit an error when calling model.load_state_dict(prefix_state_dict). Following the suggestions from those issues, my code is:

import torch
from peft import LoraConfig, get_peft_model
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration

# No pytorch_model.bin was produced, so this points at the LoRA adapter weights instead
lora_path = '/...../output/sft_v0.1_ft_grad_ckpt/checkpoint-500/adapter_model.bin'

model = MplugOwlForConditionalGeneration.from_pretrained(
    'MAGAer13/mplug-owl-llama-7b-pt',
    torch_dtype=torch.bfloat16,
)
peft_config = LoraConfig(
    target_modules=r'.*language_model.*\.(q_proj|v_proj)',
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
prefix_state_dict = torch.load(lora_path, map_location='cpu')
model.load_state_dict(prefix_state_dict)  # raises the RuntimeError shown below

The output folder structure is:

- adapter_config.json
- adapter_model.bin
- README.md
- train.log
- runs/
- checkpoint-500/
-- adapter_config.json
-- adapter_model.bin
-- optimizer.pt
-- rng_state_0.pth
...
-- rng_state_7.pth
-- scheduler.pt
-- trainer_state.json
-- training_args.bin
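
As far as I can tell this layout is what peft itself produces: saving a peft-wrapped model writes only the adapter files, never the frozen base weights. A minimal sketch of that behavior (the tiny stand-in model here is just an assumption for a quick demo; any causal LM works):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a small causal LM in LoRA adapters and save it
base = AutoModelForCausalLM.from_pretrained('sshleifer/tiny-gpt2')
peft_model = get_peft_model(base, LoraConfig(target_modules=['c_attn'], r=8))

# Writes adapter_config.json and adapter_model.bin (adapter_model.safetensors
# in newer peft versions); the base weights are not duplicated, hence no
# pytorch_model.bin in the checkpoint folder.
peft_model.save_pretrained('/tmp/lora_demo')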

I get an error on the last line, model.load_state_dict(prefix_state_dict):

RuntimeError: Error(s) in loading state_dict for PeftModel:
    Missing key(s) in state_dict: "base_model.model.query_tokens", "base_model.model.vision_model.embeddings.cls_token"....

I think this confirms that I am pointing lora_path at the wrong file, but there is no pytorch_model.bin anywhere in the output. Does anyone have insight into what may be happening, or how I can load the newly finetuned model without a pytorch_model.bin file?
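
For what it's worth, inspecting the adapter file directly (a quick diagnostic sketch, using the same checkpoint path as above) shows only LoRA tensors, which would explain why the strict load_state_dict reports every base-model key as missing:

import torch

# Peek at what adapter_model.bin actually contains
sd = torch.load('/...../output/sft_v0.1_ft_grad_ckpt/checkpoint-500/adapter_model.bin',
                map_location='cpu')
print(len(sd), 'tensors saved')
for key in list(sd)[:5]:
    print(key, tuple(sd[key].shape))
# Only keys like '...lora_A.weight' / '...lora_B.weight' appear; none of the
# base model parameters are in this file.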

vicaranq commented 5 months ago

Solved:

import torch
from peft import PeftModel
from mplug_owl.modeling_mplug_owl import MplugOwlForConditionalGeneration

pretrained_ckpt = 'MAGAer13/mplug-owl-llama-7b-pt'
# Pass the checkpoint *directory* (with adapter_config.json and adapter_model.bin), not the .bin file
lora_adapters_path = '/...../output/sft_v0.1_ft_grad_ckpt/checkpoint-500'
device = 'cuda'

model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
).to(device)

# Attach the trained LoRA adapters, then fold them into the base weights
m = PeftModel.from_pretrained(model, lora_adapters_path)
model = m.merge_and_unload()
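
After merge_and_unload the result is a plain MplugOwlForConditionalGeneration again, so it can also be saved as a standalone checkpoint if you don't want to re-attach the adapters every time (the output directory below is just an example):

# Hypothetical output directory; any writable path works
model.save_pretrained('/tmp/mplug_owl_merged')
# This write includes the full model weights (pytorch_model.bin or safetensors
# shards), since the LoRA deltas have been merged into the base parameters.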