Closed: TonyXuQAQ closed this issue 1 year ago
The 7b model can be trained using DeepSpeed ZeRO-3. The 13b model needs at least 16 V100s, and training is very slow.
Thanks for the information. It seems that Valley does not support inference with Lora, right?
The code does support inference with LoRA, but you need to merge the LoRA weights first, using code like the following:
import os

from peft import PeftModel, PeftConfig
from transformers import AutoTokenizer
# ValleyLlamaForCausalLM is defined in the Valley repository

config = PeftConfig.from_pretrained(model_name)
# If the LoRA checkpoint ships its own config.json, load the base weights from
# the checkpoint directory; otherwise fall back to the original base model.
if 'config.json' in os.listdir(model_name):
    model_old = ValleyLlamaForCausalLM.from_pretrained(model_name)
else:
    model_old = ValleyLlamaForCausalLM.from_pretrained(config.base_model_name_or_path)
print('load lora model')
model = PeftModel.from_pretrained(model_old, model_name)
model = model.merge_and_unload().half()
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
Thanks for the prompt response. Does Valley tune the multimodal projection layer (MPL) with LoRA? I guess the answer is Yes. But in your LoRA training code, it seems that you do not save the weights of MPL.
No, only the LLM is LoRA-tuned. I think the projection layer does not need LoRA, since it is small. You can customize the LoRA tuning parameters by changing line 138 in valley/train/train.py:
target_modules=['model.layers.'+str(i)+'.'+ k for i in range(40) for k in ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj", "mlp.gate_proj","mlp.down_proj","mlp.up_proj"]]
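For reference, the comprehension above simply enumerates the attention and MLP projection modules of every decoder layer. A quick sketch of what it expands to (range(40) matches the 40 decoder layers of the 13b model; a 7b model would use range(32)):

```python
# Expand the same comprehension used for target_modules to see which
# modules LoRA is applied to.
proj_names = ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
              "self_attn.o_proj", "mlp.gate_proj", "mlp.down_proj", "mlp.up_proj"]
target_modules = ['model.layers.' + str(i) + '.' + k
                  for i in range(40) for k in proj_names]

print(len(target_modules))   # 280 entries: 40 layers x 7 projections
print(target_modules[0])     # model.layers.0.self_attn.q_proj
print(target_modules[-1])    # model.layers.39.mlp.up_proj
```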
You can add me on WeChat by searching RupertLuo if you have other questions later.
Thanks for your detailed response. I will have a try.
Sorry to interrupt again. I think GitHub would be a better place for discussion. I finetuned Valley with deepspeed_zero3 (no LoRA) and ran inference with the finetuned model. The steps are:
Since I can run LLaVA on V100s without errors, could you please provide runnable scripts or more information for finetuning and inference with deepspeed_zero3, if possible?
Thanks in advance!
Are you using the 7b model or the 13b model?
7b model
I will test it as soon as possible and get back to you
Thanks for your prompt response and look forward to hearing from you.
Thank you for your feedback. I ran the training script on 8 V100s and fixed some bugs (though the size-mismatch bug you reported in the screenshots did not appear).
You can directly run the following command to perform ZeRO-3 parallel full-parameter finetuning of the 7b model with batch size 1:
bash valley/train/train.sh valley/configs/experiments/valley_stage2_zero3.yaml
But I don't recommend full-parameter training on V100 machines. Due to some DeepSpeed bugs, ZeRO-3 parallelism only works with batch size 1; with batch size > 1 it gets stuck in backpropagation. This is a problem I also encountered early on; if you know how to solve it, please let me know. My training parameters and GPU information during training are shown in the figure below.
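For context, a minimal DeepSpeed ZeRO-3 configuration looks roughly like the sketch below. This is a generic illustration, not the repository's valley_stage2_zero3.yaml; all values here are examples.

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

Setting `stage3_gather_16bit_weights_on_model_save` tells DeepSpeed to gather the parameter shards into a full checkpoint when the trainer saves the model, which matters for ZeRO-3 because each rank otherwise only holds a slice of the weights.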
An alternative is to train the LLM with LoRA and train the projection layer with full parameters. Run the command below to do this; this way, you can use batch size > 1.
bash valley/train/train.sh valley/configs/experiments/valley_stage2_lora.yaml
My training parameters and GPU information during training are shown in the figure below.
Thanks for the detailed feedback. The reported error occurs during inference using the finetuned weights. Can you run inference from the command line (i.e.,
python3 inference/run_valley.py --model-name [PATH TO VALLEY WEIGHT] --video_file [PATH TO VIDEO] --quary [YOUR QUERY ON THE VIDEO]
)?
Are you using the weights I uploaded to Hugging Face? At present, I can run the following command without any problems. You may need to update your code.
python3 valley/inference/run_valley.py --model-name ../../hf_hub/Valley2-7b
Running the provided Valley2-7b has no problems. I mean using the finetuned checkpoints for inference (i.e., the weights you have just trained by finetuning with DeepSpeed ZeRO-3).
Can I take a look at the file structure of your checkpoint? ZeRO-3 weights need to be converted first by running the zero_to_fp32.py script.
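For reference, DeepSpeed writes a zero_to_fp32.py script into each ZeRO checkpoint directory; a typical invocation looks like the sketch below (the checkpoint directory name is a placeholder, not a path from this thread):

```shell
# Consolidate the sharded ZeRO-3 parameter/optimizer states into a single
# fp32 state-dict file that from_pretrained can load.
cd [PATH TO OUTPUT DIR]/checkpoint-1000   # placeholder checkpoint directory
python zero_to_fp32.py . pytorch_model.bin
```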
The finetuning checkpoint produced by DeepSpeed ZeRO-3 looks like this, and its size is only 700K. Can you try it? How do I run inference with such a finetuned checkpoint?
That doesn't look right. Can I have a look at your training script command and config file?
I know what is wrong: I used a plain save at the end of the training script. Checkpoints saved by step during training do not have this problem, because the trainer calls DeepSpeed's saving method, which first aggregates the parameter shards from each rank. With ZeRO-3 you have to use the DeepSpeed saving method. I will fix this bug as soon as possible.
Thanks for the information. Could you please provide the API for doing this? Can I use the code from LLaVA for this function, like this:
I have fixed this bug already. You need to update the code from this repository. I modified the save function as follows:
import torch
import transformers

def safe_save_model_for_hf_trainer(trainer: transformers.Trainer,
                                   output_dir: str):
    """Collect the state dict and dump it to disk."""
    if trainer.args.lora:
        if trainer.args.should_save:
            trainer.model.save_pretrained(output_dir)
    else:
        if trainer.deepspeed:
            # DeepSpeed (e.g. ZeRO-3) must gather the parameter shards itself,
            # so delegate to trainer.save_model instead of saving directly.
            print('saving deepspeed model...')
            torch.cuda.synchronize()
            trainer.save_model(output_dir)
            return
        state_dict = trainer.model.state_dict()
        if trainer.args.should_save:
            cpu_state_dict = {
                key: value.cpu()
                for key, value in state_dict.items()
            }
            del state_dict
            trainer._save(output_dir, state_dict=cpu_state_dict)  # noqa
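The CPU-copy branch at the end can be illustrated in isolation. A minimal sketch (the tensor names and shapes are made up for illustration):

```python
import torch

# Every tensor in the state dict is copied to CPU before being handed to the
# trainer's save path, so saving does not require the parameters to stay on
# the accelerator.
state_dict = {"weight": torch.ones(2, 3), "bias": torch.zeros(3)}
cpu_state_dict = {k: v.cpu() for k, v in state_dict.items()}

print(all(v.device.type == "cpu" for v in cpu_state_dict.values()))  # True
```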
With this fix, the model is saved correctly when using ZeRO-3. The saved file structure is as follows.
Thanks for the prompt response! I will have a try. By the way, during inference, can I directly load the finetuned model by setting model-name to the finetuned checkpoint? Or do I need to load LLaMA2 as the base and then load the finetuned projector?
Yes, directly using the checkpoint path is fine, because all parameters are already saved in the checkpoint.
May I know whether this repo supports V100 for training at this stage?