RupertLuo / Valley

The official repository of "Video assistant towards large language model makes everything easy"

V100 support #7

Closed TonyXuQAQ closed 1 year ago

TonyXuQAQ commented 1 year ago

May I know whether this repo supports V100 for training at this stage?

RupertLuo commented 1 year ago

The 7B model can be trained with DeepSpeed's ZeRO-3. The 13B model needs at least 16 V100s, and even then training is very slow.
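
For reference, a minimal ZeRO-3 DeepSpeed configuration of the kind needed here might look roughly like the sketch below. This assumes training goes through the Hugging Face Trainer (which the repo's save code later in this thread suggests); it is not the config file shipped with this repo, and all values are illustrative.

# Illustrative ZeRO-3 config for fitting the 7B model on V100s; not this repo's actual config.
from transformers import TrainingArguments

ds_zero3_config = {
    "fp16": {"enabled": True},                   # V100 supports fp16 but not bf16
    "zero_optimization": {
        "stage": 3,                              # partition params, grads and optimizer states
        "offload_optimizer": {"device": "cpu"},  # push optimizer states to CPU RAM
        "offload_param": {"device": "cpu"},      # push idle parameters to CPU RAM
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./checkpoints",                  # hypothetical output path
    per_device_train_batch_size=1,
    fp16=True,
    deepspeed=ds_zero3_config,                   # the HF Trainer accepts a dict or a JSON path
)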

TonyXuQAQ commented 1 year ago

Thanks for the information. It seems that Valley does not support inference with LoRA, right?

RupertLuo commented 1 year ago

The code does support inference with LoRA; you just need to merge the LoRA weights first, using code like the following:


  import os

  from peft import PeftModel, PeftConfig
  from transformers import AutoTokenizer
  from valley.model import ValleyLlamaForCausalLM  # adjust to this repo's actual module path

  # model_name: path to the LoRA checkpoint directory
  config = PeftConfig.from_pretrained(model_name)
  # Load base weights from the checkpoint itself if it ships a full config.json,
  # otherwise from the base model recorded in the PEFT config.
  if 'config.json' in os.listdir(model_name):
      model_old = ValleyLlamaForCausalLM.from_pretrained(model_name)
  else:
      model_old = ValleyLlamaForCausalLM.from_pretrained(config.base_model_name_or_path)
  print('load lora model')
  model = PeftModel.from_pretrained(model_old, model_name)
  model = model.merge_and_unload().half()  # fold the LoRA weights into the base model, cast to fp16
  tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
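
If you want to reuse the merged model later without repeating the merge, it can be persisted with the standard Hugging Face save calls; the output directory below is just an example path, not one from this repo.

  merged_dir = 'valley-7b-lora-merged'   # example path
  model.save_pretrained(merged_dir)      # writes the merged fp16 weights and config
  tokenizer.save_pretrained(merged_dir)
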
TonyXuQAQ commented 1 year ago

Thanks for the prompt response. Does Valley tune the multimodal projection layer (MPL) with LoRA? I guess the answer is yes, but in your LoRA training code it seems that you do not save the MPL weights.

RupertLuo commented 1 year ago

No, only the LLM is tuned with LoRA. I don't think the projection layer needs LoRA, since it is small. You can customize the LoRA target modules by changing line 138 in "valley/train/train.py":


target_modules=['model.layers.' + str(i) + '.' + k
                for i in range(40)
                for k in ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj",
                          "mlp.gate_proj", "mlp.down_proj", "mlp.up_proj"]]
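
For context, that list is the kind of thing that gets passed to peft's LoraConfig. A rough sketch of how such a config is typically built follows; the rank/alpha/dropout values are illustrative, not the repo's defaults.

from peft import LoraConfig, get_peft_model

# Illustrative hyperparameters; the real ones live in valley/train/train.py.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=target_modules,     # the per-layer module list built above
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()     # reports how few parameters LoRA actually updates
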
RupertLuo commented 1 year ago

If you have other questions later, you can add me on WeChat by searching RupertLuo.

TonyXuQAQ commented 1 year ago

Thanks for your detailed response. I will have a try.

TonyXuQAQ commented 1 year ago

Sorry to interrupt again. I think GitHub would be a better place for discussion. I finetuned Valley with deepspeed_zero3 (no LoRA) and then ran inference with the finetuned model. The steps are:

  1. Load LLaMA2-pretrained weights.
  2. Finetune with DeepSpeed ZeRO-3, no LoRA. The trained weights are saved (quite small files).
  3. Load LLaMA2-pretrained weights as the base and load the trained weights from step 2. An error occurs (see the attached screenshot): the mm_projector weights seem to have zero shape.

I wonder how you saved the finetuned weights (with DeepSpeed ZeRO-3) and loaded them for inference. Maybe this is not a problem on A100s, but I believe most people, like me, only have access to V100s.

Since I can run LLaVA on V100s without errors, could you please provide runnable scripts or more information for finetuning and running inference with deepspeed_zero3, if possible?

Thanks in advance!

RupertLuo commented 1 year ago

Are you using the 7b model or the 13b model?

TonyXuQAQ commented 1 year ago

7b model

RupertLuo commented 1 year ago

I will test it as soon as possible and get back to you.

TonyXuQAQ commented 1 year ago

Thanks for your prompt response; I look forward to hearing from you.

RupertLuo commented 1 year ago

Thank you for your feedback. I ran the training script on 8 V100s and fixed some bugs (though the size-mismatch bug from your screenshot did not appear).

Run the following command directly to perform ZeRO-3 parallel full-parameter fine-tuning of the 7B model with batch size 1.

bash valley/train/train.sh valley/configs/experiments/valley_stage2_zero3.yaml

However, I don't recommend full-parameter training on V100 machines. Due to some DeepSpeed bugs, ZeRO-3 parallelism only works with batch size 1; with a batch size larger than 1, it gets stuck in backpropagation. This is also a problem I ran into early on, so if you know how to solve it, please let me know. My training parameters and GPU information during training are shown in the screenshots below.

[screenshots: training parameters and GPU memory usage]

An alternative is to use LoRA training for the LLM and full-parameter training for the projection layer (a rough sketch of this idea follows below). Run the command below to do this; that way you can use a batch size larger than 1.

bash valley/train/train.sh valley/configs/experiments/valley_stage2_lora.yaml

My training parameters and GPU information for this run are shown in the screenshots below. [screenshots: training parameters and GPU memory usage]
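
In case it helps to see the idea, here is a rough sketch of "LoRA on the LLM, full parameters on the projection layer". The module name mm_projector is assumed from the error message earlier in this thread; the repo's own training code may wire this up differently.

from peft import LoraConfig, get_peft_model

# Wrap the LLM with LoRA adapters, leaving everything else frozen.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM",
                                         target_modules=target_modules))

# Re-enable full-parameter training for the multimodal projection layer.
# (peft's LoraConfig also has a modules_to_save argument that achieves the same thing.)
for name, param in model.named_parameters():
    if "mm_projector" in name:
        param.requires_grad = True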

TonyXuQAQ commented 1 year ago

Thanks for the detailed feedback. The reported error occurs during inference with the finetuned weights. Can you try running inference with the script below?

python3 inference/run_valley.py --model-name [PATH TO VALLEY WEIGHT] --video_file [PATH TO VIDEO] --quary [YOUR QUERY ON THE VIDEO]

RupertLuo commented 1 year ago

Are you using the weights I uploaded to Hugging Face? At the moment, I can run the following command without any problems. You may need to update your code.

python3 valley/inference/run_valley.py --model-name ../../hf_hub/Valley2-7b


TonyXuQAQ commented 1 year ago

Running the provided Valley2-7b works without problems. I mean using the finetuned checkpoints for inference (i.e., the weights you just trained by finetuning with DeepSpeed ZeRO-3).

RupertLuo commented 1 year ago

Can I take a look at the file structure of your checkpoint? ZeRO-3 weights need to be converted by running the zero_to_fp32.py script.
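
For reference, DeepSpeed places a zero_to_fp32.py script inside every checkpoint directory, and the same logic is available as a Python API. A minimal sketch, with a hypothetical checkpoint path:

# Consolidate the ZeRO-3 partitioned shards into a single fp32 state dict.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

state_dict = get_fp32_state_dict_from_zero_checkpoint("output/checkpoint-1000")  # hypothetical path
model = model.cpu()                 # the consolidated state dict lives on CPU
model.load_state_dict(state_dict)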

TonyXuQAQ commented 1 year ago

The finetuning checkpoint produced by DeepSpeed ZeRO-3 looks like this (see the attached screenshot), and it is only about 700K in size. Can you try it? How do I run inference with such a finetuned checkpoint?

RupertLuo commented 1 year ago

That doesn't look right. Can I have a look at your training script command and config file?

RupertLuo commented 1 year ago

I know what is wrong: I used a normal save at the end of the training script. Checkpoints saved by step during training do not have this problem, because the trainer calls DeepSpeed's saving method, which first aggregates the partitioned parameters. With ZeRO-3 you have to use DeepSpeed's saving method. I will fix this bug as soon as possible.

TonyXuQAQ commented 1 year ago

Thanks for the information. Could you please provide the API for doing this? Can I use the code from LLaVA for this function, as in the attached screenshot?

RupertLuo commented 1 year ago

I have fixed this bug already. You need to update the code from this repository. I modified the save function as follows:

import torch
import transformers


def safe_save_model_for_hf_trainer(trainer: transformers.Trainer,
                                   output_dir: str):
    """Collects the state dict and dumps it to disk."""

    if trainer.args.lora:
        # LoRA run: only the adapter weights need to be written out.
        if trainer.args.should_save:
            trainer.model.save_pretrained(output_dir)

    else:
        if trainer.deepspeed:
            # ZeRO-3 partitions parameters across ranks, so let DeepSpeed's own
            # save path gather and write the full weights.
            print('saving deepspeed model...')
            torch.cuda.synchronize()
            trainer.save_model(output_dir)
            return

        # Plain (non-DeepSpeed) run: move the state dict to CPU before writing.
        state_dict = trainer.model.state_dict()
        if trainer.args.should_save:
            cpu_state_dict = {
                key: value.cpu()
                for key, value in state_dict.items()
            }
            del state_dict
            trainer._save(output_dir, state_dict=cpu_state_dict)  # noqa

The model is now saved correctly when using ZeRO-3. The saved file structure is shown in the attached screenshot.

TonyXuQAQ commented 1 year ago

Thanks for the prompt response! I will give it a try. By the way, during inference, can I directly load the finetuned model by setting model-name to the finetuned checkpoint, or do I need to load LLaMA2 as the base and then load the finetuned projector?

RupertLuo commented 1 year ago

Yes, using the checkpoint path directly is fine, because all parameters are already saved in the checkpoint.
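
A minimal loading sketch along those lines; the checkpoint path is hypothetical and the import location of ValleyLlamaForCausalLM should be adjusted to wherever it lives in this repo.

import torch
from transformers import AutoTokenizer

ckpt = "output/valley-7b-finetune"   # hypothetical finetuned checkpoint directory
model = ValleyLlamaForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt)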