GaiZhenbiao / Phi3V-Finetuning

Parameter-efficient fine-tuning script for Phi-3-vision, a strong multimodal language model from Microsoft.
Apache License 2.0

Adding fine-tuning for other modules #9

Closed · 2U1 closed this 2 months ago

2U1 commented 2 months ago

I've added fine-tuning for both the img_projector and the vision_model.

The code base was from https://github.com/zhuyiche/llava-phi/tree/main
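
For reference, a minimal sketch of the idea, not the PR's actual code (the substring matching below assumes the checkpoint's parameter names contain img_projector / vision_model as used in this thread):

```python
def unfreeze_modules(model, keywords=("img_projector", "vision_model")):
    """Enable gradients for parameters whose names match any keyword."""
    for name, param in model.named_parameters():
        if any(key in name for key in keywords):
            param.requires_grad = True
```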

2U1 commented 2 months ago

I also added the eos token to the input prompt, based on the Phi-3 cookbook fine-tuning code. Without it, the fine-tuned model keeps generating until max_new_tokens because it doesn't generate the eos token.

The separator could be used as the stopping_criteria when generating, but sometimes an overall summary comes after the separator.
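
For reference, a rough sketch of that approach with a custom StoppingCriteria from transformers (the separator string and checkpoint name are placeholders); as noted, it is fragile because valid text can come after the separator:

```python
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList

class StopOnSeparator(StoppingCriteria):
    """Stop generation once the decoded text contains the separator string."""

    def __init__(self, tokenizer, separator):
        self.tokenizer = tokenizer
        self.separator = separator

    def __call__(self, input_ids, scores, **kwargs):
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=False)
        return self.separator in text

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True
)
# "###" is a hypothetical separator; pass this via generate(..., stopping_criteria=...).
stopping = StoppingCriteriaList([StopOnSeparator(tokenizer, "###")])
```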

ParadoxZW commented 2 months ago

Hi, thanks for your PR.

The eos_token is truly a big issue with the phi3-mini series models, but I think I have already solved this problem in this repo.

First, processor.tokenizer.apply_chat_template constructs the prompt/input text like the following:

```
<|user|>
<|image_1|>
<|image_2|>
What is the difference between these two images?<|end|>
<|assistant|>
There is no difference.<|end|>
<|user|>
check again<|end|>
<|assistant|>
```
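
For reference, a sketch of how that prompt can be reproduced (the checkpoint name is an assumption; the messages are the example's own):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True
)
messages = [
    {"role": "user", "content": "<|image_1|>\n<|image_2|>\nWhat is the difference between these two images?"},
    {"role": "assistant", "content": "There is no difference."},
    {"role": "user", "content": "check again"},
]
# add_generation_prompt=True appends the trailing <|assistant|> turn.
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```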

We can see that you can use <|end|> as the eos token. (When performing inference after the model is fine-tuned with this repo's code, you should configure generation_config with <|end|> as the eos token. This should lead to correct behavior.)
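
A minimal sketch of that configuration step (the checkpoint path is hypothetical):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "./phi3v-finetuned"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)

# Stop generation at <|end|> instead of running to max_new_tokens.
model.generation_config.eos_token_id = tokenizer.convert_tokens_to_ids("<|end|>")
```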

Second, when constructing labels for SFT, I make the loss cover the eos token (i.e., <|end|>) of assistant responses. This should make the model learn to generate <|end|> properly (unless I made some mistakes coding this logic).
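
The logic amounts to something like the sketch below (function and argument names are illustrative, not the repo's actual code):

```python
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss

def build_labels(input_ids, assistant_spans, end_token_id):
    """assistant_spans: (start, stop) index pairs covering each assistant
    response, where input_ids[stop] is expected to be the trailing <|end|>."""
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, stop in assistant_spans:
        labels[start:stop] = input_ids[start:stop]
        # Keep <|end|> itself in the loss so the model learns to emit it.
        if stop < len(input_ids) and input_ids[stop] == end_token_id:
            labels[stop] = end_token_id
    return labels
```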

P.S. I did observe that phi-3-mini-4k (the language model) has not learned to predict <|end|> when it should. I guess it is a small mistake from the 'microsoft/phi-3' team. They may have forgotten to make the loss cover <|end|> when fine-tuning the model.

2U1 commented 2 months ago

Thanks for your comment. I'm using <|end|> as the eos token too, and I saw the code where you make the loss for <|end|> propagate properly. However, I'm still a bit confused about the eos_token, because there is a token named eos, which is <|endoftext|>, in the tokenizer. Also, in the discussion https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/discussions/26 , the Phi-3 team commented that <|endoftext|> can be used as the eos_token during fine-tuning.

I'll go on some experiments with this and some other settings too.

Once again, thanks for your comment and the fine-tuning codes.

ParadoxZW commented 2 months ago

Hi @2U1 , from my point of view, <|end|> is enough as an eos token for (continued) fine-tuning this model to learn valid behavior for ending responses. For generation, just replace <|end|> with <|endoftext|> once the fine-tuning is finished (generation_config.json also allows us to set a list of possible eos tokens).
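
A sketch of that setup (the checkpoint path is hypothetical; transformers accepts a list of eos token ids):

```python
from transformers import AutoTokenizer, GenerationConfig

ckpt = "./phi3v-finetuned"  # hypothetical path to the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
gen_config = GenerationConfig.from_pretrained(ckpt)
gen_config.eos_token_id = [
    tokenizer.convert_tokens_to_ids("<|end|>"),
    tokenizer.convert_tokens_to_ids("<|endoftext|>"),
]
gen_config.save_pretrained(ckpt)  # writes the list into generation_config.json
```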

As soon as we finish discussing the details of this PR and remove the unnecessary changes, I will merge it.

2U1 commented 2 months ago

@ParadoxZW Thanks for your explanation. By the way, I haven't committed the code for merging the LoRA weights yet. I'll upload it ASAP after testing.

ParadoxZW commented 2 months ago

@2U1 I've made some comments on the Files changed page about specific changes to the code. Could you reply to them when you get time?

2U1 commented 2 months ago

@ParadoxZW Yes, sure. I also found that I missed some optimizer code and some other settings for merging LoRA, and I'm working on them. BTW, I can't see any comments there. I think you haven't submitted the reviews yet.

2U1 commented 2 months ago

@ParadoxZW I've updated the create_optimizer function to give different learning rates to the LoRA and non-LoRA modules. I also added saving the processor config rather than the tokenizer config.
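
The parameter-group split looks roughly like this sketch (the learning-rate values and the "lora_" naming convention are assumptions, not the PR's exact code):

```python
import torch

def create_optimizer(model, lora_lr=2e-4, base_lr=2e-5, weight_decay=0.0):
    """Give LoRA parameters a different lr than the other trainable weights."""
    lora_params, base_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # PEFT names LoRA tensors like "...lora_A.weight" / "...lora_B.weight".
        (lora_params if "lora_" in name else base_params).append(param)
    return torch.optim.AdamW(
        [
            {"params": lora_params, "lr": lora_lr},
            {"params": base_params, "lr": base_lr},
        ],
        weight_decay=weight_decay,
    )
```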

When merging the LoRA weights, I don't know exactly why I get the shared-memory error, so I had to save the model with safe_serialization=False. If you have any idea how to fix it, let me know.
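
For context, the merge-and-save path in question looks roughly like this sketch (paths are hypothetical; the shared-storage explanation is a common cause of this safetensors error, not a confirmed diagnosis of this repo):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "./lora-checkpoint")  # hypothetical adapter path
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
# safetensors refuses to serialize tensors that share storage; saving without
# safe serialization sidesteps that check.
merged.save_pretrained("./merged-model", safe_serialization=False)
```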

ParadoxZW commented 1 month ago

Sorry, I am too busy and overwhelmed with end-of-semester stuff this week.

First, do you mind adding your code to a develop branch of this repo, so I can try to debug it and cooperate with you more easily?

Second, by comments on the Files changed page, I mean:

[screenshots of the inline review comments on the Files changed page]

I don't know why you cannot see them.

Last, thanks for your efforts and contributions!

2U1 commented 1 month ago

Sorry, I saw this late. Of course you can. I've updated some other things and am still updating other features in the forked repo.

Also, I've run some tests, and fine-tuning all the modules, including the vision_model, showed the best performance for me.