Open ChenFicha opened 3 weeks ago
@Isaachhh Yes, I did follow the instructions. Here I am testing continual fine-tuning, specifically visual instruction tuning; I used "Recipe-2" in Bunny-v1.1-4B.md.
@zycoldness
You're exactly right. The problem was resolved after I commented out all the `@torch.no_grad()` lines. Thank you very much!
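For context, here is a minimal sketch (a toy module, not the actual Bunny code) of why a `@torch.no_grad()` decorator on the encoder's forward freezes it in practice: activations produced inside `no_grad` carry no autograd graph, so no gradients can ever reach the tower's weights, regardless of `requires_grad`.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """Toy stand-in for a vision encoder."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    @torch.no_grad()  # <- the culprit: disables autograd for the whole forward
    def forward(self, x):
        return self.proj(x)

tower = Tower()
out = tower(torch.randn(2, 4))
print(out.requires_grad)  # False: the output is detached from the graph
```

Removing the decorator restores the graph, so a backward pass can update the tower's parameters.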
@Isaachhh
The problem has been solved, thanks to @zycoldness's advice. I had overlooked the `@torch.no_grad()` in `siglip_encoder.py`. Thank you for your help as well!
BTW, here is what I got; both the `vision_tower` and `mm_projector` are in the param list.
```
['base_model.model.model.vision_tower.vision_tower.vision_model.embeddings.patch_embedding.weight', 'base_model.model.model.vision_tower.vision_tower.vision_model.embeddings.patch_embedding.bias',
...
'base_model.model.model.vision_tower.vision_tower.vision_model.head.mlp.fc2.weight', 'base_model.model.model.vision_tower.vision_tower.vision_model.head.mlp.fc2.bias', 'base_model.model.model.mm_projector.0.weight', 'base_model.model.model.mm_projector.0.bias', 'base_model.model.model.mm_projector.2.weight', 'base_model.model.model.mm_projector.2.bias']
```
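A trainable-parameter list like the one above can be produced with standard PyTorch calls; here is a sketch with a toy model (not the Bunny model itself):

```python
import torch.nn as nn

# Toy stand-in for the real model; the same pattern applies to Bunny.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
model[0].weight.requires_grad_(False)  # pretend this weight is frozen

# Only parameters with requires_grad=True appear in the list.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # -> ['0.bias', '2.weight', '2.bias']
```

Note that appearing in this list only means the parameter *can* receive gradients; a `no_grad` forward still prevents any actual update.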
That's pretty weird. As you showed, the weights of the vision encoder before training and those of bunny_phi3 differ, which means the vision encoder was tuned during visual instruction tuning.
So, the current code works when I train Bunny? It may be related to the package versions.
It might be. Here is my package setup for reference:
Docker:
nvcr.io/nvidia/pytorch:23.12-py3
Python:
Python-3.9.17
pip:
torch==2.3.1
torchvision==0.18.1
torchaudio==2.3.1
deepspeed==0.14.4
transformers==4.42.3
notebook==7.2.1
einops==0.8.0
accelerate==0.31.0
sentencepiece==0.2.0
timm==1.0.7
peft==0.11.1
datasets==2.20.0
evaluate==0.4.2
openpyxl==3.1.5
prettytable==3.10.0
openai==1.35.13
protobuf==5.27.2
gdown==5.2.0
spacy==3.7.5
nltk==3.8.1
bitsandbytes==0.43.3
ds_report:
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.9/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['/usr/local/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.14.4, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.3
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 31.30 GB
I am trying to continually fine-tune the model, but I found that the `vision_tower` is not updated. So I tried using "Recipe-2" in Bunny-v1.1-4B.md to fine-tune Bunny with your pretrained mm_projector. I used a large lr and 10 images from "bunny_695k.json":
I added some code in "train.py" to save the parameters before and after training:
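A snapshot step like the one described could look something like this (a sketch with a toy model; the hook points in train.py and the exact save format are hypothetical, not the actual code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def snapshot(model):
    # Detached CPU copies, so later training steps cannot mutate the snapshot.
    return {n: p.detach().cpu().clone() for n, p in model.named_parameters()}

# Toy model standing in for Bunny; in train.py the snapshots would be taken
# right before and right after trainer.train() and saved with torch.save().
model = nn.Linear(4, 2)
before = snapshot(model)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

after = snapshot(model)
changed = [n for n in before if not torch.equal(before[n], after[n])]
print(changed)  # both 'weight' and 'bias' moved after one update step
```

The `.clone()` matters: without it, the "before" dict would alias the live parameters and always compare equal to "after".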
I also extracted the Bunny-v1.1-4B parameters from your weights:
Then, I used the following code to compare the parameters:
The results show that the `vision_tower` is not updated even though `param.requires_grad = True`:
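A comparison like the one described can be sketched as follows (the file names in the comment and the tensors here are hypothetical stand-ins, not the real checkpoint data; `torch.allclose` with zero tolerances is an exact check, and the tolerances can be loosened to ignore numerical noise):

```python
import torch

def diff_params(before, after, atol=0.0):
    """Return names whose tensors differ between two parameter dicts."""
    assert before.keys() == after.keys()
    return [n for n in before
            if not torch.allclose(before[n], after[n], rtol=0.0, atol=atol)]

# In practice: before = torch.load("params_before.pt"), etc.
before = {"vision_tower.w": torch.zeros(3), "mm_projector.w": torch.zeros(3)}
after = {"vision_tower.w": torch.zeros(3),          # unchanged -> looks frozen
         "mm_projector.w": torch.full((3,), 0.01)}  # updated by training

print(diff_params(before, after))  # -> ['mm_projector.w']
```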
I am confused that the `vision_tower` is not updated even though I set `--unfreeze_vision_tower True`. Is there anything I missed?