2U1 / Phi3-Vision-Finetune

An open-source implementation for fine-tuning Phi3-Vision and Phi3.5-Vision by Microsoft.
Apache License 2.0

How to make it work with higher batch size > 1 #9

Closed DavidePaglieri closed 3 months ago

DavidePaglieri commented 3 months ago

Hi, thanks for your great work!

Currently using standard code from transformers I can train Phi-3, but only with batch size of 1. Can I ask specifically what was the change needed to make it work with larger batch sizes?

2U1 commented 3 months ago

If your code gives an error like RuntimeError: stack expects each tensor to be equal size, but got [2612] at entry 0 and [2467] at entry 1

That's because of the data collator. It pads each example to equal length, so using it should resolve the issue.

https://github.com/2U1/Phi3-Vision-ft/blob/e90cf1c4c74e895b7f80b46c8a05ae9fa27bc5d5/src/training/data.py#L105-L136
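A minimal sketch of such a padding collator, with hypothetical names (the repo's actual implementation is in the `data.py` file linked above): each sample's `input_ids` are padded to the longest sequence in the batch, labels are padded with `-100` so the loss ignores them, and an attention mask marks the real tokens. The pad token id here is an assumption for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_TOKEN_ID = 32000   # assumption: placeholder pad token id for illustration
IGNORE_INDEX = -100    # label id ignored by the cross-entropy loss

def pad_collate(batch):
    """Pad variable-length examples to the longest sequence in the batch."""
    input_ids = pad_sequence(
        [ex["input_ids"] for ex in batch],
        batch_first=True, padding_value=PAD_TOKEN_ID)
    labels = pad_sequence(
        [ex["labels"] for ex in batch],
        batch_first=True, padding_value=IGNORE_INDEX)
    # real tokens get 1, padding gets 0
    attention_mask = input_ids.ne(PAD_TOKEN_ID).long()
    return {"input_ids": input_ids,
            "labels": labels,
            "attention_mask": attention_mask}

# two samples of different lengths, which would break torch.stack directly
batch = [
    {"input_ids": torch.arange(5), "labels": torch.arange(5)},
    {"input_ids": torch.arange(3), "labels": torch.arange(3)},
]
out = pad_collate(batch)  # all tensors now have shape (2, 5)
```

Passing a collator like this as `data_collator` to the Hugging Face `Trainer` is what makes `per_device_train_batch_size > 1` work with ragged sequences.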

DavidePaglieri commented 3 months ago

Thanks! The other problem, compared to the LLaVA implementation, is that Phi's processor doesn't work with batches, although the tokenizer alone does.

2U1 commented 3 months ago

@DavidePaglieri I've worked with the original Phi3 processor, and it does work with batches when combined with the dataset and collator I wrote in this code (except for the label part, because Phi3's original processor doesn't create labels). I don't know exactly what code you're using, but you could just take the dataset and collator from my repo and add label creation in the dataset.
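Since the stock processor only returns `input_ids` and not `labels`, the dataset has to build the labels itself. A common approach, sketched below with hypothetical names, is to clone the token ids and mask the prompt portion with `-100` so only the response tokens contribute to the loss:

```python
import torch

IGNORE_INDEX = -100  # label id ignored by the loss

def make_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Copy input_ids and mask the first prompt_len tokens.

    Tokens set to IGNORE_INDEX are skipped by the cross-entropy loss,
    so the model is only trained to predict the response.
    """
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX
    return labels

# toy example: tokens 5..7 are the prompt, 8..9 are the response
ids = torch.tensor([5, 6, 7, 8, 9])
labels = make_labels(ids, prompt_len=3)  # [-100, -100, -100, 8, 9]
```

With labels produced per sample in `__getitem__`, the collator above can then pad them alongside the `input_ids`.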