
Training Phi3-V with PEFT

This repository contains a script for training Phi-3-vision (Phi3-V), Microsoft's multimodal language model, with Parameter-Efficient Fine-Tuning (PEFT) techniques under various configurations and options.

Table of Contents

- Supported Features
- Installation
- Model Download
- Usage
- Arguments
- Dataset Preparation
- TODO
- License
- Citation

Installation

Install the required packages using either requirements.txt or environment.yml.

Using requirements.txt

```bash
pip install -r requirements.txt
```

Using environment.yml

```bash
conda env create -f environment.yml
conda activate phi3v
```
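
To confirm the environment is usable before training, a quick sanity check like the one below can help; it assumes torch, transformers, and peft are among the installed dependencies.

```python
# Quick environment sanity check: verify the core libraries import and CUDA is visible.
import torch
import transformers
import peft

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers", transformers.__version__)
print("peft", peft.__version__)
```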

Model Download

Before training, download the Phi3-V model from HuggingFace. It is recommended to use the huggingface-cli to do this.

1. Install the HuggingFace CLI:

   ```bash
   pip install -U "huggingface_hub[cli]"
   ```

2. Download the model:

   ```bash
   huggingface-cli download microsoft/Phi-3-vision-128k-instruct --local-dir Phi-3-vision-128k-instruct --resume-download
   ```
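
If you prefer Python over the CLI, the same checkpoint can be fetched with huggingface_hub's snapshot_download; this is an equivalent sketch, not a step required by the repository.

```python
# Download the Phi-3-vision checkpoint into a local directory via huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/Phi-3-vision-128k-instruct",
    local_dir="Phi-3-vision-128k-instruct",
)
```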

Usage

To run the training script, use the following command:

```bash
bash scripts/train.sh
```

Note: Remember to replace the paths in scripts/train.sh with your own paths before running it.
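
Once training finishes, the finetuned weights can be loaded back on top of the base model. The snippet below is only a sketch: it assumes scripts/train.sh saves a standard PEFT (LoRA) adapter directory, and output/lora-adapter is a placeholder path rather than anything defined by this repository.

```python
# Sketch: load the base Phi-3-vision checkpoint and attach a finetuned LoRA adapter.
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Phi-3-vision-128k-instruct",   # local dir from the Model Download step
    trust_remote_code=True,         # Phi-3-vision ships custom modeling code
    torch_dtype="auto",
    _attn_implementation="eager",   # use "flash_attention_2" if flash-attn is installed
)
processor = AutoProcessor.from_pretrained(
    "Phi-3-vision-128k-instruct", trust_remote_code=True
)

# Attach the finetuned LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, "output/lora-adapter")  # placeholder adapter path
model.eval()
```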

Arguments

Dataset Preparation

The script requires a dataset formatted according to the LLaVA specification. The dataset should be a JSON file where each entry contains information about conversations and images. Ensure that the image paths in the dataset match the provided --image_folder.

Example Dataset

```json
[
  {
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWhat are the colors of the bus in the image?"
      },
      {
        "from": "gpt",
        "value": "The bus in the image is white and red."
      },
      {
        "from": "human",
        "value": "What feature can be seen on the back of the bus?"
      },
      {
        "from": "gpt",
        "value": "The back of the bus features an advertisement."
      },
      {
        "from": "human",
        "value": "Is the bus driving down the street or pulled off to the side?"
      },
      {
        "from": "gpt",
        "value": "The bus is driving down the street, which is crowded with people and other vehicles."
      }
    ]
  }
  ...
]
```
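
Before launching training, it can be worth checking that the JSON is well formed and that every referenced image exists. The script below is a small standalone sketch, not part of this repository; data.json and images/ are placeholder paths that should match the dataset file and --image_folder you pass to scripts/train.sh.

```python
# Validate a LLaVA-style dataset file: images must exist and turns must alternate.
import json
from pathlib import Path

data_path = Path("data.json")      # placeholder: the dataset JSON used for training
image_folder = Path("images")      # placeholder: must match --image_folder

entries = json.loads(data_path.read_text())
for entry in entries:
    # Every listed image must exist under the image folder.
    if "image" in entry:
        assert (image_folder / entry["image"]).is_file(), f"missing image: {entry['image']}"
    # LLaVA-style conversations alternate human/gpt turns, starting with "human".
    roles = [turn["from"] for turn in entry["conversations"]]
    expected = ["human" if i % 2 == 0 else "gpt" for i in range(len(roles))]
    assert roles == expected, f"unexpected turn order in entry {entry.get('id')}"

print(f"OK: validated {len(entries)} entries")
```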

TODO

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.

This project borrows code from LLaVA and Microsoft's Phi-3-vision-128k-instruct. Thanks to both projects for their contributions.

Citation

If you use this codebase in your work, please cite this project:

```bibtex
@misc{phi3vfinetuning2023,
  author = {Gai, Zhenbiao and Shao, Zhenwei},
  title = {Phi3V-Finetuning},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/GaiZhenbiao/Phi3V-Finetuning},
  note = {GitHub repository},
}
```