LLaVA-VL / LLaVA-NeXT

Apache License 2.0

checkpoint for finetuning / datafile for finetuning #182

Closed YerongLi closed 2 weeks ago

YerongLi commented 3 weeks ago

https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_onevision.sh

Is this the script for SFT? Where can we find the following checkpoint for finetuning?

"/checkpoints/projectors/${BASE_RUN_NAME}/mm_projector.bin" \

Also, could anyone post a data preparation script for fine tuning?

Luodian commented 3 weeks ago

projector: https://huggingface.co/lmms-lab/llava-onevision-projectors

data preparation: We provide the data preparation script in our data repo (the Hugging Face one).

https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data

We follow the original LLaVA data format. If you are not familiar with it, we recommend checking the repo: https://github.com/haotian-liu/LLaVA

Basically, it's an image folder and a JSON (or JSONL) file.

Each item is formatted like this:

{
  "id": "000000051920",
  "image": "azure_dc_zh_92K/000000051920.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n这幅画中的关键信息有哪些?"
    },
    {
      "from": "gpt",
      "value": "这张图片展示了一位穿着深色(可能是黑色或深蓝色)毛衣的人坐在一张桌子前,正在用右手拿着一片披萨。披萨看起来有多种配料,包括绿色的菠菜和橙红色的番茄,放在一个银色的圆形披萨盘上。披萨盘上还剩下大约六片披萨。在背景中,可以看到一个穿着浅色衣服的人站在柜台后面,柜台上方有一个黑板菜单,上面用白色粉笔写着菜单项,但是字迹不清晰,无法阅读。整个场景看起来像是在一个披萨店内部,环境显得比较休闲。"
    }
  ],
  "data_source": "azure_dc_zh_92K"
}
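
A quick sanity check on records in this format can catch problems before training. The sketch below is a minimal one, not the repo's own loader: the function name `validate_item` is hypothetical, and the rules it enforces (roles limited to "human" and "gpt", an `<image>` placeholder in the first turn whenever an "image" key is present) are assumptions inferred from the sample above rather than documented requirements.

```python
# Assumption: only these two roles appear, as in the sample record.
ALLOWED_ROLES = {"human", "gpt"}


def validate_item(item):
    """Return a list of problems found in one LLaVA-style record (empty if none)."""
    problems = []
    turns = item.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["'conversations' must be a non-empty list"]

    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i} is not an object")
            continue
        if turn.get("from") not in ALLOWED_ROLES:
            problems.append(f"turn {i} has unexpected role {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {i} has no string 'value'")

    # Assumption: an image record carries the <image> placeholder in its first turn.
    first = turns[0] if isinstance(turns[0], dict) else {}
    if "image" in item and "<image>" not in str(first.get("value", "")):
        problems.append("'image' is set but the first turn lacks '<image>'")
    return problems
```

Running this over every record in the JSON file and printing only the non-empty results gives a short report of items worth inspecting by hand.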

sc268 commented 3 weeks ago

Thanks @Luodian! Could you also share a sample of the onevision_data.yaml mentioned in the finetuning script? Thanks!

YerongLi commented 2 weeks ago

Where can we find the YAML file for finetune_onevision.sh? @Luodian

  File "/home/yerong2/LLaVA-NeXT/llava/train/train.py", line 970, in __init__
    with open(data_path, "r") as file:
FileNotFoundError: [Errno 2] No such file or directory: './onevision_data.yaml'
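
The traceback shows a relative path, './onevision_data.yaml', which is resolved against the directory the training script is launched from. A pre-flight check along the following lines can surface the problem before a long job starts. This is only a sketch: `check_data_path` is a hypothetical helper, not part of LLaVA-NeXT, and it verifies only that the file exists, not that its contents are valid.

```python
from pathlib import Path


def check_data_path(data_path):
    """Return the absolute path of data_path, or raise if it is not a file."""
    path = Path(data_path)
    if not path.is_file():
        # Report the working directory, since relative paths depend on it.
        raise FileNotFoundError(
            f"{data_path!r} not found (current directory: {Path.cwd()})"
        )
    return path.resolve()
```

Calling `check_data_path("./onevision_data.yaml")` from the launch directory either returns the resolved path or fails immediately with the working directory included in the message.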

YerongLi commented 2 weeks ago

@sc268

They provide the yaml file here.