Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

[training] Change current data related args to a yaml file #264

Closed pufanyi closed 11 months ago

pufanyi commented 11 months ago

Description

To simplify running the loading script, dataset paths no longer have to be passed one by one on the command line. Instead, list them in a YAML file and pass that file with the argument --training_data_yaml=pipeline/train/config.yaml.

In the YAML configuration file, use the name of each former command-line argument as a key and list its dataset paths beneath it:

old_command_1:
  - path_to_dataset_1_1
  - path_to_dataset_1_2

old_command_2:
  - path_to_dataset_2_1
  - path_to_dataset_2_2

For instance:

mimicit_vt_path:
  - /data/pufanyi/training_data/SD/SD_instructions.json
  - /data/pufanyi/training_data/CGD/CGD_instructions.json

images_vt_path:
  - /data/pufanyi/training_data/SD/SD.json
  - /data/pufanyi/training_data/CGD/CGD.json

Existing scripts need no changes; arguments such as mimicit_vt_path can still be passed on the command line as before.
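As a rough illustration of how a file like this could be folded back into the parsed arguments, here is a minimal sketch. It is not taken from the Otter code base: the helper names load_yaml_file and apply_data_config are hypothetical, the parser is a stand-in with only three arguments, and both storing the paths as a Python list and letting an explicit command-line value win over the YAML entry are assumptions. PyYAML's yaml.safe_load is the only third-party call used.

```python
import argparse


def load_yaml_file(yaml_path):
    """Read the YAML file and return its top-level mapping (empty dict if the file is empty)."""
    import yaml  # PyYAML, third-party; imported lazily so the merge helper works without it

    with open(yaml_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}


def apply_data_config(args, config):
    """Hypothetical helper: copy each YAML entry onto the argparse namespace.

    Assumption: a value given explicitly on the command line takes precedence,
    so the YAML entry only fills arguments that are still empty.
    """
    for name, paths in config.items():
        if isinstance(paths, str):
            paths = [paths]  # tolerate a single path written without a list
        if not getattr(args, name, None):
            setattr(args, name, list(paths))
    return args


# Stand-in parser; the real training script defines many more arguments.
parser = argparse.ArgumentParser()
parser.add_argument("--training_data_yaml", default="")
parser.add_argument("--mimicit_vt_path", default="")
parser.add_argument("--images_vt_path", default="")
args = parser.parse_args(["--training_data_yaml=pipeline/train/config.yaml"])

# In real use: config = load_yaml_file(args.training_data_yaml)
# Here the parsed result of the example file above is written out directly.
config = {
    "mimicit_vt_path": [
        "/data/pufanyi/training_data/SD/SD_instructions.json",
        "/data/pufanyi/training_data/CGD/CGD_instructions.json",
    ],
    "images_vt_path": [
        "/data/pufanyi/training_data/SD/SD.json",
        "/data/pufanyi/training_data/CGD/CGD.json",
    ],
}
apply_data_config(args, config)
print(args.mimicit_vt_path)
```

With this precedence rule, a run that still passes mimicit_vt_path on the command line keeps that value, which matches the backward compatibility described above.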
