Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

How to train the whole MIMIC-IT dataset? #284

Closed ElegantLin closed 10 months ago

ElegantLin commented 10 months ago

Hi authors,

Thanks for your great work. It seems that MIMIC-IT is a dataset consisting of several sub-datasets. Otter benefits from all of these datasets and can perform well on several downstream VL tasks. However, in the example, mimicit_path, images_path, and train_config_path each accept only one json file. I wonder whether it is possible to pass several json files to these arguments. For example, can I set the following arguments?

--mimicit_path="path/to/DC_instruction.json;path/to/SD_instruction.json"
--images_path="path/to/DC.json;path/to/SD.json"
--train_config_path="path/to/DC_train.json;path/to/SD_train.json"
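
To make the proposal concrete, here is a minimal sketch of how a semicolon-separated value could be split and matched up per sub-dataset. `split_paths` and `pair_groups` are hypothetical helpers written for illustration; they are not part of the Otter codebase, and nothing here implies the training script accepts this format.

```python
from typing import List, Tuple


def split_paths(value: str, sep: str = ";") -> List[str]:
    """Split a separator-delimited string into a list of non-empty paths."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def pair_groups(mimicit: str, images: str, train_config: str) -> List[Tuple[str, str, str]]:
    """Pair the i-th instruction, image, and train-config path together.

    Raises ValueError if the three arguments list a different number of files,
    since each sub-dataset needs exactly one file of each kind.
    """
    groups = [split_paths(value) for value in (mimicit, images, train_config)]
    if len({len(group) for group in groups}) != 1:
        raise ValueError("each argument must list the same number of files")
    return list(zip(*groups))
```

Each returned tuple would then describe one sub-dataset, which a loader could iterate over.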

The tag of this issue should be train.

Thanks!

Luodian commented 10 months ago

Hi, our current implementation uses different arguments to load different groups of datasets. Video datasets should be loaded with mimicit_vt_path (the other arguments also follow the vt tag). General image-text datasets should be loaded with mimicit_path (no specific tag).
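
As a rough sketch of what "different args for different groups" could look like, the snippet below registers one set of flags per group with argparse. Only mimicit_path, images_path, train_config_path, and mimicit_vt_path are named in this thread; `--images_vt_path` and `--train_config_vt_path` are assumed names that follow the vt tag, and `build_parser` and `active_groups` are hypothetical helpers, not Otter's actual code.

```python
import argparse
from typing import List

# Group label -> argument suffix. The "_vt" suffix for video comes from the
# thread; applying it to the images/train_config flags is an assumption.
GROUP_SUFFIXES = {"image-text": "", "video": "_vt"}


def build_parser() -> argparse.ArgumentParser:
    """Register one set of dataset flags per group."""
    parser = argparse.ArgumentParser(description="Per-group dataset arguments (sketch)")
    for group, suffix in GROUP_SUFFIXES.items():
        parser.add_argument(f"--mimicit{suffix}_path", type=str, default="",
                            help=f"instruction json for the {group} group")
        parser.add_argument(f"--images{suffix}_path", type=str, default="",
                            help=f"image json for the {group} group")
        parser.add_argument(f"--train_config{suffix}_path", type=str, default="",
                            help=f"train config json for the {group} group")
    return parser


def active_groups(args: argparse.Namespace) -> List[str]:
    """Return the groups whose instruction path was provided."""
    return [group for group, suffix in GROUP_SUFFIXES.items()
            if getattr(args, f"mimicit{suffix}_path")]
```

With this layout, an image-text-only run would set just the untagged flags and leave the vt ones empty.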

We plan to make the next release late this month or early next month. It will include a big update to the training pipeline and will better support mixture training. If you are interested in using our codebase, you could send me an email and we could discuss it further.

ElegantLin commented 10 months ago

Thanks for your quick reply. My task only involves image-text data at this time, so I should only use mimicit_path. As I understand it, the current version does not support mixture training, so I can only pass one json file here. Is that correct?

Thanks!