Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

[dataset] How to prepare OtterHD datasets? #296

Closed (yxchng closed this issue 7 months ago)

Luodian commented 8 months ago

It's described here:

https://github.com/Luodian/Otter/blob/main/docs/mimicit_format.md

yxchng commented 8 months ago

I don't quite understand. I don't see any mention of datasets like M3IT there. Isn't OtterHD using more data than mimicit?

peiliu0408 commented 8 months ago

mark

Luodian commented 8 months ago

> I don't quite understand. I don't see any mention of datasets like M3IT there. Isn't OtterHD using more data than mimicit?

You can check the Markdown file, which describes the format. Basically, you can easily convert any dataset into this format. That's what we do on our cluster: we have 40+ converted datasets like M3IT, SVIT, PoliteFlamingo...

I can gradually upload them to the folder.

I've already updated some files here.
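
For readers who want to try the conversion themselves, here is a minimal sketch of what "converting a dataset into this format" might look like. It assumes the layout is an instruction JSON with `meta` and `data` keys (each entry holding `instruction`, `answer`, `image_ids`, `rel_ins_ids`) plus a separate mapping from image ID to a base64-encoded image. Please verify the field names against mimicit_format.md before relying on it. The function name `to_mimicit`, the input record keys, the ID scheme, and the `meta` values are all hypothetical, not taken from the Otter codebase.

```python
import base64
import json


def to_mimicit(records, prefix):
    """Convert simple QA records into MIMIC-IT-style dicts (assumed layout).

    records: iterable of dicts with hypothetical keys
             "image_bytes" (bytes), "question" (str), "answer" (str).
    prefix:  short dataset tag used to build unique IDs (hypothetical scheme).
    Returns (instruction_dict, image_dict).
    """
    instructions = {}
    images = {}
    for i, rec in enumerate(records):
        img_id = f"{prefix}_IMG_{i:06d}"
        ins_id = f"{prefix}_INS_{i:06d}"
        # Images are stored as base64 text, keyed by image ID.
        images[img_id] = base64.b64encode(rec["image_bytes"]).decode("utf-8")
        instructions[ins_id] = {
            "instruction": rec["question"],
            "answer": rec["answer"],
            "image_ids": [img_id],
            # No related (in-context) instructions in this simple sketch.
            "rel_ins_ids": [],
        }
    # The "meta" values are placeholders, not required values.
    instruction_dict = {
        "meta": {"version": "0.0.1", "author": "example"},
        "data": instructions,
    }
    return instruction_dict, images


if __name__ == "__main__":
    demo = [{"image_bytes": b"abc", "question": "What is shown?", "answer": "A demo."}]
    ins, imgs = to_mimicit(demo, "DEMO")
    print(json.dumps(ins, indent=2))
```

The two returned dicts can then be serialized with `json.dump` into separate instruction and image files. Datasets with multiple images or multi-turn conversations per sample would need longer `image_ids` lists and populated `rel_ins_ids`.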