🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Hi authors, could you tell me which datasets were used to train the `Otter-MPT7B Image` model? Was it the full MIMIC-IT dataset (including the video-text data), or only the LA/SD data (image-text only)?
Originally posted by @Aopolin-Lv in https://github.com/Luodian/Otter/issues/186#issuecomment-1628312278