Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

questions about different instruction files in LA #177

Closed: Maxlinn closed this issue 1 year ago

Maxlinn commented 1 year ago

Hi to the team, Otter seems really interesting!

While investigating the released dataset, I found that the LA folder (which I believe refers to LLaVA) contains more than one instruction file (and their corresponding `_train.json` files):

- `LACONV_instructions.json`
- `LACR_I2I_instructions.json`
- `LACR_T2T_instructions.json`
- `LADD_instructions.json`

What do these instruction prefixes actually mean?

I carefully looked into the MIMIC-IT paper; it mentions the LA-T2T task but does not explain what it is. From issue #149 I know these files are jointly used to train OTTER-9B-LA-InContext, but I'm still at sea.

My sincere apologies if I missed something, and thanks for your patience!

ZhangYuanhan-AI commented 1 year ago

LACR_T2T and LACR_I2I are in-context learning variants of LLaVA's complex reasoning task.

In the LACR_X2X_instructions set, an instruction A is associated with a group of in-context examples, denoted as rel_ins_ids = [B,C,D].
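For example, you could resolve those references roughly like this. This is only a sketch: the top-level `"data"` key and the `instruction` field name are assumptions about the file schema, and only `rel_ins_ids` is confirmed above.

```python
# Hypothetical sketch: resolving the in-context examples referenced by
# rel_ins_ids. Field names other than rel_ins_ids are assumed, not confirmed.
import json

with open("LACR_I2I_instructions.json") as f:
    data = json.load(f)["data"]  # assumed top-level key

ins_id, entry = next(iter(data.items()))
print("query instruction:", entry["instruction"])  # assumed field name
for rel_id in entry.get("rel_ins_ids", []):
    # Each related id points at another entry in the same file.
    print("in-context example:", data[rel_id]["instruction"])
```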

For LACR_T2T, instructions B, C, and D are semantically related to instruction A. This relation is determined with the sentence-transformers/all-MiniLM-L6-v2 model.
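For illustration, scoring that text-to-text relatedness looks roughly like this (a minimal sketch, not our exact pipeline):

```python
# Sketch: scoring semantic relatedness between instructions with
# sentence-transformers/all-MiniLM-L6-v2 (cosine similarity of embeddings).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "What is unusual about this image?"  # instruction A (made-up example)
candidates = [                               # candidate pool for B, C, D
    "Why is this scene out of the ordinary?",
    "Describe the weather in the image.",
]

emb_q = model.encode(query, convert_to_tensor=True)
emb_c = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(emb_q, emb_c)  # shape (1, len(candidates))
print(scores)  # higher score = stronger semantic relation
```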

For LACR_I2I, instructions B, C, and D are associated with images that are semantically relevant to the image paired with instruction A. The semantic relatedness in this case is quantified by the CLIP-B/16 model.
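The image-to-image side can be sketched the same way, here using the Hugging Face CLIP ViT-B/16 checkpoint (the exact scoring code we used may differ; the image paths are placeholders):

```python
# Sketch: quantifying image-image relatedness with CLIP ViT-B/16
# (cosine similarity of CLIP image embeddings).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# image_a.jpg / image_b.jpg are placeholder paths.
images = [Image.open("image_a.jpg"), Image.open("image_b.jpg")]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    feats = model.get_image_features(**inputs)

feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize
similarity = (feats[0] @ feats[1]).item()         # cosine similarity
print(similarity)
```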

Maxlinn commented 1 year ago

Many thanks for the immediate help!