Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

Question about best prompt style for classification #224

Open vishaal27 opened 1 year ago

vishaal27 commented 1 year ago

Hey, I have a question about the best prompt format for evaluating Otter models (both the MPT7B and LLaMA7B variants).

Currently, I am evaluating Otter on image classification using the following prompt style:

PROMPT = '<image> Q: Describe the image. A: This is an image of a {}.'

In your experience, is it better to switch to the prompt style below for this kind of task?

PROMPT = '<image> User: Describe the image. GPT: This is an image of a {}.'

vishaal27 commented 1 year ago

In my initial experiments, it seems that the format

PROMPT = '<image> Q: Describe the image. A: This is an image of a {}.'

is slightly better for both the LLaMA and MPT models. This is a bit surprising, since it deviates from the template that was used during instruction tuning. Any thoughts on this, @Luodian?
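For anyone reproducing this comparison, here is a minimal sketch of how the two templates above expand into one candidate prompt per class. The class names and the helper `build_candidates` are hypothetical (the thread does not name a dataset), and the step that scores each candidate with the model is left out because it is not described here.

```python
# Hypothetical class names; the thread does not say which dataset is used.
CLASS_NAMES = ["cat", "dog", "otter"]

# The two prompt templates quoted in the thread.
QA_TEMPLATE = "<image> Q: Describe the image. A: This is an image of a {}."
CHAT_TEMPLATE = "<image> User: Describe the image. GPT: This is an image of a {}."


def build_candidates(template, class_names):
    """Return one filled-in prompt per class name (hypothetical helper)."""
    return [template.format(name) for name in class_names]


qa_prompts = build_candidates(QA_TEMPLATE, CLASS_NAMES)
chat_prompts = build_candidates(CHAT_TEMPLATE, CLASS_NAMES)

for prompt in qa_prompts + chat_prompts:
    print(prompt)
```

Keeping the class list fixed and swapping only the template makes the comparison between the two styles like-for-like.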

ZhangYuanhan-AI commented 1 year ago

In your case:

cur_instruction = f"Describe the image. This is an image of a"

For MPT:

PROMPT = f"<image>User: {cur_instruction} GPT:"

For LLaMA2:

wrap_sys = f"<<SYS>>\nYou are a helpful vision language assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.\n<</SYS>>\n\n"

PROMPT = f"[INST]{wrap_sys}<image>{cur_instruction}[/INST]"
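A minimal sketch of the two templates as functions may make them easier to drop into an evaluation loop. It rests on two assumptions that the text above does not confirm: that the `<image>` placeholder from the question's prompts comes directly before the instruction, and that the system block uses the Llama 2 chat delimiters `<<SYS>>` and `<</SYS>>`. The function names are hypothetical and are not part of the Otter codebase.

```python
# System text quoted from the reply above.
SYSTEM_TEXT = (
    "You are a helpful vision language assistant. "
    "You are able to understand the visual content that the user provides, "
    "and assist the user with a variety of tasks using natural language."
)


def build_mpt_prompt(cur_instruction):
    """MPT variant: User/GPT template (assumes a leading <image> placeholder)."""
    return f"<image>User: {cur_instruction} GPT:"


def build_llama2_prompt(cur_instruction):
    """LLaMA2 variant: [INST] template (assumes <<SYS>> ... <</SYS>> delimiters)."""
    wrap_sys = f"<<SYS>>\n{SYSTEM_TEXT}\n<</SYS>>\n\n"
    return f"[INST]{wrap_sys}<image>{cur_instruction}[/INST]"


cur_instruction = "Describe the image. This is an image of a"
mpt_prompt = build_mpt_prompt(cur_instruction)
llama2_prompt = build_llama2_prompt(cur_instruction)

print(mpt_prompt)
print(llama2_prompt)
```

Both prompts end right after the instruction, so the class name would be produced (or scored) as the model's continuation rather than being written into the prompt.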