Luodian / Otter

🦦 Otter is a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT, with improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

[demo]: Inferring Multiple choice Questions #337

Status: open. Opened by appledora 4 months ago.

appledora commented 4 months ago


Hello, I am trying to use the in-context model (luodian/OTTER-9B-LA-InContext) to answer a multiple-choice question. My prompt looks something like this:


```python
# Three-shot in-context prompt: two solved examples, then the open query.
# Each <image> token is paired, in order, with one input image.
prompt = (
    "<image>User: Can you pick one of the following options that best describes the image? Choose ONLY from the given two options. <options>1: cat 2: dog GPT:<answer> 1: cat<|endofchunk|>"
    "<image>User: Can you pick one of the following options that best describes the image? Choose ONLY from the given two options. <options>1: kitchen table 2: bathroom sink GPT:<answer> 2: bathroom sink<|endofchunk|>"
    "<image>User: Can you pick one of the following options that best describes the image? Choose ONLY from the given two options. <options>1: chicken_wings 2: salad GPT:<answer>"
)
```
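Hand-writing a few-shot prompt like the one above makes it easy for the examples to drift apart (stray spaces, different wording). A minimal sketch of generating it from data instead, assuming the chunk format used in this issue (`<image>User: ... GPT:<answer> ...<|endofchunk|>`, with the query left open after `GPT:<answer>`). `build_incontext_prompt` and `format_options` are hypothetical helpers, not part of the Otter codebase.

```python
from typing import List, Sequence, Tuple

# Question text copied from the prompt in this issue.
QUESTION = (
    "Can you pick one of the following options that best describes the image? "
    "Choose ONLY from the given two options."
)


def format_options(options: Sequence[str]) -> str:
    """Render options as '1: cat 2: dog' (1-based numbering, as in the issue)."""
    return " ".join(f"{i}: {opt}" for i, opt in enumerate(options, start=1))


def build_incontext_prompt(
    examples: List[Tuple[Sequence[str], int]],
    query_options: Sequence[str],
) -> str:
    """Build a few-shot prompt in the chunk format assumed above.

    examples: (options, correct_index) pairs; correct_index is 1-based.
    One <image> token is emitted per chunk, so the caller (by assumption)
    must supply the same number of images, in the same order.
    """
    chunks = []
    for options, answer_idx in examples:
        answer = f"{answer_idx}: {options[answer_idx - 1]}"
        chunks.append(
            f"<image>User: {QUESTION} <options>{format_options(options)} "
            f"GPT:<answer> {answer}<|endofchunk|>"
        )
    # The query chunk stops right after GPT:<answer> so the model fills it in.
    chunks.append(
        f"<image>User: {QUESTION} <options>{format_options(query_options)} GPT:<answer>"
    )
    return "".join(chunks)


prompt = build_incontext_prompt(
    examples=[(["cat", "dog"], 1), (["kitchen table", "bathroom sink"], 2)],
    query_options=["chicken_wings", "salad"],
)
print(prompt)
```

Generating every chunk from one template guarantees that the solved examples and the query share exactly the same wording and spacing, which makes it easier to tell whether odd outputs come from the prompt or from the model weights.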

However, judging by the outputs, I suspect that I am either not structuring the prompt correctly or not using the right model for this task. I would appreciate suggestions for improving the instruction, and advice on whether I should try different weights.
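When comparing outputs across prompt variants or different weights, it can help to map each free-form generation back to an option automatically. A minimal sketch, assuming the decoded text follows the same `GPT:<answer> ... <|endofchunk|>` format and may still contain the prompt; `parse_choice` is a hypothetical helper, and its matching rules (leading option number first, then a single matching option label, underscores treated as spaces) are assumptions.

```python
import re
from typing import Optional, Sequence


def _norm(text: str) -> str:
    """Lowercase and treat underscores as spaces ('chicken_wings' -> 'chicken wings')."""
    return text.lower().replace("_", " ")


def parse_choice(generated: str, options: Sequence[str]) -> Optional[int]:
    """Map model output to a 1-based option index, or None if ambiguous.

    Only the text after the last 'GPT:<answer>' marker is used, cut at the
    first '<|endofchunk|>' (an assumption about the decoded format).
    """
    answer = generated.rsplit("GPT:<answer>", 1)[-1]
    answer = answer.split("<|endofchunk|>", 1)[0].strip()

    # Rule 1: a leading option number such as "2: salad" or just "2".
    m = re.match(r"(\d+)\b", answer)
    if m and 1 <= int(m.group(1)) <= len(options):
        return int(m.group(1))

    # Rule 2: otherwise accept exactly one option label found in the text.
    lowered = _norm(answer)
    hits = [i for i, opt in enumerate(options, start=1) if _norm(opt) in lowered]
    return hits[0] if len(hits) == 1 else None


options = ["chicken_wings", "salad"]
print(parse_choice("GPT:<answer> 2: salad<|endofchunk|>", options))
```

Returning `None` for unparseable or ambiguous answers, rather than guessing, keeps those cases visible when counting how often each set of weights actually picks one of the given options.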