Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

[dataset] Some question about instruction data generation (Syphus) #204

Closed xjtupanda closed 1 year ago

xjtupanda commented 1 year ago

I'm trying to follow your great work and am now developing my own dataset. #202 answered many of my questions, but I still have some confusion about implementation details; I hope you can help me.

  1. Did you use any vision model to transcribe images into text descriptions (e.g., bounding boxes, captions), as LLaVA does? Or did you just reformat the annotations of the official datasets?
  2. How should query_input be formatted? I notice the annotation files are not released, but they are referenced via the "query_inputs_path" variable in the dataset implementation, e.g., https://github.com/Luodian/Otter/blob/e7489a02d79e39e3e08fd983c72f2d7e6a30d622/mimic-it/syphus/datasets/change.py#L15. So I guess the IDs returned by the _load_query_inputs method should align with the image IDs, and the sentences should be context information similar to the "in-context examples" in https://github.com/Luodian/Otter/blob/e7489a02d79e39e3e08fd983c72f2d7e6a30d622/mimic-it/syphus/prompts/spot_the_difference.json#L6. Could you share some annotation files so we can get a more concrete understanding?
  3. If my guesses above are correct, are the questions (instructions) and the answers both extracted by GPT automatically?

Thanks in advance!

xjtupanda commented 1 year ago

I've figured it out myself. For those who might be interested:

  1. The annotation files come from the public datasets without introducing external modules, though integrating vision models might be beneficial.
  2. The sentences should be context information similar to the "in-context examples".
  3. The questions/instructions and answers are both extracted by GPT automatically.
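To make point 2 concrete, here is a minimal sketch of how context sentences might be grouped per image ID before being sent to GPT. The field names (`img_id`, `sentence`) and the function name are assumptions for illustration; the actual schema is defined by the annotation files and `_load_query_inputs` in `mimic-it/syphus/datasets/change.py`.

```python
def build_query_inputs(annotations):
    """Group context sentences by image ID, mirroring the
    'in-context examples' format described above (hypothetical schema)."""
    query_inputs = {}
    for ann in annotations:
        # Each sentence acts as textual context that GPT later turns
        # into instruction/answer pairs automatically.
        query_inputs.setdefault(ann["img_id"], []).append(ann["sentence"])
    return query_inputs

# Toy annotations in the assumed schema:
annotations = [
    {"img_id": "00001", "sentence": "a red car appears in the right image"},
    {"img_id": "00001", "sentence": "the trash can has been moved"},
]
print(build_query_inputs(annotations))
```

The key point is only the shape of the result: one entry per image ID, whose value is the list of context sentences that GPT sees when generating the instruction/answer pairs.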

zuwenqiang commented 1 year ago

Hi, I encountered the same issue as you did. Could you please share an example of an annotation file?

xjtupanda commented 1 year ago

> Hi, I encountered the same issue as you did. Could you please share an example of an annotation file?

@zuwenqiang Take the 'Spot the Difference' dataset as an example. You should:

  1. download the corresponding official annotation files (Link);
  2. modify the path of the corresponding annotation file in https://github.com/Luodian/Otter/blob/5e949c63ec38773fe639131bfcc800409172c495/mimic-it/syphus/datasets/change.py#L15;
  3. follow the steps and run the script as described in Link.
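Before step 2, it can help to sanity-check the downloaded annotation file against the shape `change.py` expects to read. The entry schema below (`img_id` plus a `sentences` list) is an assumption for illustration; verify it against the actual downloaded file before pointing `query_inputs_path` at it.

```python
import json
import os
import tempfile

# Hypothetical shape of one Spot-the-Difference annotation entry;
# check the real downloaded file, as the true field names may differ.
sample = [
    {
        "img_id": "00001",
        "sentences": [
            "the red car is gone",
            "a person is now standing by the door",
        ],
    }
]

with tempfile.TemporaryDirectory() as d:
    # Stand-in for the local path you would write into change.py (step 2).
    path = os.path.join(d, "train.json")
    with open(path, "w") as f:
        json.dump(sample, f)

    # Reload and verify every entry has the fields the loader would need.
    with open(path) as f:
        loaded = json.load(f)
    assert all("img_id" in e and "sentences" in e for e in loaded)
    print("annotation file looks valid:", len(loaded), "entries")
```

If this check fails on the real file, adapt `_load_query_inputs` (or the schema above) rather than the data.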
Luodian commented 1 year ago

@pufanyi

zuwenqiang commented 1 year ago

> > Hi, I encountered the same issue as you did. Could you please share an example of an annotation file?
>
> @zuwenqiang Take the 'Spot the Difference' dataset as an example. You should:
>
>   1. download the corresponding official annotation files (Link);
>   2. modify the path of the corresponding annotation file in https://github.com/Luodian/Otter/blob/5e949c63ec38773fe639131bfcc800409172c495/mimic-it/syphus/datasets/change.py#L15;
>   3. follow the steps and run the script as described in Link.

Thank you for your helpful response; the issue has been resolved now.