Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

[dataset] Related instruction IDs for LA In-context are incorrect #149

Closed yukw777 closed 1 year ago

yukw777 commented 1 year ago

While looking through the annotations for LA in-context, I noticed that the instruction IDs listed as related instructions do not exist in the file. Dense Caption doesn't seem to have this problem.

In [2]: import json

In [3]: with open('/path/to/LA_instructions.json') as f:
   ...:     annotations = json.load(f)
   ...: 

In [4]: list(annotations['data'].keys())[:10]
Out[4]: 
['LACONV_00_INS_000000033471_2',
 'LACONV_00_INS_000000052846_4',
 'LACONV_00_INS_000000334872_3',
 'LACONV_00_INS_000000319154_4',
 'LACONV_00_INS_000000398214_4',
 'LACONV_00_INS_000000520873_4',
 'LACONV_00_INS_000000575173_3',
 'LACONV_00_INS_000000087286_3',
 'LACONV_00_INS_000000032286_4',
 'LACONV_00_INS_000000175217_4']

In [5]: annotations['data']['LACONV_00_INS_000000033471_2']
Out[5]: 
{'instruction': 'Is the bus driving down the street or pulled off to the side?',
 'answer': 'The bus is driving down the street, which is crowded with people and other vehicles.',
 'image_ids': ['LA_00_IMG_000000033471'],
 'rel_ins_ids': ['LACONV_00_INS_000000033471_0',
  'LACONV_00_INS_000000033471_1']}

In [6]: annotations['data']['LACONV_00_INS_000000033471_0']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 1
----> 1 annotations['data']['LACONV_00_INS_000000033471_0']

KeyError: 'LACONV_00_INS_000000033471_0'

In [7]: annotations['data']['LACONV_00_INS_000000033471_1']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 1
----> 1 annotations['data']['LACONV_00_INS_000000033471_1']

KeyError: 'LACONV_00_INS_000000033471_1'

Maybe I've misunderstood what related instructions are? Either way, please let me know!
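
For reference, the same check can be run over the whole file instead of key by key. The following is a minimal sketch, assuming only the layout visible in the session above (a top-level `data` dict whose entries carry a `rel_ins_ids` list). The function name `find_missing_related_ids` and the sample IDs are hypothetical and used purely for illustration.

```python
import json


def find_missing_related_ids(annotations):
    """Map each instruction ID to the related IDs that are absent from annotations['data']."""
    data = annotations["data"]
    missing = {}
    for ins_id, entry in data.items():
        # 'rel_ins_ids' is the field shown in the session above; treat it as optional.
        absent = [rid for rid in entry.get("rel_ins_ids", []) if rid not in data]
        if absent:
            missing[ins_id] = absent
    return missing


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the JSON file (made-up IDs, not real data).
    sample = {
        "data": {
            "DEMO_INS_A_2": {"rel_ins_ids": ["DEMO_INS_A_0", "DEMO_INS_A_1"]},
            "DEMO_INS_B_0": {"rel_ins_ids": []},
            "DEMO_INS_B_1": {"rel_ins_ids": ["DEMO_INS_B_0"]},
        }
    }
    print(json.dumps(find_missing_related_ids(sample), indent=2))
```

To run it on the real file, pass the result of `json.load` to the function. An empty result would mean every related ID resolves within that file.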

Luodian commented 1 year ago

Thanks! We are doing a final check on our dataset and will get back to you soon.

ZhangYuanhan-AI commented 1 year ago

Thank you for your interest in Otter!

We acknowledge that the wrong "instruction" file was uploaded, and we apologize for this. We have checked that the "instruction" file stored locally is correct.

The correct LLaVA "instruction" file will be uploaded this week. Stay tuned!

Luodian commented 1 year ago

Here are our updated instructions on OneDrive.

Inside the folder named LA, you will find multiple instruction files with the suffix _instructions.json. These files contain the instructions we have provided. The other folders and files are also already available, except for the TVC folder, which still needs a final check. We will soon update our README to announce the release of the instructions and to provide guidance on how to convert public datasets to our required image file format, such as LA.json.

Please note that the annotations I previously provided may be incorrect because I intended to merge them into a single file. However, our training process did not actually combine the LA_CONV, LA_DD, LACR_I2I, and LACR_T2T annotations together.

Our training pipeline loads them as comma-separated lists, as in the following command. Note that the path in --images_path should also be repeated 4 times to align with --mimicit_path and --train_config_path.

export PYTHONPATH=.

accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
pipeline/train/instruction_following.py \
--pretrained_model_name_or_path=/home/luodian/azure_storagev2/otter/checkpoints/otter9B_DC_fullset_no_desc_june12 \
--dataset_resampled \
--mimicit_path="/path/to/LACR_I2I_instructions.json,/path/to/LACR_T2T_instructions.json,/path/to/LACONV_instructions.json,/path/to/LADD_instructions.json" \
--images_path="/path/to/LA.json,/path/to/LA.json,/path/to/LA.json,/path/to/LA.json" \
--train_config_path="/path/to/LACR_I2I_train.json,/path/to/LACR_T2T_train.json,/path/to/LACONV_train.json,/path/to/LADD_train.json" \
--batch_size=16 \
--num_epochs=6 \
--report_to_wandb \
--wandb_entity=ntu-slab \
--external_save_dir=./checkpoints \
--save_hf_model \
--run_name=otter9B_LA \
--wandb_project=otter9B \
--workers=8 \
--cross_attn_every_n_layers=4 \
--lr_scheduler=cosine \
--delete_previous_checkpoint \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01

The --mimicit_path loads our provided _instructions.json and --train_config_path loads _train.json.
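
Because the three arguments have to line up position by position, it can help to check the alignment before launching a run. The following is a minimal sketch of such a check. It is not the pipeline's own parsing code, and `split_aligned_paths` is a hypothetical helper written for illustration.

```python
def split_aligned_paths(mimicit_path, images_path, train_config_path):
    """Split three comma-separated path strings and pair them up by position."""
    groups = [
        [part.strip() for part in arg.split(",") if part.strip()]
        for arg in (mimicit_path, images_path, train_config_path)
    ]
    lengths = [len(group) for group in groups]
    # All three lists must have the same number of entries to be aligned.
    if len(set(lengths)) != 1:
        raise ValueError(f"path lists are not aligned: lengths {lengths}")
    # Each tuple is (instruction file, image file, train config) for one dataset.
    return list(zip(*groups))


if __name__ == "__main__":
    triples = split_aligned_paths(
        "/path/to/LACONV_instructions.json,/path/to/LADD_instructions.json",
        "/path/to/LA.json,/path/to/LA.json",
        "/path/to/LACONV_train.json,/path/to/LADD_train.json",
    )
    for instruction_file, image_file, train_file in triples:
        print(instruction_file, image_file, train_file)
```

Passing a single LA.json in `images_path` alongside several instruction files would raise the `ValueError`, which is the mismatch the note about repeating the path is meant to avoid.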

Inside _train.json, each instruction is associated with its related instructions. We provide this file so that each instruction's related instructions can be defined more flexibly, which lets the same instructions serve different in-context objectives.
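
As an illustration of how the two files might be combined, the sketch below resolves one instruction together with its related instructions. It assumes, without confirmation from the thread, that _train.json is a mapping from an instruction ID to a list of related instruction IDs. The function `gather_in_context_turns` and all sample IDs and texts are hypothetical placeholders.

```python
def gather_in_context_turns(ins_id, instructions, train_config):
    """Return (instruction, answer) pairs: related turns first, then the target turn."""
    data = instructions["data"]
    # ASSUMPTION: train_config maps an instruction ID to a list of related IDs.
    related_ids = train_config.get(ins_id, [])
    turns = []
    for rid in related_ids:
        if rid not in data:
            raise KeyError(f"related instruction {rid!r} is missing from the instruction file")
        turns.append((data[rid]["instruction"], data[rid]["answer"]))
    # The target instruction comes last, after its in-context examples.
    turns.append((data[ins_id]["instruction"], data[ins_id]["answer"]))
    return turns


if __name__ == "__main__":
    # Placeholder data for illustration only, not taken from the dataset.
    instructions = {
        "data": {
            "DEMO_0": {"instruction": "placeholder question 0", "answer": "placeholder answer 0"},
            "DEMO_1": {"instruction": "placeholder question 1", "answer": "placeholder answer 1"},
        }
    }
    train_config = {"DEMO_1": ["DEMO_0"], "DEMO_0": []}
    for question, answer in gather_in_context_turns("DEMO_1", instructions, train_config):
        print(question, "->", answer)
```

Keeping the relations in a separate file means a different in-context objective only needs a different mapping, while the instruction file itself stays unchanged.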