Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License
Coco Captioning Performance #290

essamsleiman commented 9 months ago

Hi, I'm evaluating Otter on COCO image captioning and am getting relatively poor performance for 0-shot ICL compared to 2-shot and 4-shot.

```shell
python -m torch.distributed.launch --nproc_per_node=4 evaluate.py \
  --coco_val_image_dir_path=path \
  --coco_annotations_json_path=path \
  --coco_train_image_dir_path=path \
  --coco_karpathy_json_path=path \
  --model=otter \
  --model_path=luodian/OTTER-Image-MPT7B \
  --device=cuda:0 \
  --precision=fp32 \
  --batch_size=8 \
  --eval_coco \
  --shots=$SHOTS_VALUE
```

This is what my run_eval_coco.sh file looks like; nothing else has been modified. Please let me know if it looks correct.
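Since `--shots` takes a single value, each shot setting presumably needs its own evaluation run. A minimal sketch of how a wrapper script might sweep the shot counts (the loop and the echoed message are illustrative assumptions, not the reporter's actual run_eval_coco.sh; the full evaluation command is elided in the comment):

```shell
#!/usr/bin/env bash
# Hypothetical sweep over in-context shot counts; one evaluation run per setting.
for SHOTS_VALUE in 0 2 4; do
  echo "Running COCO captioning eval with shots=${SHOTS_VALUE}"
  # python -m torch.distributed.launch --nproc_per_node=4 evaluate.py ... --shots=$SHOTS_VALUE
done
```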

These are the CIDEr scores I'm getting:

- 0-shot: 0.308
- 2-shot: 0.942
- 4-shot: 0.996

Is such low 0-shot performance expected?

Thanks!