Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License
Coco Captioning Performance #290

essamsleiman commented 9 months ago

Hi, I'm evaluating Otter on COCO image captioning and am getting relatively poor performance for 0-shot ICL compared to 2-shot and 4-shot.

```shell
python -m torch.distributed.launch --nproc_per_node=4 evaluate.py \
  --coco_val_image_dir_path=path \
  --coco_annotations_json_path=path \
  --coco_train_image_dir_path=path \
  --coco_karpathy_json_path=path \
  --model=otter \
  --model_path=luodian/OTTER-Image-MPT7B \
  --device=cuda:0 \
  --precision=fp32 \
  --batch_size=8 \
  --eval_coco \
  --shots=$SHOTS_VALUE
```

This is what my run_eval_coco.sh file looks like; nothing else has been modified. Please let me know if it looks correct.
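Since `--shots` takes a single value, each shot setting presumably needs its own evaluation run. A minimal sketch of how a wrapper script might sweep the shot counts (the loop and the echoed message are illustrative assumptions, not the reporter's actual run_eval_coco.sh; the full evaluation command is elided in the comment):

```shell
#!/usr/bin/env bash
# Hypothetical sweep over in-context shot counts; one evaluation run per setting.
for SHOTS_VALUE in 0 2 4; do
  echo "Running COCO captioning eval with shots=${SHOTS_VALUE}"
  # python -m torch.distributed.launch --nproc_per_node=4 evaluate.py ... --shots=$SHOTS_VALUE
done
```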

These are the CIDEr scores I'm getting:

- 0-shot: 0.308
- 2-shot: 0.942
- 4-shot: 0.996

Is such low 0-shot performance expected?

Thanks!