LAION-AI / CLIP_benchmark

CLIP-like model evaluation
MIT License
621 stars 80 forks

Issue dumping results for sugarcrepe benchmark #128

Closed escorciav closed 3 weeks ago

escorciav commented 3 weeks ago

There is an issue dumping results for all the tasks/subsets of sugar_crepe to the output JSON, no?

$ clip_benchmark eval --model ViT-B-16 --pretrained laion400m_e32 --dataset=sugar_crepe --output=vitb16_sugarcrepe.json --dataset_root ~/datasets/coco

It runs over all the splits, but the output JSON might not be generated as you intended: only the results for `sugar_crepe/swap_obj` are retained.

Models: [('ViT-B-16', 'laion400m_e32')]
Datasets: ['sugar_crepe/add_att', 'sugar_crepe/add_obj', 'sugar_crepe/replace_att', 'sugar_crepe/replace_obj', 'sugar_crepe/replace_rel', 'sugar_crepe/swap_att', 'sugar_crepe/swap_obj']
Languages: ['en']
Running 'image_caption_selection' on 'sugar_crepe/add_att' with the model 'laion400m_e32' on language 'en'
Dataset size: 692
Dataset split: test
  0%|                                                                   | 0/11 [00:00<?, ?it/s]/home/SERILOCAL/v.castillo/projects/genai-research/clip-benchmark/clip_benchmark/metrics/image_caption_selection.py:55: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.no_grad(), autocast():
100%|██████████████████████████████████████████████████████████| 11/11 [00:02<00:00,  5.36it/s]
Dump results to: vitb16_sugarcrepe.json
Running 'image_caption_selection' on 'sugar_crepe/add_obj' with the model 'laion400m_e32' on language 'en'
Dataset size: 2062
Dataset split: test
100%|██████████████████████████████████████████████████████████| 33/33 [00:04<00:00,  6.94it/s]
Dump results to: vitb16_sugarcrepe.json
Running 'image_caption_selection' on 'sugar_crepe/replace_att' with the model 'laion400m_e32' on language 'en'
Dataset size: 788
Dataset split: test
100%|██████████████████████████████████████████████████████████| 13/13 [00:02<00:00,  6.06it/s]
Dump results to: vitb16_sugarcrepe.json
Running 'image_caption_selection' on 'sugar_crepe/replace_obj' with the model 'laion400m_e32' on language 'en'
Dataset size: 1652
Dataset split: test
100%|██████████████████████████████████████████████████████████| 26/26 [00:04<00:00,  6.15it/s]
Dump results to: vitb16_sugarcrepe.json
Running 'image_caption_selection' on 'sugar_crepe/replace_rel' with the model 'laion400m_e32' on language 'en'
Dataset size: 1406
Dataset split: test
100%|██████████████████████████████████████████████████████████| 22/22 [00:03<00:00,  6.07it/s]
Dump results to: vitb16_sugarcrepe.json
Running 'image_caption_selection' on 'sugar_crepe/swap_att' with the model 'laion400m_e32' on language 'en'
Dataset size: 666
Dataset split: test
100%|██████████████████████████████████████████████████████████| 11/11 [00:02<00:00,  5.28it/s]
Dump results to: vitb16_sugarcrepe.json
Running 'image_caption_selection' on 'sugar_crepe/swap_obj' with the model 'laion400m_e32' on language 'en'
Dataset size: 245
Dataset split: test
100%|████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  3.82it/s]
Dump results to: vitb16_sugarcrepe.json
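Note the repeated `Dump results to: vitb16_sugarcrepe.json` lines: every task writes the same `--output` path, so each dump replaces the previous one. A minimal sketch of that overwrite behaviour, assuming each dump opens the file in write mode:

```python
import json
import os
import tempfile

# Every task dumps to the same --output path in write mode, so each
# write truncates the previous one; only the last task survives.
out = os.path.join(tempfile.mkdtemp(), "vitb16_sugarcrepe.json")
for task in ["add_att", "add_obj", "swap_att", "swap_obj"]:
    with open(out, "w") as f:  # same file, truncated on every run
        json.dump({"dataset": f"sugar_crepe/{task}", "metrics": {}}, f)

with open(out) as f:
    print(json.load(f)["dataset"])  # prints: sugar_crepe/swap_obj
```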
escorciav commented 3 weeks ago

Not a big deal. One can dispatch a bash (or whatever) script along these lines:

#!/bin/bash

# Define the model and pretrained settings
model="ViT-B-16"
model_name=vitb16
pretrained="laion400m_e32"
dataset_root="/home/SERILOCAL/v.castillo/datasets/coco"
# The SugarCrepe tasks
tasks=("add_att" "add_obj" "replace_att" "replace_obj" "replace_rel" "swap_att" "swap_obj")

for task in "${tasks[@]}"
do
    # Construct the dataset and output paths
    dataset="sugar_crepe/$task"
    output="${model_name}_sugarcrepe-$task.json"

    # Run the command
    clip_benchmark eval --model "$model" --pretrained "$pretrained" --dataset="$dataset" --output="$output" --dataset_root "$dataset_root"
done

then ask an LLM to merge them :laughing:. Perhaps clip_benchmark can even merge the JSONs; getting familiar with it atm :blush:. Thanks for putting this together :love_you_gesture:
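Merging the per-task files afterwards can be sketched like this (a minimal sketch: the keys inside each file are whatever clip_benchmark dumped, and the glob pattern assumes the output names produced by the bash loop above):

```python
import json
from pathlib import Path


def merge_results(result_files):
    """Collect per-task clip_benchmark JSON dumps into one list."""
    merged = []
    for path in result_files:
        with open(path) as f:
            merged.append(json.load(f))
    return merged


if __name__ == "__main__":
    # Matches the per-task outputs produced by the bash loop above.
    files = sorted(Path(".").glob("vitb16_sugarcrepe-*.json"))
    with open("vitb16_sugarcrepe_all.json", "w") as fout:
        json.dump(merge_results(files), fout, indent=2)
```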

escorciav commented 3 weeks ago

My bad, it's related to `--output`. Using a template along these lines should fix it: `--output='out_{dataset}.json'`
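With a template, the placeholder is expanded per dataset, so each task lands in its own file. How clip_benchmark itself sanitizes the `/` in dataset names is an assumption in this illustration (replacing it with `_`); the point is one distinct file per task instead of repeated overwrites:

```python
# Illustration of the per-dataset output template; how clip_benchmark
# itself sanitizes the '/' in dataset names is an assumption here
# (replacing it with '_').
template = "out_{dataset}.json"
datasets = ["sugar_crepe/add_att", "sugar_crepe/swap_obj"]

outputs = [template.format(dataset=d.replace("/", "_")) for d in datasets]
# One distinct filename per task, so nothing gets overwritten.
```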

cc @mehdidc