tl;dr: CLIP ResNet-50 channels are very noisy compared to those of a typical ImageNet-trained ResNet-50, and most saliency methods obtain rather low object localization scores with CLIP. By visualizing only the top 10% most sensitive (highest-gradient) channels, our gScoreCAM obtains state-of-the-art weakly supervised localization results with CLIP (in both the ResNet and ViT versions); a minimal sketch of the idea follows the citation below.
Official Implementation for the paper gScoreCAM: What is CLIP looking at? (2022) by Peijie Chen, Qi Li, Saad Biaz, Trung Bui, and Anh Nguyen. :star: Oral paper at ACCV 2022. :star:
If you use this software, please consider citing:
@inproceedings{chen2022gScoreCAM,
title={gScoreCAM: What is CLIP looking at?},
author={Chen, Peijie and Li, Qi and Biaz, Saad and Bui, Trung and Nguyen, Anh},
booktitle={Proceedings of the Asian Conference on Computer Vision (ACCV)},
year={2022}
}
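To make the tl;dr concrete, here is a minimal, hedged sketch of the gScoreCAM idea in PyTorch using OpenAI's `clip` package. It is not this repo's exact implementation (which, among other things, batches the scoring passes); `cat.jpg` and the text prompt are placeholders.

```python
# A minimal sketch of gScoreCAM: gradient-based channel selection
# followed by ScoreCAM-style scoring of only the selected channels.
import torch
import torch.nn.functional as F
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)  # placeholder image
text = clip.tokenize(["a photo of a cat"]).to(device)              # placeholder prompt

# Capture activations and gradients at the last conv stage.
acts, grads = [], []
handle_a = model.visual.layer4.register_forward_hook(
    lambda mod, inp, out: acts.append(out))
handle_g = model.visual.layer4.register_full_backward_hook(
    lambda mod, gin, gout: grads.append(gout[0]))

logits_per_image, _ = model(image, text)
model.zero_grad()
logits_per_image[0, 0].backward()
handle_a.remove(); handle_g.remove()

A, G = acts[0].detach(), grads[0].detach()       # both (1, C, h, w)
k = max(1, int(0.1 * A.shape[1]))                # keep the top 10% of channels
top = G.abs().flatten(2).mean(-1)[0].topk(k).indices

# ScoreCAM-style weighting, restricted to the k highest-gradient channels:
# mask the input with each upsampled channel and use the CLIP image-text
# score of the masked image as that channel's weight.
weights = []
with torch.no_grad():
    for c in top:
        m = F.interpolate(A[:, c:c + 1], size=image.shape[-2:],
                          mode="bilinear", align_corners=False)
        m = (m - m.min()) / (m.max() - m.min() + 1e-8)
        score, _ = model(image * m, text)
        weights.append(score[0, 0])
weights = torch.softmax(torch.stack(weights), dim=0)

cam = (weights[:, None, None] * A[0, top]).sum(0).clamp(min=0)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # final (h, w) heatmap
```

The key departure from ScoreCAM is the gradient-based pre-selection: scoring only the top 10% of channels both suppresses CLIP's noisy channels and cuts the number of masked forward passes to a tenth.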
:star2: Interactive Colab demo:
Install Anaconda following the Anaconda installation documentation. Create an environment with all required packages with the following command:
conda env create -f gscorecam_env.yml
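Then activate the environment before running any of the commands below (the environment's name is declared in gscorecam_env.yml; gscorecam here is an assumption):
conda activate gscorecam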
Besides the Colab demo above, we provide an interactive command-line tool for testing different visualization methods. You may use it with:
python visualize_cam.py --cam-version [CAM version] --image-folder [path to testing images] --image-src [name of the dataset]
You will need to download the MS COCO dataset and the metadata.
python visualize_cam.py --cam-version gscorecam --image-folder path_to_coco --image-src coco
The program will ask whether you would like to visualize a specific class or a random class; simply type the class name, or press Enter for random classes.
After the class is chosen, the script will ask for a prompt. For example, to see whether the model reacts to `heart`, simply type `heart` and press Enter. After a short wait, you will see:
On the left is the original image; on the right is the model's heatmap overlaid on the original image.
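If you want to reproduce this view outside the script, here is a hedged matplotlib sketch (not the repo's plotting code; `img` and `cam` are assumed inputs, e.g. from the gScoreCAM sketch above):

```python
# Show the original image next to the heatmap overlay.
# Assumes `img` is an RGB image as a NumPy array and `cam` is an
# (h, w) heatmap in [0, 1], possibly at feature-map resolution.
import numpy as np
import matplotlib.pyplot as plt

def show_overlay(img: np.ndarray, cam: np.ndarray, alpha: float = 0.5) -> None:
    H, W = img.shape[:2]
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    ax1.imshow(img)
    ax1.set_title("original")
    ax1.axis("off")
    ax2.imshow(img)
    # `extent` stretches the low-resolution heatmap over the full image.
    ax2.imshow(cam, cmap="jet", alpha=alpha, extent=(0, W, H, 0))
    ax2.set_title("heatmap overlay")
    ax2.axis("off")
    plt.show()
```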
Instead of running on a specific dataset, you can run on any folder that contains only images:
python visualize_cam.py --cam-version gscorecam --image-src name_for_image_folder --image-folder path_to_image_folder
The interactive script behaves the same as above.
In order to use the evaluation code, you will need to download the metadata from Google Drive. For convenience, we extract the metadata of ImageNetV2, COCO, and PartImageNet into .hdf5 format.
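If you want to see what a given meta file contains, a quick h5py inspection works (the path below assumes the files were unpacked into `meta_data/`):

```python
# List every group/dataset name stored in one of the .hdf5 meta files.
import h5py

with h5py.File("meta_data/coco_val_instances_stats.hdf5", "r") as f:
    f.visit(print)  # prints each name on its own line
```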
You may run the evaluation code with the following command:
python evaluate_cam.py info-ground-eval --model-name RN50x16 --cam-version gscorecam --image-src coco --image-folder path_to_image --meta-file meta_data/coco_val_instances_stats.hdf5 --is-clip
You may need to change the path accordingly.
Similar to the COCO evaluation, simply run:
python eval_partsImageNet.py info-ground-eval --model-name RN50x16 --cam-version gscorecam --image-src coco --image-folder path_to_image --meta-file meta_data/partsImageNet_parts_test.hdf5
To evaluate ImageNetV2, we use Choe et al.'s evaluation script directly. Please first clone this repo and then follow their data preparation instructions to download and prepare the data. We use this script provided in their repo; you may run it as follows:
cd wsolevaluation
./dataset/prepare_imagenet.sh
Then you can evaluate on these heatmaps with Choe et al.'s evaluation script:
python evaluation.py --scoremap_root {FOLDER_OF_HEATMAPS} --dataset_name imagenet
To generate heatmaps for the ImageNetV2 evaluation above, make sure you are under the `gScoreCAM` folder. Then you may generate the heatmaps with the following command:
python wsol_compute_heatmap.py main --model RN50x16 --method gscorecam --dataset imagenet --is-clip
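For reference, Choe et al.'s `evaluation.py` reads score maps from `--scoremap_root`; below is a hedged sketch of saving heatmaps as `.npy` arrays in that layout. The naming convention is an assumption here (check the wsolevaluation README for the exact one), and `save_scoremap` is a hypothetical helper:

```python
# Save one heatmap per image under a score-map root folder.
import os
import numpy as np

def save_scoremap(cam: np.ndarray, image_id: str, root: str = "heatmaps") -> None:
    # Mirror the image's relative id under `root`, e.g. heatmaps/val/xxx.npy.
    path = os.path.join(root, image_id + ".npy")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    np.save(path, cam.astype(np.float32))  # values expected in [0, 1]
```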