DepictQA: Depicted Image Quality Assessment with Vision Language Models

🌏 Project Page • 🤗 Demo (coming soon) • 📀 Datasets (Hugging Face / ModelScope)

Official PyTorch implementation of the papers:

DepictQA-Wild (DepictQA-v2): paper, project page.

Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue, "Descriptive Image Quality Assessment in the Wild," arXiv preprint arXiv:2405.18842, 2024.
DepictQA-v1: paper, project page.

Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, Chao Dong, "Depicting beyond scores: Advancing image quality assessment through multi-modal language models," ECCV, 2024.

Update

📆 [Coming soon] Online demo.

📆 [2024.07] DepictQA datasets were released on Hugging Face / ModelScope.

📆 [2024.07] DepictQA-v1 was accepted to ECCV 2024.

📆 [2024.05] We released DepictQA-Wild (DepictQA-v2): a multi-functional in-the-wild descriptive image quality assessment model.

📆 [2023.12] We released DepictQA-v1, a multi-modal image quality assessment model based on vision language models.

Installation

Create environment.

```
# clone this repo
git clone https://github.com/XPixelGroup/DepictQA.git
cd DepictQA

# create environment
conda create -n depictqa python=3.10
conda activate depictqa
pip install -r requirements.txt
```

Download pretrained models.
- CLIP-ViT-L-14. Required.
- Vicuna-v1.5-7B. Required.
- All-MiniLM-L6-v2. Required only for confidence estimation of detailed reasoning responses.
- Our pretrained delta checkpoint (see Models). Optional for training. Required for demo and inference.
Ensure that all downloaded models are placed in the designated directories as follows.
```
|-- DepictQA
|-- ModelZoo
    |-- CLIP
        |-- clip
            |-- ViT-L-14.pt
    |-- LLM
        |-- vicuna
            |-- vicuna-7b-v1.5
    |-- SentenceTransformers
        |-- all-MiniLM-L6-v2
```
If models are stored in different directories, revise config.model.vision_encoder_path, config.model.llm_path, and config.model.sentence_model in config.yaml (under the experiments directory) to set new paths.
Move our pretrained delta checkpoint to a specific experiment directory (e.g., DQ495K, DQ495K_QPath) as follows.
```
|-- DepictQA
    |-- experiments
        |-- a_specific_experiment_directory
            |-- ckpt
                |-- ckpt.pt
```
If the delta checkpoint is stored in another directory, revise config.model.delta_path in config.yaml (under the experiments directory) to set the new path.
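
Before launching anything, it can help to confirm that the files are where the trees above expect them. The following is a minimal sketch, not part of the repository: `find_missing` is a hypothetical helper, and it assumes that `workspace` is the directory containing both DepictQA and ModelZoo and that the default paths in config.yaml have not been changed.

```python
from pathlib import Path

# Relative paths copied from the directory trees above.
REQUIRED_MODEL_PATHS = [
    "ModelZoo/CLIP/clip/ViT-L-14.pt",
    "ModelZoo/LLM/vicuna/vicuna-7b-v1.5",
]
# Only needed for confidence estimation of detailed reasoning responses.
SENTENCE_MODEL_PATH = "ModelZoo/SentenceTransformers/all-MiniLM-L6-v2"


def find_missing(workspace, experiment=None, need_sentence_model=False):
    """Return the expected paths that do not exist under `workspace`.

    `experiment` is the name of an experiment directory (e.g., "DQ495K");
    when given, the delta checkpoint of that experiment is checked too.
    """
    expected = list(REQUIRED_MODEL_PATHS)
    if need_sentence_model:
        expected.append(SENTENCE_MODEL_PATH)
    if experiment is not None:
        # The delta checkpoint is required for demo and inference only.
        expected.append(f"DepictQA/experiments/{experiment}/ckpt/ckpt.pt")
    root = Path(workspace)
    return [path for path in expected if not (root / path).exists()]
```

For example, calling `find_missing("..", experiment="DQ495K")` from inside the DepictQA directory would list whatever is still missing for inference with that experiment.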

Models

| Training Data | Tune | Hugging Face | Description |
| --- | --- | --- | --- |
| DQ-495K + Q-Instruct | LoRA | download | Trained on the DQ-495K and Q-Instruct (see paper) datasets. Able to answer multiple-choice, yes-or-no, what, and how questions, but performance degrades on assessment and comparison tasks. |
| DQ-495K + Q-Pathway | LoRA | download | Trained on the DQ-495K and Q-Pathway (see paper) datasets. Performs well on real images, but performance degrades on comparison tasks. |
| DQ-495K | LoRA | download | Trained on the DQ-495K dataset. Used in our paper. |

Demos

Online Demo

We provide an online demo (coming soon) deployed on Hugging Face Spaces.

Gradio Demo

We provide a gradio demo for local testing.

- Enter a specific experiment directory: `cd experiments/a_specific_experiment_directory`.
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.
- Launch the controller: `sh launch_controller.sh`.
- Launch the gradio server: `sh launch_gradio.sh`.
- Launch the DepictQA worker: `sh launch_worker.sh id_of_one_gpu`.

You can revise the server config in serve.yaml. The URL of the deployed demo will be http://{serve.gradio.host}:{serve.gradio.port}. If you do not revise serve.yaml, the default URL is http://0.0.0.0:12345.

Note that multiple workers can be launched simultaneously. For each worker, serve.worker.host, serve.worker.port, serve.worker.worker_url, and serve.worker.model_name should be unique.
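
When several workers are configured, a quick check for clashing settings can catch mistakes before launch. The sketch below is only an illustration: it assumes each worker's settings have already been loaded into a plain dict keyed by the four field names listed above, and `find_conflicts` is a hypothetical helper, not a function of this repository.

```python
from collections import defaultdict

# Fields that, according to the note above, should differ between workers.
UNIQUE_FIELDS = ("host", "port", "worker_url", "model_name")


def find_conflicts(workers, fields=UNIQUE_FIELDS):
    """Map each field to the values that more than one worker shares.

    `workers` is a list of dicts, one per worker. The result has the form
    {field: {value: [indices of the workers using that value]}}.
    """
    conflicts = {}
    for field in fields:
        seen = defaultdict(list)
        for index, worker in enumerate(workers):
            seen[worker[field]].append(index)
        shared = {value: idxs for value, idxs in seen.items() if len(idxs) > 1}
        if shared:
            conflicts[field] = shared
    return conflicts
```

An empty result means no two workers share a value in any of the checked fields. The `fields` argument can be narrowed if only some of the settings matter in a given deployment.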

Datasets

Source code for constructing the DQ-495K dataset (used in DepictQA-v2) is provided here.
Download the MBAPPS (used in DepictQA-v1) and DQ-495K (used in DepictQA-v2) datasets from Hugging Face / ModelScope. Place the dataset directory next to this repository as follows.
```
|-- DataDepictQA
|-- DepictQA
```
If the dataset is stored in another directory, revise config.data.root_dir in config.yaml (under the experiments directory) to set the new path.

Training

- Enter a specific experiment directory: `cd experiments/a_specific_experiment_directory`.
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14 and Vicuna-v1.5-7B are downloaded, and (3) their paths are set in config.yaml.
- Run training: `sh train.sh ids_of_gpus`.

Inference

Inference on Our Benchmark

- Enter a specific experiment directory: `cd experiments/a_specific_experiment_directory`.
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.
- Run a specific inference script (e.g., infer_A_sd_brief.sh): `sh infer_A_sd_brief.sh id_of_one_gpu`.

Inference on Custom Dataset

Construct a *.json file for your dataset as follows.

```
[
    {
        "id": unique id of each sample, required,
        "image_ref": reference image, null if not applicable,
        "image_A": image A, null if not applicable,
        "image_B": image B, null if not applicable,
        "query": input question, required
    },
    ...
]
```
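
The template above can be filled in programmatically. The following is a minimal sketch under stated assumptions: `make_sample` and `write_meta` are hypothetical helpers, and how image paths are resolved (absolute, or relative to some root directory) is not specified here, so check how infer.py reads them before choosing a convention.

```python
import json


def make_sample(sample_id, query, image_ref=None, image_A=None, image_B=None):
    """Build one entry; images that do not apply stay None (null in JSON)."""
    return {
        "id": sample_id,
        "image_ref": image_ref,
        "image_A": image_A,
        "image_B": image_B,
        "query": query,
    }


def write_meta(samples, path):
    """Check that all ids are unique, then write the samples as a JSON list."""
    ids = [sample["id"] for sample in samples]
    if len(ids) != len(set(ids)):
        raise ValueError("sample ids must be unique")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, indent=4)
```

For instance, `write_meta([make_sample("0001", "Evaluate the quality of the image.", image_A="imgs/0001.png")], "my_dataset.json")` would produce a one-sample file for a non-reference assessment of image A. The id, query text, and file names in this call are placeholders.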

- Enter your experiment directory: `cd your_experiment_directory`.
- Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded, and (3) their paths are set in config.yaml.

Construct your infer shell as follows.

```
#!/bin/bash
# Directory that contains infer.py
src_dir=directory_of_src
export PYTHONPATH="$src_dir:$PYTHONPATH"
# The first argument selects the GPU id
export CUDA_VISIBLE_DEVICES="$1"

python "$src_dir/infer.py" \
    --meta_path json_path_of_your_dataset \
    --dataset_name your_dataset_name \
    --task_name task_name \
    --batch_size batch_size
```

--task_name can be set as follows.

| Task Name | Description |
| --- | --- |
| quality_compare | AB comparison in full-reference |
| quality_compare_noref | AB comparison in non-reference |
| quality_single_A | Image A assessment in full-reference |
| quality_single_A_noref | Image A assessment in non-reference |
| quality_single_B | Image B assessment in full-reference |
| quality_single_B_noref | Image B assessment in non-reference |
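
Each task name implies which image fields of a sample must be filled in. The sketch below encodes one reading of the table: comparison tasks need images A and B, single-image tasks need the named image, and full-reference tasks additionally need a reference image. This mapping is inferred from the descriptions, not taken from the code, and both functions are hypothetical helpers.

```python
TASK_NAMES = (
    "quality_compare", "quality_compare_noref",
    "quality_single_A", "quality_single_A_noref",
    "quality_single_B", "quality_single_B_noref",
)


def required_images(task_name):
    """Return the image fields a task is assumed to need."""
    if task_name not in TASK_NAMES:
        raise ValueError(f"unknown task name: {task_name}")
    if task_name.startswith("quality_compare"):
        fields = {"image_A", "image_B"}
    elif task_name.startswith("quality_single_A"):
        fields = {"image_A"}
    else:
        fields = {"image_B"}
    if not task_name.endswith("_noref"):
        # Full-reference tasks also need the reference image.
        fields.add("image_ref")
    return fields


def missing_images(sample, task_name):
    """List the required image fields that are absent or null in `sample`."""
    return sorted(f for f in required_images(task_name) if sample.get(f) is None)
```

Running `missing_images` over every entry of the *.json file before inference gives an early warning when a dataset and the chosen `--task_name` do not match.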

Run your infer shell: `sh your_infer_shell.sh id_of_one_gpu`.

Evaluation

Enter the evaluation directory: `cd src/eval`.

The evaluation scripts are described below.

| Script | Description |
| --- | --- |
| `cal_acc_single_distortion.py` | Accuracy of single-distortion identification |
| `cal_acc_multi_distortion.py` | Accuracy of multi-distortion identification |
| `cal_acc_rating.py` | Accuracy of instant rating |
| `cal_gpt4_score_detail_v1.py` | GPT-4 score of detailed reasoning tasks in DepictQA-v1. Treats both the prediction and the ground truth as assistants, and calculates the relative score of the prediction over the ground truth. |
| `cal_gpt4_score_detail_v2.py` | GPT-4 score of detailed reasoning tasks in DepictQA-v2. Treats only the prediction as an assistant, and directly assesses the consistency between the prediction and the ground truth. |

Run basic evaluation (e.g., cal_acc_single_distortion.py):
```
python cal_acc_single_distortion.py --pred_path predict_json_path --gt_path ground_truth_json_path
```
Some specific parameters are explained as follows.

For the calculation of accuracy:
- --confidence (store_true): whether to calculate accuracy within various confidence intervals.
- --intervals (list of float, default [0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1]): the confidence intervals, only valid when --confidence is true.
For the calculation of GPT-4 score:
- --save_path (str, required): *.json path to save the evaluation results including scores and reasons.
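
To make the `--confidence` and `--intervals` options more concrete, here is a minimal sketch of what accuracy within confidence intervals could look like. It is an assumption about the idea, not a copy of the evaluation scripts: it treats consecutive entries of the interval list as bin edges, whereas the actual scripts may define the intervals differently (for example, as cumulative thresholds). The input format of `(confidence, is_correct)` pairs is also hypothetical.

```python
DEFAULT_INTERVALS = (0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)


def accuracy_by_confidence(records, intervals=DEFAULT_INTERVALS):
    """Compute accuracy inside each [low, high) confidence bin.

    `records` is a list of (confidence, is_correct) pairs. The last bin
    also includes confidence == high, so a confidence of 1.0 is counted.
    Returns a list of ((low, high), sample_count, accuracy_or_None).
    """
    results = []
    last_index = len(intervals) - 2
    for i, (low, high) in enumerate(zip(intervals[:-1], intervals[1:])):
        hits = [
            ok for conf, ok in records
            if low <= conf < high or (i == last_index and conf == high)
        ]
        # Accuracy is undefined for an empty bin, so report None there.
        accuracy = sum(hits) / len(hits) if hits else None
        results.append(((low, high), len(hits), accuracy))
    return results
```

Reporting the sample count next to each accuracy helps judge how reliable the number for a sparsely populated bin is.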

Acknowledgement

This repository is based on LAMM. Thanks for this awesome work.

BibTeX

If you find our work useful for your research and applications, please cite using the BibTeX:

```
@article{depictqa_v2,
    title={Descriptive Image Quality Assessment in the Wild},
    author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={arXiv preprint arXiv:2405.18842},
    year={2024}
}

@article{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    journal={arXiv preprint arXiv:2312.08962},
    year={2023}
}
```
