This is the official implementation of the CVPR2024 paper Prompt Highlighter: Interactive Control for Multi-Modal LLMs.
Control text generation by highlighting your prompt! Prompt Highlighter is a training-free inference pipeline that facilitates token-level user interactions for a customized generation. Our method is compatible with both LLMs and VLMs.
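For intuition, here is a minimal sketch of the underlying idea: a log-scale bias is added to the attention scores at highlighted key positions before the softmax, so highlighted tokens exert more influence on generation. This is an illustration only, not the repository's actual implementation (the released pipeline patches the attention inside the base LLaMA/LLaVA models and involves additional steps described in the paper); highlighted_attention, highlight_mask, and alpha are illustrative names.

```python
import math
import torch

def highlighted_attention(q, k, v, highlight_mask, alpha=1.5):
    """Scaled dot-product attention that up-weights highlighted key positions.

    q, k, v:        (num_heads, seq_len, head_dim) tensors
    highlight_mask: (seq_len,) bool tensor, True for highlighted prompt tokens
    alpha:          > 1 boosts the attention paid to highlighted tokens
    """
    d = q.size(-1)
    scores = q @ k.transpose(-1, -2) / math.sqrt(d)      # (heads, seq, seq)
    bias = torch.zeros(highlight_mask.shape, dtype=scores.dtype)
    bias[highlight_mask] = math.log(alpha)               # ~multiplies softmax weight by alpha
    attn = torch.softmax(scores + bias, dim=-1)          # bias broadcasts over key positions
    return attn @ v
```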
| Date | Update |
| --- | --- |
| 20231130 | LLaMA attention modification & LLaVA descriptive task inference. |
| 20231130 | Test data & mask upload. |
| 20231201 | LLaVA highlighter benchmark test inference (MMBench & MME) |
| 20231201 | LLaVA partial highlight inference |
| 20231202 | Vicuna (LLM) partial highlight inference |
| 20231202 | InstructBLIP partial highlight inference |
| 20231204 | Current Code Release! |
| TBD | InternLM-VLComposer benchmark test inference |

Basic environment setup:
conda create -n highlighter python=3.10 -y
conda activate highlighter
pip install -r requirements.txt
Install the latest LLaVA (2023-11-30) in base_models. If you already have one, you can use the installed copy in your own environment.
# You may also use your existing LLaVA installation if you have one.
cd base_models
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install --upgrade pip # enable PEP 660 support
pip install -e .
Model Download: Please refer to LLaVAv1.5 Model Zoo to get the base pretrained model.
Partial Highlighting task: We provide examples in assets/test_data/questions_descriptions.json; you may add your own cases to test our method.
python examples/llava_test.py
Descriptive task (highlighting all input contexts): We provide examples in assets/test_data/questions_descriptions.json; you may add your own cases to test our method.
python examples/llava_descriptions.py
We will also provide a script for descriptive COCO caption generation (TODO here).
If you want to add your own customized data, please provide a square image whose darker (uint value < 128) marked region indicates the highlighted area, and add your case to the JSON file.
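As a reference for preparing such data, here is a small sketch (not part of this repo; the helper region_to_mask, the file paths, and the exact mask format are assumptions) that converts the darkened region of a square marker image into a binary highlight mask:

```python
import numpy as np
from PIL import Image

def region_to_mask(marker_path: str, out_path: str, threshold: int = 128):
    """Convert a square image with a darkened (uint value < threshold) region
    into a binary mask: 255 inside the highlighted area, 0 elsewhere."""
    marker = np.array(Image.open(marker_path).convert("L"))  # grayscale uint8
    mask = np.where(marker < threshold, 255, 0).astype(np.uint8)
    Image.fromarray(mask).save(out_path)

# hypothetical paths; point these at your own marker image and desired mask output
region_to_mask("assets/test_data/my_marker.png", "assets/test_data/my_mask.png")
```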
Benchmark Test: Please refer to the evaluation data to prepare the benchmark datasets (MMBench & MME). Benchmark results:
Method | MME-perception | MMBench-dev | MMBench-test |
---|---|---|---|
Baseline (LLaVAv1.5-13B) | 1531.3 | 67.7 | 67.0 |
Ours (Official Reported) | 1552.5 | 69.7 | 69.5 |
Ours (This Repo) | 1552.5 | 70.1 | 70.7 |
For MMBench, you may adjust the hyper-parameters in the following scripts and run:
bash examples/eval_scripts/mmbench_dev_hl.sh
bash examples/eval_scripts/mmbench_test_hl.sh
For MME:
bash examples/eval_scripts/mme_hl.sh
You can find the evaluation metrics at base_models/LLaVA/playground/data/eval/MME/eval_tool/answers/llava-v1.5-13b-hl-1.3-2.0-0.01/eval.log
We provide a script to test the partial highlighter on pure language input. Download the Vicuna model; we use Vicuna-13B-v1.1. You may switch to any other LLaMA-based LLM, in which case you will also need to change the conversation prompt template. Please follow the instructions above to install LLaVA in base_models. If you have already installed LLaVA, you can test directly with the script:
python examples/llama_test.py \
--txt "Please write a summary of A Mid-Summer Nights' Dream, make it compact." \
--hl "make it compact."
Here you may change the input prompt and the highlighted segments by passing --txt and --hl, respectively. To highlight multiple segments, separate them with <s>; for example, passing --hl "write a summary<s>make it compact." highlights both requirements.
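To make the --hl behavior concrete, here is a rough sketch (illustrative only, not the repo's actual parsing code; the helper build_highlight_mask and the tokenizer path are assumptions) of how segments separated by <s> could be mapped to a token-level highlight mask over the prompt:

```python
from transformers import AutoTokenizer

def build_highlight_mask(prompt: str, hl_arg: str, tokenizer) -> list[int]:
    """Return a 0/1 list over prompt tokens; 1 marks highlighted tokens."""
    segments = [s for s in hl_arg.split("<s>") if s]
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    mask = [0] * len(prompt_ids)
    for seg in segments:
        seg_ids = tokenizer(seg, add_special_tokens=False).input_ids
        # naive sub-sequence search; real code must treat tokenization boundary
        # effects (e.g. leading-space tokens) more carefully
        for i in range(len(prompt_ids) - len(seg_ids) + 1):
            if prompt_ids[i:i + len(seg_ids)] == seg_ids:
                mask[i:i + len(seg_ids)] = [1] * len(seg_ids)
    return mask

tokenizer = AutoTokenizer.from_pretrained("/path/to/vicuna-13b-v1.1")  # your local Vicuna weights
prompt = "Please write a summary of A Mid-Summer Nights' Dream, make it compact."
print(build_highlight_mask(prompt, "write a summary<s>make it compact.", tokenizer))
```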
Install the latest LAVIS (2023-11-30) in base_models. If you already have one, you can use the installed one in your own environment.
To run InstructBLIP-Vicuna, you need to add the LLM path (Vicuna-13B v1.1) to the key llm_model in the configuration file base_models/LAVIS/lavis/configs/models/blip2/blip2_instruct_vicuna13b.yaml.
# Please install with your highlighter env activated.
cd base_models
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
Partial Highlighting task: Run the examples in assets/test_data/questions_descriptions.json; you may add your own cases to test our method.
Note: Here, we only implement a highlighting mechanism in the QFormer. We may update a hybrid highlighting (visual & text token) version in the future.
python examples/instructblip_test.py
TBD.