Official code for the paper "EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning" | CVPR 2024
EmoSet/
|
+--LAVIS
|
+--emo
|
+--annotation (Results of EmoSet decompression.)
|
+--cap-ano (Create the folders required for program execution before running it.)
|
+--caption (Create the folders required for program execution before running it.)
|
+--reasoning (Create the folders required for program execution before running it.)
|
+--conversation_new100 (Create the folders required for program execution before running it.)
|
+--prompt
|
+--image
+--amusement (Results of EmoSet decompression)
|
+--anger (Results of EmoSet decompression)
|
.
.
.
|
+--train_image (EmoVIT does not need all photos; place the photos required for training here.)
|
........
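The tree above marks four folders that must exist before the scripts run. A minimal sketch of creating them, assuming they sit directly under a single root folder (the `emo_root` parameter and the helper name are illustrative, not part of the repository):

```python
from pathlib import Path

# Folder names are copied from the tree above; where they live is an assumption.
REQUIRED_DIRS = ["cap-ano", "caption", "reasoning", "conversation_new100"]


def create_working_dirs(emo_root):
    """Create the empty working folders under emo_root and return their paths."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(emo_root) / name
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_working_dirs("emo")` from the project root would create the four folders if the layout matches this assumption; adjust the root otherwise.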
The project contains two main folders: emo and LAVIS. The LAVIS folder can be obtained from here.
conda create --name emovit python=3.8
conda activate emovit
cd emovit
pip install -r requirements.txt
pip install salesforce-lavis
# If that does not work, proceed as follows.
cd ..
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
# Before installing, remove 'open3d' from the 'requirements.txt' file to avoid version conflicts.
pip install -e .
# Move the 'lavis' folder into the 'lib' folder.
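The install comments ask for 'open3d' to be removed from requirements.txt by hand. If you prefer to script that edit, the following is one possible sketch; the helper name `drop_requirement` is hypothetical, and it only handles simple one-package-per-line entries:

```python
import re


def drop_requirement(requirements_text, package):
    """Return the requirements text without the lines that name `package`."""
    kept = []
    for line in requirements_text.splitlines():
        # Take the distribution name that precedes any version specifier or extras.
        name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0]
        if name.lower() != package.lower():
            kept.append(line)
    return "\n".join(kept) + "\n"
```

To apply it, read requirements.txt, pass the text through `drop_requirement(text, "open3d")`, and write the result back before running `pip install -e .`.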
Run python ./emo/caption.py to obtain image captions. Select the 'path' based on the class to be processed.
Run python ./emo/cap-anno.py to write the attributes and captions of each image into a file. Select the 'path' based on the class to be processed.
Run python ./emo/gpt4_reasoning.py or python ./emo/gpt4_conversation.py to instruct GPT-4 to generate questions, using the file above as input data.
Run python ./emo/all.py to integrate the results of reasoning, conversation, and classification.
Following these steps, you can create the instructions. To skip this step, you can use the instructions we created using EmoSet. (However, the image data must still be downloaded from EmoSet's official website.)
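To illustrate what the integration step amounts to, here is a rough sketch of merging several folders of generated records into one file. It is not the logic of all.py: the assumption that each folder holds `.json` files containing a list of records, and the function name itself, are made up for illustration.

```python
import json
from pathlib import Path


def merge_instruction_files(folders, output_file):
    """Concatenate the JSON lists found in `folders` and write them to output_file."""
    records = []
    for folder in folders:
        # Assumed layout: every *.json file stores a list of instruction records.
        for json_file in sorted(Path(folder).glob("*.json")):
            with open(json_file, encoding="utf-8") as handle:
                records.extend(json.load(handle))
    with open(output_file, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
    return len(records)
```

The return value is the number of merged records, which gives a quick sanity check against the number of images processed.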
Categorical data does not need to be generated with GPT; it can be produced directly (see the prompt in all.py).
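Because the image folders are named after their emotion class, classification records can be derived from the folder name alone. The sketch below shows the idea only: the question wording, the record keys, and the `.jpg` extension are placeholders, not the prompt or format used in all.py.

```python
from pathlib import Path


def build_classification_entries(image_root, classes):
    """Build one record per image, using the image's folder name as the answer."""
    # Placeholder wording; the real prompt lives in all.py.
    question = "Which emotion does this image evoke? Choose one of: " + ", ".join(classes) + "."
    entries = []
    for label in classes:
        # Assumes images are stored as <image_root>/<label>/<name>.jpg.
        for image_path in sorted((Path(image_root) / label).glob("*.jpg")):
            entries.append({"image": image_path.name, "question": question, "answer": label})
    return entries
```

For example, `build_classification_entries("image", ["amusement", "anger"])` would label every image in those two folders without calling GPT.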
You can obtain the Vicuna weights from this page. We use version 1.1. Place the downloaded files in LAVIS/lavis/weight/vicuna-7b-2/.
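Before starting training, it can help to confirm that the weights landed in the expected folder. A small sketch, assuming the download is in Hugging Face format and therefore includes a `config.json`; the list of expected files is an assumption and should be adjusted to what your download actually contains:

```python
from pathlib import Path


def missing_weight_files(weight_dir, expected=("config.json",)):
    """Return the expected file names that are absent from weight_dir."""
    root = Path(weight_dir)
    # If the folder itself is missing, every expected file is reported.
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_weight_files("LAVIS/lavis/weight/vicuna-7b-2")` suggests the folder is in place; any names returned point to files that still need to be copied.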
cd LAVIS
python train.py --cfg-path FT.yaml
LAVIS/FT.yaml: hyperparameter settings
LAVIS/lavis/configs/models/blip2/blip2_instruct_vicuna7b.yaml: location of the LLM weights
LAVIS/lavis/configs/datasets/coco/defaults_vqa.yaml: location of your data
LAVIS/lavis/runners/runner_base.py: name of the weight file to be saved
If you haven't trained your own weights yet, you can use the model_weights1.pth provided in the LAVIS folder.
python ./LAVIS/test.py
If you find this paper helpful, please consider citing it:
@inproceedings{Xie2024EmoVIT,
title={EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning},
author={Hongxia Xie and Chu-Jun Peng and Yu-Wen Tseng and Hung-Jen Chen and Chan-Feng Hsu and Hong-Han Shuai and Wen-Huang Cheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}