
EmoVIT

Official code for the paper "EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning" | CVPR 2024

EmoSet/
|
+--LAVIS
|
+--emo
    |
    +--annotation (obtained by decompressing EmoSet)
    |
    +--cap-ano (create this folder before running the scripts)
    |
    +--caption (create this folder before running the scripts)
    |
    +--reasoning (create this folder before running the scripts)
    |
    +--conversation_new100 (create this folder before running the scripts)
    |
    +--prompt
    |
    +--image
        |
        +--amusement (obtained by decompressing EmoSet)
        |
        +--anger (obtained by decompressing EmoSet)
        |
        +--... (one folder per remaining emotion class)
        |
        +--train_image (EmoVIT does not need every photo; place the photos required for training here)

You can find two main folders in our project structure: emo and LAVIS.
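The cap-ano, caption, reasoning, and conversation_new100 folders must exist before the generation scripts run. A minimal sketch that creates them (folder names taken from the tree above; adjust the root path if your layout differs):

import os

# Create the working folders the instruction-generation scripts expect (see tree above).
for folder in ["cap-ano", "caption", "reasoning", "conversation_new100"]:
    os.makedirs(os.path.join("emo", folder), exist_ok=True)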

Install Related Packages

conda create --name emovit python=3.8
conda activate emovit
cd emovit
pip install -r requirements.txt

Install LAVIS

pip install salesforce-lavis
# If this does not work, install from source as follows:
cd ..
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .  # Remove 'open3d' from requirements.txt first to avoid version conflicts.
# Then move the 'lavis' folder into the 'lib' folder.
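A quick sanity check that the LAVIS installation is importable:

python -c "from lavis.models import load_model_and_preprocess"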

Emotion Instruction Data Generation

  1. Run python ./emo/caption.py to obtain image captions. Set 'path' according to the class being processed.
  2. Run python ./emo/cap-anno.py to write each image's attributes and captions into a file. Set 'path' according to the class being processed.
  3. Run python ./emo/gpt4_reasoning.py or python ./emo/gpt4_conversation.py to prompt GPT-4 to generate questions, using the file above as input data (see the sketch after this list).
    • Remember to replace the API key with your own.
    • To adjust the prompt, edit the files in the 'prompt' folder.
  4. Run python ./emo/all.py to merge the results of reasoning, conversation, and classification.
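For step 3, here is a minimal sketch of the GPT-4 call, assuming the openai Python package (v1.x); the system prompt, input path, and output handling below are placeholders, and the actual logic (including the prompts in the 'prompt' folder) lives in gpt4_reasoning.py / gpt4_conversation.py:

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # replace with your own key

# Placeholder path: a caption/attribute file produced in step 2.
with open("emo/cap-ano/amusement.txt") as f:
    context = f.read()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Generate emotion-reasoning questions for the image described below."},
        {"role": "user", "content": context},
    ],
)
print(response.choices[0].message.content)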

Following these steps, you can create the instruction data yourself. If you would rather skip this step, you can use the instructions we created from EmoSet. (The image data must still be downloaded from EmoSet's official website.)

The classification (categorical) data does not rely on GPT; it can be produced directly from a template (you can see the prompt in all.py), as sketched below.
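A hypothetical sketch of that direct generation (the instruction template below is illustrative; the real one is the prompt in all.py):

import json
import os

# Illustrative template only; the actual prompt is defined in all.py.
records = []
image_root = "emo/image"
for emotion in sorted(os.listdir(image_root)):
    class_dir = os.path.join(image_root, emotion)
    if not os.path.isdir(class_dir) or emotion == "train_image":
        continue
    for name in os.listdir(class_dir):
        records.append({
            "image": os.path.join(class_dir, name),
            "instruction": "What is the emotion conveyed by this image?",
            "answer": emotion,
        })

with open("emo/classification.json", "w") as f:
    json.dump(records, f, indent=2)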

Train EmoVIT

Prepare Weights

You can obtain the Vicuna weights from the official Vicuna release page; we use version 1.1. Place the downloaded files into LAVIS/lavis/weight/vicuna-7b-2/.

Run

Training

cd LAVIS
python train.py --cfg-path FT.yaml

Parameter Settings
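FT.yaml in this repo is the authoritative configuration. For orientation only, an illustrative fragment in the style of LAVIS fine-tuning configs (every value below is an assumption, not the setting used in the paper):

run:
  lr_sched: linear_warmup_cosine_lr
  init_lr: 1e-5
  min_lr: 1e-6
  warmup_steps: 1000
  max_epoch: 5
  batch_size_train: 8
  seed: 42
  output_dir: "output/EmoVIT"
  amp: True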

Inference EmoVIT

If you haven't trained your own weights yet, you can use the model_weights1.pth provided in the LAVIS folder.

python ./LAVIS/test.py  
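test.py is the intended entry point. If you want to load the checkpoint in your own script instead, a minimal sketch using LAVIS's public API (the model name/type are assumptions based on the Vicuna-7B backbone, and the checkpoint layout may differ; test.py is authoritative):

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: EmoVIT builds on InstructBLIP with a Vicuna-7B backbone.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)

state = torch.load("LAVIS/model_weights1.pth", map_location="cpu")
state = state.get("model", state)  # LAVIS checkpoints may nest weights under 'model'
model.load_state_dict(state, strict=False)  # load the fine-tuned EmoVIT weights

image = vis_processors["eval"](Image.open("example.jpg").convert("RGB")).unsqueeze(0).to(device)
print(model.generate({"image": image, "prompt": "What is the emotion conveyed by this image?"}))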

Citation

If you find this paper helpful, please consider citing it:

@inproceedings{Xie2024EmoVIT,
  title={EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning},
  author={Hongxia Xie and Chu-Jun Peng and Yu-Wen Tseng and Hung-Jen Chen and Chan-Feng Hsu and Hong-Han Shuai and Wen-Huang Cheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}