ZebangCheng / Emotion-LLaMA

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
BSD 3-Clause "New" or "Revised" License

About 2-stage training #21

Open · Archie1121 opened this issue 1 month ago

Archie1121 commented 1 month ago

Hello, author! In your paper you mention a two-stage training strategy that uses coarse-grained and fine-grained JSON annotation files. Regarding the code, should I replace the "ann_path" in featureface.yaml when moving to the second stage of training? If I swap the coarse-grained JSON path for the fine-grained one, how do I make the .pth file from Stage 1 load successfully?

ZebangCheng commented 1 month ago

After you run the Stage 1 pre-training code, the model parameter files are saved in the checkpoints/save_checkpoint directory. Take the parameter file from the last epoch and set its path in the configuration file:

```yaml
# Set Emotion-LLaMA path
ckpt: "/home/czb/project/Emotion-LLaMA/checkpoints/save_checkpoint/2024xxxx-v1/checkpoint_29.pth"
```

In short, you replace the original minigptv2_checkpoint.pth weights with the weights produced by Stage 1 training.
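Putting both changes together, here is a minimal sketch of what the Stage 2 configuration could look like. The dataset key and annotation file name below are placeholders, not the repo's exact schema; check featureface.yaml for the actual nesting:

```yaml
# Hypothetical featureface.yaml excerpt for Stage 2 (exact structure may differ).
datasets:
  feature_face_caption:                          # placeholder dataset key
    ann_path: "/path/to/fine_grained.json"       # fine-grained JSON replaces the coarse-grained one

model:
  # Stage 1 checkpoint replaces the original minigptv2_checkpoint.pth
  ckpt: "/home/czb/project/Emotion-LLaMA/checkpoints/save_checkpoint/2024xxxx-v1/checkpoint_29.pth"
```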

Archie1121 commented 1 month ago

Thanks a lot! By the way, in your paper you mention fine-tuning on the DFEW dataset. For that fine-tuning, did you continue training from the Emotion-LLaMA model obtained after the first stage, from the model after the second stage, or did you re-train from scratch as you did with the MERR dataset? Additionally, for the DFEW dataset, besides extracting the corresponding features, do you also need to compute and identify AUs to obtain coarse-grained or fine-grained labels?

ZebangCheng commented 1 month ago

You're welcome! We fine-tuned the Emotion-LLaMA model from the first stage on the DFEW dataset. Specifically, we evaluated the zero-shot scores on DFEW for the model parameters saved during the first stage and selected the .pth file with the highest score for second-stage fine-tuning.
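As an illustration, the selection step could look like the sketch below. This is not the authors' actual script; `evaluate_zero_shot()` is a hypothetical stand-in for whatever evaluation pipeline you use to score DFEW predictions:

```python
# Sketch of the checkpoint-selection step (hypothetical, not the repo's code).
import glob

def evaluate_zero_shot(ckpt_path: str) -> float:
    """Hypothetical helper: load ckpt_path, run zero-shot inference on DFEW,
    and return a recognition score (e.g., weighted average recall)."""
    raise NotImplementedError

best_ckpt, best_score = None, float("-inf")
for ckpt in sorted(glob.glob("checkpoints/save_checkpoint/2024xxxx-v1/checkpoint_*.pth")):
    score = evaluate_zero_shot(ckpt)
    print(f"{ckpt}: {score:.4f}")
    if score > best_score:
        best_ckpt, best_score = ckpt, score

print(f"Best checkpoint for Stage 2 fine-tuning: {best_ckpt}")
```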

For the DFEW dataset, you only need to extract the relevant features; there is no need to compute or identify AUs, because DFEW covers only the emotion recognition task and does not involve emotional reasoning tasks.