NLP-hua opened this issue 4 days ago
Hello author. We would like to fine-tune the method proposed in your paper on a new dataset. We have already extracted the audio and visual features, and for now we only need the emotion recognition function. I have the following questions:

(1) Is it sufficient to train only Stage 1, i.e., to run the following command?

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc-per-node 4 train.py --cfg-path train_configs/Emotion-LLaMA_finetune.yaml

(2) The training section of the Readme says "In the dataset configuration file, select the use of MERR_coarse_grained.txt." However, the project reports: "The main branch of Emotion-LLaMA does not contain the path minigpt4\configs\datasets\firstface\featureface.yaml."

(3) What is the format of the MERR_coarse_grained.txt file?

I would appreciate your reply.
Author's reply:

(1) I am not sure about the size of your dataset. If your focus is solely on the emotion recognition task, you can fine-tune on your dataset starting from the pre-trained checkpoint (checkpoint_best.pth); see the config sketch at the end of this reply.

(2) The MERR_coarse_grained.txt file is also available in our Google Drive.

(3) The MERR_coarse_grained.txt file contains the following information for each video sample: the video name, the number of frames (related to our previous work, not directly applicable to this project), and the video label. You can refer to the project overview in Overview.md for more details; a small parsing sketch also follows below.
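As a rough illustration of point (1), fine-tuning from the released weights generally means pointing the train config at the checkpoint. The key names below are assumptions based on the MiniGPT-4-style config layout this codebase builds on, not the verified schema; check train_configs/Emotion-LLaMA_finetune.yaml and the dataset config for the exact names.

```yaml
# Sketch only: key names are assumed from the MiniGPT-4-style config layout
# and may not match Emotion-LLaMA_finetune.yaml exactly; verify against the real file.
model:
  ckpt: "/path/to/checkpoint_best.pth"   # initialize fine-tuning from the released weights

datasets:
  feature_face:                          # hypothetical dataset key; use the name the repo defines
    ann_path: "/path/to/MERR_coarse_grained.txt"
```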
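For point (3), here is a minimal sketch for reading MERR_coarse_grained.txt, assuming each line holds the three fields described above separated by whitespace; the field order and delimiter are assumptions, so verify them against the actual file.

```python
# Sketch only: assumes each line of MERR_coarse_grained.txt is
# "<video_name> <num_frames> <label>" separated by whitespace.
def load_coarse_grained(path):
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            video_name, num_frames, label = fields[:3]
            samples.append({
                "video": video_name,
                "num_frames": int(num_frames),  # tied to the authors' prior work; unused here
                "label": label,
            })
    return samples


if __name__ == "__main__":
    # Print the first few samples as a quick format check.
    for sample in load_coarse_grained("MERR_coarse_grained.txt")[:5]:
        print(sample)
```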