ZebangCheng / Emotion-LLaMA

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
BSD 3-Clause "New" or "Revised" License

MERR audio prompt #17

Open xiaoyaoxinyi opened 4 days ago

xiaoyaoxinyi commented 4 days ago

Hello author, I would like to know what prompt you used to extract audio information with Qwen-Audio. Thank you.

ZebangCheng commented 1 day ago

Sorry, our server was attacked by a mining virus some time ago, and I had forgotten to back up the Qwen-Audio scripts. I can only describe the prompt for extracting audio descriptions from memory.

Initially, we used a relatively simple prompt: "You are a voice emotion expert. Please analyze the input audio and tell me the tone or pitch of the speaker in the audio."

However, we found that this prompt produced overly simple outputs: Qwen-Audio tended to respond with just 'positive' or 'negative'. After several attempts, we settled on the following prompt: "You are a voice emotion expert. Please analyze the input audio and determine the tone of the speaker in the video from the following options: [joyful, sad, shocked, fearful, angry, positive, negative, calm, doubtful, dismissive]."
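For anyone reproducing this, here is a minimal Python sketch of how the option-constrained prompt could be assembled and how a free-text reply could be mapped back onto one of the listed tones. This is not the lost original script: `build_prompt` and `extract_tone` are hypothetical helper names, and the call to Qwen-Audio itself is deliberately left out.

```python
import re

# Tone options copied from the prompt quoted in this thread.
TONE_OPTIONS = [
    "joyful", "sad", "shocked", "fearful", "angry",
    "positive", "negative", "calm", "doubtful", "dismissive",
]


def build_prompt(options=TONE_OPTIONS):
    """Assemble the constrained prompt, with the options in brackets."""
    return (
        "You are a voice emotion expert. Please analyze the input audio and "
        "determine the tone of the speaker in the video from the following "
        f"options: [{', '.join(options)}]."
    )


def extract_tone(response, options=TONE_OPTIONS):
    """Return the option that appears earliest in the reply, or None.

    Assumption: the model answers in free text that mentions one of the
    options as a whole word; matching is case-insensitive.
    """
    text = response.lower()
    hits = []
    for option in options:
        match = re.search(rf"\b{re.escape(option)}\b", text)
        if match:
            hits.append((match.start(), option))
    return min(hits)[1] if hits else None
```

Keeping the options in a list makes it easy to add or drop a tone and regenerate the prompt without retyping it.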

Therefore, I suggest you experiment with different prompts to find the one that generates the best descriptions.
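One possible way to compare candidate prompts is to run each over the same few clips and tally the replies, which makes it easy to spot a prompt that collapses to 'positive' or 'negative'. The sketch below assumes a caller-supplied function `describe(audio_path, prompt)` that wraps the model and returns its reply as a string; both that function and `compare_prompts` are hypothetical names, not part of the Emotion-LLaMA code.

```python
from collections import Counter


def compare_prompts(prompts, audio_paths, describe):
    """Tally the replies each prompt yields over the same audio files.

    `describe(audio_path, prompt) -> str` is a hypothetical wrapper around
    the audio model, supplied by the caller.
    """
    results = {}
    for prompt in prompts:
        # Normalize whitespace and case so identical replies are grouped.
        replies = (describe(path, prompt).strip().lower() for path in audio_paths)
        results[prompt] = Counter(replies)
    return results
```

A prompt whose tally contains only one or two distinct replies across varied clips is probably too coarse for generating descriptions.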