Open lucas0214 opened 1 month ago
Thank you for your interest in our work!
Just to clarify: are you asking how to extract features from the different encoders, or how to generate the multimodal samples corresponding to your videos?
I want to obtain, for my own videos, the kind of information shown in this picture. From what I can tell, Emotion-LLaMA can also produce this corresponding information from an input video and its audio, right?
Sure, I understand your question. In the paper we describe the prompts used to extract visual information with MiniGPT-v2 and audio information with Qwen-Audio. You can set up the environments for those projects and use similar prompts to generate the multimodal descriptions for your own data. Once I finalize the scripts, I will make them open source.
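In the meantime, here is a minimal sketch of the audio side, assuming the `Qwen/Qwen-Audio-Chat` checkpoint and its `trust_remote_code` chat interface; the prompt wording and file path are placeholders, not the exact ones used in the paper. The visual side would be queried analogously through MiniGPT-v2's own inference code.

```python
# Sketch: asking Qwen-Audio-Chat for a tone/emotion description of one audio track.
# Assumes the Qwen/Qwen-Audio-Chat checkpoint and its trust_remote_code chat API;
# the prompt and path below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(1234)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-Audio-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-Audio-Chat",
    device_map="cuda",
    trust_remote_code=True,
).eval()

audio_path = "samples/sample_00001.wav"  # hypothetical path to one extracted audio track
query = tokenizer.from_list_format([
    {"audio": audio_path},
    {"text": "Describe the speaker's tone and vocal emotion in one sentence."},
])
audio_description, _ = model.chat(tokenizer, query=query, history=None)
print(audio_description)
```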
"We constructed the MERR dataset, which includes 28,618 coarse-grained and 4,487 finegrained annotated samples, covering a wide range of emotional categories such as “doub” and “contempt”. Unlike previous datasets, MERR’s diverse emotional contexts allow models to learn from varied scenarios and generalize to real-world applications, serving as a valuable resource for advancing large-scale multimodal emotion model training and evaluation." This is one of the main contributions of your work. Could you provide the script code for building the MERR dataset? I would like to process my own dataset into the MERR format. Thank you!!!
We are currently organizing our code and plan to make everything open source. The script for building the MERR dataset involves multiple projects, so we need to consolidate everything before releasing it. Please bear with us.
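Until the official scripts are released, a rough idea of the final assembly step might look like the sketch below: each clip's visual description, audio description, and transcript are bundled into one JSON record. All field names here are assumptions for illustration only; the released MERR schema may differ.

```python
# Sketch: collecting per-clip multimodal descriptions into one JSON annotation file.
# The field names are assumptions for illustration, not the official MERR schema.
import json
from pathlib import Path


def build_annotation(video_id: str,
                     visual_description: str,
                     audio_description: str,
                     transcript: str,
                     emotion_label: str) -> dict:
    """Bundle the multimodal descriptions for one clip into a single record."""
    return {
        "video_id": video_id,
        "visual_description": visual_description,  # e.g. from MiniGPT-v2
        "audio_description": audio_description,    # e.g. from Qwen-Audio
        "transcript": transcript,                   # subtitle / ASR text
        "emotion": emotion_label,                   # e.g. "doubt", "contempt"
    }


def write_annotations(records: list[dict], out_path: str) -> None:
    """Write all records to a single JSON file."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    demo = build_annotation(
        video_id="sample_00001",
        visual_description="The person frowns and tilts their head slightly.",
        audio_description="The voice is low and hesitant.",
        transcript="Are you sure about that?",
        emotion_label="doubt",
    )
    write_annotations([demo], "annotations/merr_style.json")
```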
Hello, great work!!! Could you please provide a script to transform my personal dataset into the MERR data format?