Open renrenzsbbb opened 2 years ago
Thanks for your interest in our work! The dataset used to train our pre-trained model is limited, so the model does not generalize well to real-world conditions such as different microphone or camera characteristics. We augmented the images and added some noise to the audio, but generalization is still limited. You could try training this model on a bigger dataset first (without emotion labels, just set the labels to zero) and then fine-tuning it on the emotion dataset. This might lead to better generalization.
Best, Emre
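
For anyone who wants to try the audio side of this, here is a minimal sketch of adding noise to a waveform at a chosen signal-to-noise ratio. It is not the exact augmentation used for the pre-trained model: the white Gaussian noise, the 20 dB level, and the function names `add_noise_at_snr` and `measured_snr_db` are all assumptions made for illustration. It uses only the standard library so it runs as-is; real pipelines would normally do the same with arrays.

```python
import math
import random


def add_noise_at_snr(samples, snr_db, seed=None):
    """Return a noisy copy of `samples` (a list of floats).

    Assumption: white Gaussian noise, scaled so the signal-to-noise
    ratio is roughly `snr_db` decibels. This is an illustrative choice,
    not the augmentation used by the original authors.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    if signal_power == 0.0:
        # Silent input: SNR is undefined, so return it unchanged.
        return list(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_std = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB that was actually achieved."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical test signal: one second of a 220 Hz tone sampled at 16 kHz.
sample_rate = 16000
clean = [0.5 * math.sin(2.0 * math.pi * 220.0 * i / sample_rate)
         for i in range(sample_rate)]
noisy = add_noise_at_snr(clean, snr_db=20.0, seed=0)
```

Calling `measured_snr_db(clean, noisy)` afterwards is a quick way to check that the noise level landed near the target before using the augmented audio for training.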
Thanks for your great work! When I test the pretrained model with the original WAV extracted from the FLV, it gives good results. However, when I use my own WAV, the image is sometimes blurry and deformed. Could you give me some suggestions for solving this? Thanks in advance.