SuperKogito / SER-datasets

A collection of datasets for emotion recognition/detection in speech.
https://superkogito.github.io/SER-datasets
MIT License
Topics: audio, audio-datasets, datasets, emotions, emotions-recognition, multimodal-emotion-recognition, speech, speech-emotion-recognition
# Speech Emotion Recognition (SER) Datasets

A collection of 77 datasets for emotion recognition/detection in speech. The table is ordered chronologically (newest first) and lists, for each dataset, a description of its content and the emotions it covers. The table can also be browsed, sorted and searched at https://superkogito.github.io/SER-datasets/.

| Dataset | Year | Content | Emotions | Format | Size | Language | Paper | Access | License |
|---|---|---|---|---|---|---|---|---|---|
| nEMO | 2024 | 3 hours of samples recorded by nine actors. | 6 emotions: anger, fear, happiness, sadness, surprise, and neutral. | Audio | 0.434 GB | Polish | nEMO: Dataset of Emotional Speech in Polish | Open | CC BY 4.0 |
| MDER | 2024 | 2000 voice recordings of people speaking the Moroccan dialect. | 5 emotions: neutral, happy, sad, angry and fearful. | Audio | 0.187 GB | Arabic (Moroccan) | -- | Open | CC BY 4.0 |
| EMOVOME | 2024 | 999 spontaneous voice messages from 100 Spanish speakers, collected from real conversations on a messaging app. | Valence and arousal dimensions, plus 7 emotions: happiness, disgust, anger, surprise, fear, sadness, and neutral. | Audio | -- | Spanish | EMOVOME Database: Advancing Emotion Recognition in Speech Beyond Staged Scenarios | Partially open | CC BY 4.0 |
| EMNS | 2023 | 1206 high-quality labeled utterances by one female speaker (2-3 hours). | Anger, excitement, disgust, happiness, surprise, sadness, and neutral (plus sarcasm). | Audio | 0.042 GB | English (British) | EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | Open | Apache 2.0 |
| CAVES | 2023 | Full HD visual recordings of 10 native Cantonese speakers uttering 50 sentences. | Anger, happiness, sadness, surprise, fear, disgust and neutral. | Audio | 47 GB | Chinese (Cantonese) | A Cantonese Audio-Visual Emotional Speech (CAVES) dataset | Open | Available for research purposes only |
| BANSpEmo | 2023 | 792 utterance recordings from 22 non-professional speakers (11 males and 11 females) expressing six basic emotions over two sets of sentences. | Angry, disgusted, happy, surprised, sad, fear. | Audio | 0.555 GB | Bangla | BANSpEmo: A Bangla Emotional Speech Recognition Dataset | Open | CC BY 4.0 |
| KBES | 2023 | 900 audio signals from 35 actors (20 females and 15 males). Each emotion is represented at two intensity levels (low and high). | Angry, disgusted, happy, neutral, sad. | Audio | 0.337 GB | Bangla | KBES: A dataset for realistic Bangla speech emotion recognition with intensity level | Open | CC BY 4.0 |
| RESD | 2022 | Russian emotional speech dialogue dataset: ~3.5 hours of actor-voiced dialogues, each ~3 minutes long, with speech files (16000 or 44100 Hz) and speech-to-text transcripts. | Anger, disgust, fear, enthusiasm, happiness, neutral, sadness. | Audio | 0.48 GB | Russian | EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark | Open | MIT |
| Hi, KIA | 2022 | A shared database of short wake-up word utterances focusing on perceived emotion in speech; it contains 488 wake-up word recordings. | Angry, happy, sad, neutral. | Audio | 0.75 GB | Korean | Hi, KIA: A Speech Emotion Recognition Dataset for Wake-Up Words | Open | CC BY-SA 4.0 |
| Emozionalmente | 2022 | 6902 labeled samples acted out by 431 amateur actors verbalizing 18 different sentences. | Anger, disgust, fear, joy, sadness, surprise, neutral. | Audio | 0.581 GB | Italian | -- | Open | CC BY 4.0 |
| BanglaSER | 2022 | 1467 Bangla speech recordings by 34 non-professional actors (17 male and 17 female) aged 19 to 47. | Angry, happy, neutral, sad, surprise. | Audio | 0.425 GB | Bangla | BanglaSER: A speech emotion recognition dataset for the Bangla language | Open | CC BY 4.0 |
| B-SER | 2022 | 1224 speech recordings by 34 non-professional actors (17 male and 17 female) aged 19 to 47. | Angry, happy, sad and surprise. | Audio | 0.363 GB | Bangla | -- | Open | CC BY 4.0 |
| Kannada | 2022 | 468 audio samples of six different sentences, pronounced by thirteen people (four male and nine female) in five basic emotions plus neutral. | Anger, sadness, surprise, happiness, fear, neutral. | Audio | 0.1661 GB | Kannada | -- | Open | CC BY 4.0 |
| Quechua-SER | 2022 | 12420 audio recordings (~15 hours) and their transcriptions by 7 native speakers. | Dimensional labels: valence, arousal, and dominance. | Audio | 3.53 GB | Quechua Collao | A speech corpus of Quechua Collao for automatic dimensional emotion recognition | Open | CC BY 4.0 |
| MESD | 2022 | 864 audio files of single-word emotional utterances with Mexican cultural shaping. | 6 emotions: anger, disgust, fear, happiness, neutral, and sadness. | Audio | 0.097 GB | Spanish (Mexican) | The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning | Open | CC BY 4.0 |
| SyntAct | 2022 | Synthesized database of 997 utterances of three basic emotions and neutral expression, generated by rule-based manipulation of a diphone synthesizer. | 6 emotions: angry, bored, happy, neutral, sad and scared. | Audio | 0.941 GB | German | SyntAct: A Synthesized Database of Basic Emotions | Open | CC BY-SA 4.0 |
| BEAT | 2022 | 76 hours from 30 speakers in 4 languages: English (60 h), Chinese (12 h), Spanish (2 h) and Japanese (2 h). | 8 emotions: happiness, anger, disgust, sadness, contempt, surprise, fear, and neutral. | Audio, Video | 42 GB | English, Chinese, Spanish, Japanese | A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis | Open | Non-commercial license |
| Dusha | 2022 | 300000 audio recordings (~350 hours) of Russian speech, with transcripts and emotional labels. The dataset has two subsets: acted and real-life. | 4 emotions: angry, happy, sad and neutral. Arousal and valence metrics are also available. | Audio | 58 GB | Russian | Large Raw Emotional Dataset with Aggregation Mechanism | Open | Public license with attribution and conditions reserved |
| MAFW | 2022 | 10045 in-the-wild video-audio clips. | 11 single-label emotion categories (anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment) and 32 multi-label emotion categories. | Audio, Video | -- | -- | MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild | Restricted | Non-commercial research purposes |
| EMOVIE | 2021 | 9724 samples with audio files and human-labeled emotion annotations. | Polarity (positive: +1, negative: -1). | Audio | 0.572 GB | Chinese (Mandarin) | EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model | Open | CC BY-NC-SA 2.0 |
| emoUERJ | 2021 | Ten sentences from eight actors, equally divided between genders, who were free to choose the phrases when recording in four emotions (377 recordings). | Happiness, anger, sadness or neutral. | Audio | 0.1051 GB | Portuguese (Brazilian) | -- | Open | CC BY 4.0 |
| Thorsten-Voice Dataset 2021.06 emotional | 2021 | 2400 normalized mono recordings by one person (Thorsten Müller) covering 300 sentences. | Amusement, disgust, anger, surprise and neutral (plus drunk, whispering and sleepy states). | Audio | 0.399 GB | German | -- | Open | CC0: Public Domain |
| ASED | 2021 | 2474 recordings by 65 participants (25 females and 40 males). Recordings were judged, and rejected where necessary, according to the opinion of eight judges. | 5 emotions: anger, happiness, fear, sadness and neutral. | Audio | 0.135 GB | Amharic | A New Amharic Speech Emotion Dataset and Classification Benchmark | Open | -- |
| ESCorpus-PE | 2021 | Peruvian Spanish speech gathered from interviews, TV reports, political debates and testimonials on YouTube: 3749 utterances by 80 speakers (44 male and 36 female). | Valence, arousal and dominance. | Audio | 1.9 GB | Spanish (Peruvian) | -- | Open | CC BY-SA 4.0 |
| SUBESCO | 2021 | 7000 sentence-level Bangla utterances by 20 professional actors (10 males and 10 females), with 10 sentences recorded in 7 target emotions. | Anger, disgust, fear, happiness, neutral, sadness and surprise. | Audio | 1.7 GB | Bangla | SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla | Open | CC BY 4.0 |
| Audio-Speech-Sentiment | 2021 | Audio Speech Sentiment Dataset: audio recordings of spoken sentences. | 4 emotions: anger, happiness, sadness, and neutral. | Audio | 1.1 GB | English | -- | Open | CC0: Public Domain |
| LSSED | 2021 | Large-scale dataset and benchmark for speech emotion recognition. | Anger, happiness, sadness, disappointment, boredom, disgust, excitement, fear, surprise, normal, and other. | Audio | 90 GB | English | LSSED: A Large-Scale Dataset and Benchmark for Speech Emotion Recognition | Restricted | -- |
| MLEnd | 2021 | ~32700 audio recordings produced by 154 speakers. Each recording corresponds to one English numeral (from "zero" to "billion"). | Intonations: neutral, bored, excited and question. | Audio | 2.27 GB | -- | -- | Open | Unknown |
| ASVP-ESD | 2021 | ~13285 audio files collected from movies, TV shows and YouTube, containing speech and non-speech. | 12 natural emotions (boredom, neutral, happiness, sadness, anger, fear, surprise, disgust, excitement, pleasure, pain, disappointment) with 2 levels of intensity. | Audio | 2 GB | Chinese, English, French, Russian and others | -- | Open | Unknown |
| ESD | 2021 | 29 hours, 3500 sentences, by 10 native English speakers and 10 native Chinese speakers. | 5 emotions: angry, happy, neutral, sad, and surprise. | Audio, Text | 2.4 GB | Chinese, English | Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset | Open | Academic License |
| MuSe-CaR | 2021 | 40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see the dataset link for details). | Continuous emotion dimensions: valence, arousal, and trustworthiness. | Audio, Video, Text | 15 GB | English | The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements | Restricted | Academic License & Commercial License |
| THAI SER | 2021 | 41 hours 36 minutes of recordings (27,854 utterances) performed by 200 professional actors (112 female, 88 male). | 5 main emotions assigned to actors: neutral, anger, happiness, sadness, and frustration. | Audio | 12 GB | Thai | -- | Open | CC BY-SA 4.0 |
| French Emotional Speech Database - Oréau | 2020 | 79 utterances (10 to 13 per emotion) by 32 non-professional speakers. | 7 emotions: sadness, anger, disgust, fear, surprise, joy, neutral. | Audio | 0.264 GB | French | -- | Open | CC BY 4.0 |
| Att-HACK | 2020 | 25 speakers interpreting 100 utterances in 4 social attitudes, with 3-5 repetitions per attitude, for a total of around 30 hours of speech. | Expressive speech in French: 100 phrases with multiple versions (3 to 5) in four social attitudes (friendly, distant, dominant and seductive). | Audio | 6.6 GB | French | Att-HACK: An Expressive Speech Database with Social Attitudes | Open | CC BY-NC-ND 4.0 |
| MSP-Podcast corpus | 2020 | 100 hours by over 100 speakers (see the dataset link for details). | Attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprise, fear, contempt, neutral and other). | Audio | 13.4 GB | English | The MSP-Conversation Corpus | Restricted | Academic License & Commercial License |
| AISHELL-3 | 2020 | Roughly 85 hours of emotion-neutral recordings by 218 native Mandarin Chinese speakers, 88035 utterances in total. | Neutral. | Audio | 19 GB | Chinese (Mandarin) | AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines | Open | Apache 2.0 |
| BEASC | 2020 | Bangla Emotional Audio-Speech Corpus: Bangla spoken utterances. | 6 emotions: anger, happiness, sadness, fear, surprise, and neutral. | Audio | 9 GB | Bangla | BEASC: Bangla Emotional Audio-Speech Corpus - A Speech Emotion Recognition Corpus for the Low-Resource Bangla Language | Open | CC BY 4.0 |
| emotiontts open db | 2020 | Recordings and their associated transcriptions by a diverse group of speakers. | 4 emotions: general, joy, anger, and sadness. | Audio, Text | -- | Korean | -- | Partially open | CC BY-NC-SA 4.0 |
| URDU-Dataset | 2020 | 400 utterances by 38 speakers (27 male and 11 female). | 4 emotions: angry, happy, neutral, and sad. | Audio | 0.072 GB | Urdu | Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages | Open | -- |
| BAVED | 2020 | 1935 recordings by 61 speakers (45 male and 16 female). | 3 levels of emotion. | Audio | 0.195 GB | Arabic | -- | Open | -- |
| VIVAE | 2020 | Non-speech: 1085 audio files by 11 speakers. | 6 non-speech emotions: achievement, anger, fear, pain, pleasure, and surprise, at 4 intensities (low, moderate, strong, peak). | Audio | 0.0935 GB | Nonverbal (English) | The Variably Intense Vocalizations of Affect and Emotion (VIVAE) corpus prompts new perspective on nonspeech perception | Restricted | CC BY-NC-SA 4.0 |
| VESUS | 2019 | 252 distinct phrases, each read by 10 actors, totaling 6 hours of speech. | 5 emotions: anger, happiness, sadness, fear and neutral. | Audio | -- | English | VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English | Restricted | Academic EULA |
| Morgan Emotional Speech Set | 2019 | 999 spontaneous voice messages from 100 Spanish speakers, collected from real conversations on a messaging app. | Valence and arousal dimensions, plus 4 emotions: happiness, anger, sadness, and calmness. | Audio | 0.192 GB | English | Categorical and Dimensional Ratings of Emotional Speech: Behavioral Findings From the Morgan Emotional Speech Set | Open | CC BY 4.0 |
| PMEmo | 2019 | Emotion annotations of 794 songs together with simultaneously recorded electrodermal activity (EDA) signals, collected in a music emotion experiment with 457 subjects. | Valence, arousal. | Audio, EDA | 1.3 GB | Chinese, English | The PMEmo Dataset for Music Emotion Recognition | Open | CC BY-SA 4.0 |
| SEWA | 2019 | More than 2000 minutes of audio-visual data from 398 people (201 male and 197 female) from 6 cultures. | Emotions are characterized using valence and arousal. | Audio, Video | -- | Chinese, English, German, Greek, Hungarian and Serbian | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | Restricted | SEWA EULA |
| MELD | 2019 | 1400 dialogues and 14000 utterances from the Friends TV series by multiple speakers. | 7 emotions: anger, disgust, sadness, joy, neutral, surprise and fear. MELD also has a sentiment annotation (positive, negative or neutral) for each utterance. | Audio, Video, Text | 10.1 GB | English | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Open | GPL-3.0 |
| ShEMO | 2019 | 3000 semi-natural utterances (3 hours and 25 minutes of speech) from online radio plays by 87 native Persian speakers. | 6 emotions: anger, fear, happiness, sadness, neutral and surprise. | Audio | 0.101 GB | Persian | ShEMO: a large-scale validated database for Persian speech emotion detection | Open | -- |
| DEMoS | 2019 | 9365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males). | 7/6 emotions: anger, sadness, happiness, fear, surprise, disgust, and the secondary emotion guilt. | Audio | 2.5 GB | Italian | DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception | Restricted | EULA: End User License Agreement |
| AESDD | 2018 | Around 500 utterances by a diverse group of actors (over 5) simulating various emotions. | 5 emotions: anger, disgust, fear, happiness, and sadness. | Audio | 0.392 GB | Greek | Speech Emotion Recognition for Performance Interaction | Open | -- |
| Emov-DB | 2018 | Recordings of 4 speakers (2 males and 2 females). | Emotional styles: neutral, sleepiness, anger, disgust and amused. | Audio | 5.88 GB | English | The emotional voices database: Towards controlling the emotion dimension in voice generation systems | Open | -- |
| OMG Emotion | 2018 | 420 relatively long emotion videos, with an average length of 1 minute, collected from a variety of YouTube channels. | 7 emotions: anger, disgust, fear, happy, sad, surprise and neutral, plus valence and arousal. | Audio, Video | -- | English | The OMG-Emotion Behavior Dataset | Open | CC BY-NC-SA 3.0 |
| RAVDESS | 2018 | 7356 recordings by 24 actors. | 7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust. | Audio, Video | 24.8 GB | English | The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English | Open | CC BY-NC-SA 4.0 |
| JL corpus | 2018 | 2400 recordings of 240 sentences by 4 actors (2 males and 2 females). | 5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic. | Audio | 1.9 GB | English | An Open Source Emotional Speech Corpus for Human Robot Interaction Applications | Open | CC0 1.0 |
| CaFE | 2018 | 6 different sentences by 12 speakers (6 females + 6 males). | 7 emotions: happy, sad, angry, fearful, surprise, disgust and neutral. Each emotion is acted at 2 different intensities. | Audio | 2 GB | French (Canadian) | -- | Open | CC BY-NC-SA 4.0 |
| EmoFilm | 2018 | 1115 audio instances of sentences extracted from various films. | 5 emotions: anger, contempt, happiness, fear, and sadness. | Audio | 0.277 GB | English, Italian, Spanish | Categorical vs Dimensional Perception of Italian Emotional Speech | Restricted | EULA: End User License Agreement |
| ANAD | 2018 | 1384 recordings by multiple speakers. | 3 emotions: angry, happy, surprised. | Audio | 2 GB | Arabic | Arabic Natural Audio Dataset | Open | CC BY-NC-SA 4.0 |
| EmoSynth | 2018 | 144 audio files labelled by 40 listeners. | Emotion (no speech) defined in terms of valence and arousal. | Audio | 0.1034 GB | -- | The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results | Open | CC BY 4.0 |
| CMU-MOSEI | 2018 | 65 hours of annotated video from more than 1000 speakers and 250 topics. | 6 emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale. | Audio, Video | 190.1 GB | English | Multi-attention Recurrent Network for Human Communication Comprehension | Open | CMU-MOSEI License |
| VERBO | 2018 | 14 different phrases by 12 speakers (6 female + 6 male), for a total of 1167 recordings. | 7 emotions: happiness, disgust, fear, neutral, anger, surprise, sadness. | Audio | -- | Portuguese | VERBO: Voice Emotion Recognition dataBase in Portuguese Language | Restricted | Available for research purposes only |
| CMU-MOSI | 2017 | 2199 opinion utterances with annotated sentiment. | Sentiment annotated from very negative to very positive on a seven-step Likert scale. | Audio, Video | 4.3 GB | English | Multi-attention Recurrent Network for Human Communication Comprehension | Open | CMU-MOSI License |
| MSP-IMPROV | 2017 | 20 sentences by 12 actors. | 4 emotions: angry, sad, happy, neutral (plus other and without agreement). | Audio, Video | 3.4 GB | English | MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception | Restricted | Academic License & Commercial License |
| CREMA-D | 2017 | 7442 clips of 12 sentences spoken by 91 actors (48 males and 43 females). | 6 emotions: angry, disgusted, fearful, happy, neutral, and sad. | Audio, Video | 0.607 GB | English | CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset | Open | Open Database License & Database Content License |
| Example emotion videos used in investigation of emotion perception in schizophrenia | 2017 | 6 videos: two example videos for each emotion category (angry, happy and neutral) by one female speaker. | 3 emotions: angry, happy and neutral. | Audio, Video | 0.063 GB | English | -- | Open | Permitted Non-commercial Re-use with Acknowledgment |
| EMOVO | 2014 | 6 actors performing 14 sentences. | 6 emotions: disgust, fear, anger, joy, surprise, sadness. | Audio | 0.355 GB | Italian | EMOVO Corpus: an Italian Emotional Speech Database | Open | -- |
| RECOLA | 2013 | 3.8 hours of recordings by 46 participants. | Negative and positive sentiment (valence and arousal). | Audio, Video | -- | -- | Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions | Restricted | Academic License & Commercial License |
| GEMEP corpus | 2012 | Videos of 10 actors portraying 10 states. | 12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness. Plus 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness. | Audio, Video | -- | French | Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception | Restricted | -- |
| OGVC | 2012 | 9114 spontaneous utterances and 2656 acted utterances by 4 professional actors (two male and two female). | 9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance and the neutral state. | Audio | 5.3 GB | Japanese | Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment | Restricted | -- |
| LEGO corpus | 2012 | 347 dialogs with 9,083 system-user exchanges. | Emotions classified as garbage, non-angry, slightly angry and very angry. | Audio | 1.1 GB | -- | A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System | Open | License available with the data. Free of charge for research purposes only. |
| SEMAINE | 2012 | 95 dyadic conversations from 21 subjects. Each subject converses with another who plays one of four emotionally characterized roles. | 5 FeelTrace annotations: activation, valence, dominance, power, intensity. | Audio, Video, Text | 104 GB | English | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | Restricted | Academic EULA |
| SAVEE | 2011 | 480 British English utterances by 4 male actors. | 7 emotions: anger, disgust, fear, happiness, sadness, surprise and neutral. | Audio, Video | -- | English (British) | Multimodal Emotion Recognition | Restricted | Free of charge for research purposes only. |
| TESS | 2010 | 2800 recordings by 2 actresses. | 7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral. | Audio | -- | English | Behavioural Findings from the Toronto Emotional Speech Set | Open | CC BY-NC-ND 4.0 |
| EEKK | 2007 | 26 text passages read by 10 speakers. | 4 main emotions: joy, sadness, anger and neutral. | -- | 0.352 GB | Estonian | Estonian Emotional Speech Corpus | Open | CC BY license |
| IEMOCAP | 2007 | 12 hours of audiovisual data by 10 actors in 5 sessions. | Full set: neutral state, happiness, sadness, anger, surprise, fear, disgust, frustration, excited, other. Balanced subset: 5 emotions (happiness, anger, sadness, frustration and neutral). Three dimensions: valence, arousal, dominance. | Audio, Video, Text | 17.7 GB | English | IEMOCAP: Interactive emotional dyadic motion capture database | Restricted | IEMOCAP license |
| Keio-ESD | 2006 | A set of emotional speech spoken by a Japanese male speaker. | 47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc. | Audio | 0.0435 GB | Japanese | Emotional Speech Synthesis Using Subspace Constraints in Prosody | Restricted | Available for research purposes only. |
| EMO-DB | 2005 | 800 recordings spoken by 10 actors (5 males and 5 females). | 7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust. | Audio | 0.049 GB | German | A Database of German Emotional Speech | Open | -- |
| eNTERFACE05 | 2005 | Videos of 42 subjects from 14 different nationalities. | 6 emotions: anger, fear, surprise, happiness, sadness and disgust. | Audio, Video | 0.8 GB | English | -- | Open | Free of charge for research purposes only. |
| DES | 2002 | 4 speakers (2 males and 2 females). | 5 emotions: neutral, surprise, happiness, sadness and anger. | -- | -- | Danish | Documentation of the Danish Emotional Speech Database | -- | -- |
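The hosted version of the table supports searching and sorting. If you want similar filtering offline, the sketch below shows one way to do it in Python. It assumes you have copied rows into a list by hand: `SerDataset`, `size_in_gb`, and `search` are hypothetical helpers, not part of this repository, only six rows are included, and size parsing assumes every size is written in GB as in the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SerDataset:
    """Hypothetical record holding a subset of the table's columns."""
    name: str
    year: int
    language: str
    size: str     # as written in the table, e.g. "0.434 GB" or "--"
    access: str   # e.g. "Open", "Restricted", "Partially open"
    license: str


def size_in_gb(size: str) -> Optional[float]:
    """Parse a Size cell such as '0.434 GB'; return None for '--' or any other format."""
    parts = size.split()
    if len(parts) == 2 and parts[1] == "GB":
        return float(parts[0])
    return None


# Six rows copied from the table, newest first (illustrative subset only).
CATALOG = [
    SerDataset("EMOVOME", 2024, "Spanish", "--", "Partially open", "CC BY 4.0"),
    SerDataset("EMNS", 2023, "English (British)", "0.042 GB", "Open", "Apache 2.0"),
    SerDataset("ShEMO", 2019, "Persian", "0.101 GB", "Open", "--"),
    SerDataset("RAVDESS", 2018, "English", "24.8 GB", "Open", "CC BY-NC-SA 4.0"),
    SerDataset("CREMA-D", 2017, "English", "0.607 GB", "Open",
               "Open Database License & Database Content License"),
    SerDataset("IEMOCAP", 2007, "English", "17.7 GB", "Restricted", "IEMOCAP license"),
]


def _size_key(d: SerDataset):
    """Sort key: known sizes ascending, unknown sizes ('--') last."""
    gb = size_in_gb(d.size)
    return (gb is None, gb if gb is not None else 0.0)


def search(catalog: List[SerDataset], language: Optional[str] = None,
           open_only: bool = False, max_gb: Optional[float] = None) -> List[SerDataset]:
    """Keep rows matching every given filter, then sort them by size."""
    hits = []
    for d in catalog:
        # Case-insensitive substring match, so "English" also matches "English (British)".
        if language is not None and language.lower() not in d.language.lower():
            continue
        if open_only and d.access != "Open":
            continue
        gb = size_in_gb(d.size)
        # Rows with an unknown size are excluded whenever a size limit is given.
        if max_gb is not None and (gb is None or gb > max_gb):
            continue
        hits.append(d)
    return sorted(hits, key=_size_key)


if __name__ == "__main__":
    for d in search(CATALOG, language="English", open_only=True, max_gb=1.0):
        print(f"{d.name} ({d.year}): {d.size}, {d.license}")
```

With this subset, the script prints EMNS and CREMA-D, the two openly accessible English datasets under 1 GB. Treating `--` as unknown rather than zero keeps datasets with missing sizes from appearing smaller than they are.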

## Contribution

## Disclaimer

## Recommended tools