capstone496 / SpeechSentiments


Dataset Search #10

Open raeqi opened 5 years ago

raeqi commented 5 years ago

Find a dataset that includes corresponding audio and text content, and that also has emotion labels.

raeqi commented 5 years ago

Google Dataset Search --- Audio Emotion Result:

  1. Shared Acoustic Codes Underlie Emotional Communication in Music and Speech - Evidence from Deep Transfer Learning (Datasets)

---SEMAINE (speech)

The SEMAINE corpus (McKeown, Valstar, Cowie, Pantic & Schroder, 2012) was developed specifically to address the task of achieving emotion-rich interactions, and it is adequate for this task as it comprises a wide range of emotional speech. It includes video and speech recordings of spontaneous interactions between human and emotionally stereotyped 'characters'. Coutinho & Schuller (2017) used a subset of this database (called Solid-SAL). The Solid-SAL dataset is freely available for scientific research purposes (see http://semaine-db.eu). This repository includes the audio features used in Coutinho & Schuller (2017) (under features/SEMAINE).

https://semaine-db.eu

  1. A Canadian French Emotional Speech Dataset

----The Canadian French Emotional (CaFE) speech dataset contains six different sentences, pronounced by six male and six female actors, in six basic emotions plus one neutral emotion. The six basic emotions are acted in two different intensities.

https://zenodo.org/record/1219621

  1. Arabic Natural Audio Dataset

----This is the first Arabic Natural Audio Dataset (ANAD), developed to recognize 3 discrete emotions: happy, angry, and surprised.

Eight videos of live calls between an anchor and a human outside the studio were downloaded from online Arabic talk shows. Each video was then divided into turns: callers and receivers. To label each video, 18 listeners were asked to listen to each video and select whether they perceived a happy, angry or surprised emotion. Silence, laughs and noisy chunks were removed. Every chunk was then automatically divided into 1-second speech units, forming our final corpus composed of 1,384 records. (A rough sketch of this kind of fixed-length segmentation is included at the end of this list.)

https://data.mendeley.com/datasets/xm232yxf7t/1 https://search.datacite.org/works/10.17632/xm232yxf7t.1

  1. BUILDING E-PET - COULD EMOTIONS PERSONAL TRAINER BECOME A REALITY?

---Nowadays, ubiquitous computing - devices all around us - creates a new reality to cope with. This artificial environment is subtly added to the real world and feeds us with huge quantities of information, pushing our adaptive capacity to its limits. We need high-speed information processing, and for the human brain the fastest solution is offered by the affective neural system. Emotions have an unconscious evaluative role, but the solutions they offer are not always functional. This is the moment when a personal emotional state could unconsciously migrate to a dysfunctional zone. Thus, the following question arises: "How can we handle an emotional state we are not conscious of?" In this article we propose E-PET - Emotion PErsonal Trainer - a ubiquitous system for the detection and interpretation of emotions. It is a hardware-free concept since it is designed to be incorporated into different types of already existing devices such as smartphones, intelligent goggles and/or smart watches. Information about emotional state is taken from the audio channel (words and paralanguage parameters) because this approach requires no additional equipment. The system provides warnings and recommendations in order to help the person handle their emotional reactions. Moreover, we analyze the opportunity of the above-mentioned system from the potential users' point of view. An investigative study is conducted to find out the key factors for a successful implementation of E-PET. The results encourage us to continue our research in this field and to promote a new technological view on handling [self-] emotions.

https://search.datacite.org/works/10.12753/2066-026x-15-084

  1. JL corpus - Emotional speech corpus with primary and secondary emotions

----To further understand the wide array of emotions embedded in human speech, we are introducing an emotional speech corpus. In contrast to existing speech corpora, this corpus was constructed by maintaining an equal distribution of 4 long vowels in New Zealand English. This balance is meant to facilitate emotion-related formant and glottal source feature comparison studies. Also, the corpus has 5 secondary emotions along with 5 primary emotions. Secondary emotions are important in Human-Robot Interaction (HRI), where the aim is to model natural conversations among humans and robots. But there are very few existing speech resources to study these emotions, and this work adds a speech corpus containing some secondary emotions.

https://www.kaggle.com/tli725/jl-corpus
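
Side note on the ANAD entry above: if we ever need to reproduce that kind of fixed-length segmentation on our own recordings, a minimal sketch with pydub could look like the following. This is only one way to do it (not what the ANAD authors used, as far as I can tell), and the file/directory names are made up:

```python
# Minimal sketch: split a speech recording into 1-second chunks, similar to the
# segmentation described for ANAD. Uses pydub; paths here are placeholders.
import os
from pydub import AudioSegment

os.makedirs("chunks", exist_ok=True)
audio = AudioSegment.from_file("caller_turn.wav")  # hypothetical input file
chunk_ms = 1000  # 1-second units

for i, start in enumerate(range(0, len(audio), chunk_ms)):
    chunk = audio[start:start + chunk_ms]
    if len(chunk) < chunk_ms:
        break  # drop the trailing partial chunk
    chunk.export(f"chunks/caller_turn_{i:04d}.wav", format="wav")
```

As in their pipeline, silence, laughs and noisy parts would have to be removed before this step.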

raeqi commented 5 years ago

Google Dataset Search --- Emotion Video Result:

  1. A MUSIC VIDEO RECOMMENDER SYSTEM BASED ON EMOTION CLASSIFICATION ON USER COMMENTS

---Along with the concept of collaborative intelligence, user comments are a useful information source for adding more value to online resources, such as music, videos, books and other multimedia resources. While several works have been conducted to utilize user comments for sentiment analysis, i.e., LIKE or UNLIKE, there are still few exploitations of such comments for detecting emotion (user mood) on online resources. With emotion recognition, it is possible for us to understand the content of online resources and use the recognition result for further value-added services, such as product recommendation. This thesis proposes a two-step method to perform emotion classification using user comments and utilize the result for music video recommendation. In the first step, emotion filtering tags user comments with three label types: emotional comments, non-emotional comments, and unrelated junk comments. In the second step, emotion classification aims to classify the emotional comments into six emotion types: anger, disgust, fear, happiness, sadness, and surprise. With the YouTube API, a total of 85 video clips with 12,000 comments were collected and used for emotion filtering and classification. The emotion filtering detects that 5,345 comments are emotional comments, and the emotion classification categorizes them into six emotional classes using 7,722 features (word types) extracted from the dataset.

https://search.datacite.org/works/10.14457/tu.the.2015.163

  1. Videos for the manipulation of leader moral anger expressions

https://data.mendeley.com/datasets/74z32ymgwc/1

  1. Eliciting positive, negative and mixed emotional states: A film library for affective scientists

---We describe the creation of a film library designed for researchers interested in positive (amusing), negative (repulsive), mixed (amusing and repulsive) and neutral emotional states. Three hundred 20- to 33-second film clips videotaped by amateurs were selected from video-hosting websites and screened in laboratory studies by 75 female participants on self-reported amusement and repulsion (Experiments 1 and 2). On the basis of pre-defined cut-off values, 51 positive, 39 negative, 59 mixed and 50 neutral film clips were selected. These film clips were then presented to 411 male and female participants in a large online study to identify film clips that reliably induced the target emotions (Experiment 3). Depending on the goal of the study, researchers may choose positive, negative, mixed or neutral emotional film clips on the basis of Experiments 1 and 2 or Experiment 3 ratings.

https://explore.openaire.eu/search/other?orpId=r37980778c78::5c2a429f0b1eac0682e3094590c96265

--Comment: resource not available

raeqi commented 5 years ago

Google Dataset Search --- Speech Emotion Result:

  1. Reusing Neural Speech Representations for Auditory Emotion Recognition

--Acoustic emotion recognition aims to categorize the affective state of the speaker and is still a difficult task for machine learning models. The difficulties come from the scarcity of training data, general subjectivity in emotion perception resulting in low annotator agreement, and the uncertainty about which features are the most relevant and robust ones for classification. In this paper, we will tackle the latter problem. Inspired by the recent success of transfer learning methods we propose a set of architectures which utilize neural representations inferred by training on large speech databases for the acoustic emotion recognition task. Our experiments on the IEMOCAP dataset show ~10% relative improvements in the accuracy and F1-score over the baseline recurrent neural network which is trained end-to-end for emotion recognition.

https://explore.openaire.eu/search/publication?articleId=od________18::0a3cf24abc828e1aa5aad2b7d3fac15b
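
The recipe in that paper - reuse representations learned on large speech corpora, then train a small emotion classifier on top - could be prototyped roughly as below. This is only a sketch of the idea, not the paper's actual architecture: a pretrained wav2vec2 model from torchaudio stands in for their speech representations, and the file paths and labels are placeholders for whatever IEMOCAP-style data we end up with.

```python
# Sketch of transfer learning for speech emotion recognition: mean-pooled
# embeddings from a pretrained speech model feed a simple classifier.
# wav2vec2 is a stand-in for the paper's representations; paths/labels are made up.
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

def embed(path):
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.no_grad():
        features, _ = model.extract_features(waveform)
    # Mean-pool the last layer over time -> one fixed-length vector per clip.
    return features[-1].mean(dim=1).squeeze(0).numpy()

# Hypothetical (path, label) pairs; in practice these would come from IEMOCAP
# with a proper train/test split across sessions.
train = [("clip_001.wav", "angry"), ("clip_002.wav", "happy")]
X = [embed(p) for p, _ in train]
y = [label for _, label in train]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([embed("clip_003.wav")]))
```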

KatJHuang commented 5 years ago

In regard to the search result right above, the IEMOCAP dataset looks very pertinent to our task at hand: https://sail.usc.edu/iemocap/index.html

It contains speakers enacting or improvising scripted scenarios designed to elicit emotions.

KatJHuang commented 5 years ago

Another example is EmotiW: https://cs.anu.edu.au/few/ChallengeDetails.html It's also got speech in natural contexts.

Update: these guys only offer the data in video format. But we can extract the audio from the videos with ffmpeg, e.g. ffmpeg -i input-video.avi -vn -acodec copy output-audio.aac (note that -acodec copy only works if the source audio stream is already AAC; otherwise drop the copy and re-encode, e.g. to WAV). We'd also need to email them for the dataset and fill out a consent form.
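
If/when we get the clips, here is a rough sketch of batching that extraction over a folder of videos. It assumes ffmpeg is installed and on PATH; the directory names and the .avi extension are placeholders, and it re-encodes to 16 kHz mono WAV so it does not depend on the source audio codec:

```python
# Sketch: batch-extract audio tracks from a folder of video clips with ffmpeg.
# Re-encoding to 16 kHz mono WAV avoids the AAC-only limitation of "-acodec copy".
# Assumes ffmpeg is on PATH; directory names are placeholders.
import subprocess
from pathlib import Path

video_dir = Path("emotiw_videos")   # hypothetical download location
audio_dir = Path("emotiw_audio")
audio_dir.mkdir(exist_ok=True)

for video in sorted(video_dir.glob("*.avi")):
    out = audio_dir / (video.stem + ".wav")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video),   # -y: overwrite existing output
         "-vn",                              # drop the video stream
         "-ac", "1", "-ar", "16000",         # mono, 16 kHz
         str(out)],
        check=True,
    )
```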