RoboTutorLLC / RoboTutor_2019

Main code for RoboTutor. Uploaded 11/20/2018 to XPRIZE from RoboTutorLLC/RoboTutor.

Elicit kids' facial expressions #315

Closed by JackMostow 6 years ago

JackMostow commented 6 years ago

Collect kids’ facial expressions elicited by showing them video clips and asking them to make the same face, for an experiment to evaluate automated expression detection. The idea came from a 5/7/2018 meeting with Amogh and Laszlo, as a quick way to get labelled data from kids who use RoboTutor on which to test an expression detector.

JackMostow commented 6 years ago

Amy, 5/8/18 1:20 PM:

Hmm... that's an interesting idea, but I'm not clear how it would work.

Asking them to make a face after watching a video would mean that they would have to decode the affective state seen in the clip, and we wouldn't know what they had decoded it as, so we wouldn't be able to assign an affective state to the result. In other words, they might see a video we thought was 'bored', interpret it as 'engaged', and then produce an 'engaged' face.

In that case, you would be better off saying "show me your 'bored' face", because at least you would know that they were interpreting the request as 'bored', and you would have more control over the result.

But I think the research would suggest that it's better to just use a cultural native to label natural expressions rather than faked expressions.

JackMostow commented 6 years ago

Jack, 5/8/18 2:33pm:

  1. The video clips are very brief. The reason to use them instead of still photos is that the facial action units (FAUs) that constitute facial expressions correspond to muscle movements, which a still photo cannot show.

  2. The assumption that the “make this face” task requires imputing an emotional state sounds plausible, but is it true?

  3. I plan to use both elicitation methods – all the verbal stimuli first, so as to keep the video stimuli from biasing the responses. They have different pros and cons and will be interesting to compare:

a. Saying which emotion to display has the advantage you cite, but its wording is crucial and may be problematic. E.g., a young kid might not know the word “frustrated” or might not know its correct meaning. I’ll ask our Swahili experts.

b. Video stimuli to imitate bypass the need for verbal descriptions that kids might not understand, or understand differently than we intend.

  4. I prefer to focus for now on detecting facial expressions and treat imputation of emotional state as a separate culture-dependent task, though we hope the verbally elicited expressions will shed light on it.

a. Amogh will extract the stimuli from a data set of elicited facial expressions labeled by FAUs. For each expression, he’ll pick two stimuli – one of a male, the other of a female – with the exact FAUs associated with that expression in the literature (see the sketch after this list).

b. Natural expressions are authentic, but their extremely skewed distribution would give us a very unbalanced data set. Elicited expressions are less authentic but will let us quickly elicit balanced data for “bored,” “confused,” “delighted,” “frustrated,” “surprised,” and “neutral” expressions, in scare quotes to mark them as conventional labels rather than imputed emotions.
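A minimal sketch of the selection step in 4a, assuming a hypothetical clip metadata format and illustrative FAU sets; the actual data set, field names, and canonical FAU codes Amogh uses may differ:

```python
# Hypothetical sketch: pick two stimulus clips per expression label,
# one male and one female, whose FAU set exactly matches the canonical
# FAUs reported for that expression. FAU values below are illustrative
# placeholders, not the codes from the actual data set.

CANONICAL_FAUS = {
    "bored":      {43, 55},
    "confused":   {4, 7},
    "delighted":  {6, 12},
    "frustrated": {4, 17},
    "surprised":  {1, 2, 5, 26},
    "neutral":    set(),
}

def pick_stimuli(clips):
    """clips: list of dicts like
       {"path": ..., "sex": "male" or "female", "faus": {int, ...}}.
       Returns {label: {"male": clip, "female": clip}}."""
    chosen = {}
    for label, target_faus in CANONICAL_FAUS.items():
        chosen[label] = {}
        for sex in ("male", "female"):
            matches = [c for c in clips
                       if c["sex"] == sex and c["faus"] == target_faus]
            if matches:
                chosen[label][sex] = matches[0]  # first exact match
    return chosen
```

Requiring an exact FAU match keeps each stimulus unambiguous; relaxing it to a superset match would yield more candidate clips at the cost of purity.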

JackMostow commented 6 years ago

I used Amogh's data collection program to collect kids' facial expressions both ways in 3 settings:

  1. A private home where kids use RoboTutor and already knew me and were comfortable enough to crawl all over me. You'll see 2-3 faces (including mine) in most of the videos in addition to the ostensible subject.

  2. The school library where the kids were using RoboTutor for the first time and didn't know me. Many or most of them displayed no emotion that I could discern, whether because they ignored Leonora's video prompt, didn't understand the task, or felt too inhibited.

  3. Fortunatus' home, where kids use RoboTutor but didn't know me, though they seemed fairly comfortable.

I got more fluent over time, but often turned off video recording after rather than before changing stimuli.

I've been trying in Tanzania to upload the hundreds of video clips to dated folders in FROM TABLETS on GDrive, but so far without success, so it may have to wait till I return on May 28.

Amy says we should get a native Tanzanian to label the emotions displayed, but it's moot unless we can detect the expressions.
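Once per-clip labels exist (whether from a native Tanzanian labeller or from the elicitation condition itself), the detection question could be sanity-checked with something like the sketch below. Everything here is an assumption for illustration: the CSV of labels and the detect_expression callable stand in for whatever detector we end up testing.

```python
# Hypothetical sketch: per-label hit rate of an expression detector
# against clip labels. The CSV format (clip_path,label per row) and the
# detect_expression(path) -> label interface are assumed, not part of
# Amogh's program.
import csv
from collections import Counter, defaultdict

LABELS = ["bored", "confused", "delighted", "frustrated", "surprised", "neutral"]

def evaluate(label_csv, detect_expression):
    confusion = defaultdict(Counter)   # confusion[true][predicted] = count
    with open(label_csv, newline="") as f:
        for clip_path, true_label in csv.reader(f):
            predicted = detect_expression(clip_path)
            confusion[true_label][predicted] += 1
    for true_label in LABELS:
        row = confusion[true_label]
        total = sum(row.values())
        hits = row[true_label]
        print(f"{true_label:>10}: {hits}/{total} correct"
              + (f" ({hits / total:.0%})" if total else ""))
    return confusion
```

A per-label hit rate like this would tell us whether detection works at all before we invest in native-speaker emotion labelling.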

JackMostow commented 6 years ago

Jack recorded May 14-17 in Bagamoyo.