dmsfabiano opened this issue 6 years ago
There are basically two main approaches people use for this. The first is to hand-craft features, as Diego suggested; usually you extract features from the images and from the audio separately. The second is to project both the image and the audio signal into a new subspace and do the fusion there. For example, this can be done with PCA, although PCA might not be the best representation of the data. When thinking about this, ask what the important parts of each modality are for emotion (e.g., the eyes and mouth are important in the face; what is important in the audio?). Here is a link with some information about audio and emotion: http://www.scholarpedia.org/article/Speech_emotion_analysis
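A minimal sketch of the subspace idea, assuming we already have one feature vector per clip for each modality (the array names, sizes, and the choice of 10 components below are placeholders, not a recommendation): project each modality with PCA and fuse by concatenating the projections.

import numpy as np
from sklearn.decomposition import PCA

# Placeholder feature matrices: one row per clip.
# face_feats could be flattened landmark coordinates; audio_feats could be
# prosodic/spectral descriptors per clip.
rng = np.random.default_rng(0)
face_feats = rng.normal(size=(200, 136))   # e.g. 68 landmarks * (x, y)
audio_feats = rng.normal(size=(200, 40))   # e.g. 40 audio descriptors

# Project each modality into a lower-dimensional subspace.
face_pca = PCA(n_components=10).fit(face_feats)
audio_pca = PCA(n_components=10).fit(audio_feats)

# Fuse in the reduced space by concatenation; this fused matrix would then
# feed whatever classifier we pick.
fused = np.hstack([face_pca.transform(face_feats),
                   audio_pca.transform(audio_feats)])
print(fused.shape)  # (200, 20)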
import dlib
import numpy as np
from skimage import io

predictor_path = "shape_predictor_68_face_landmarks.dat"

# load dlib's frontal face detector and the 68-point landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(predictor_path)

img = io.imread("FDT.jpg")

# detect faces; each detection is a rectangle
dets = detector(img)

# output the landmark points inside each rectangle;
# shape.part(i) is a dlib.point (http://dlib.net/python/#dlib.point)
for k, d in enumerate(dets):
    shape = predictor(img, d)
    vec = np.empty([68, 2], dtype=int)
    for b in range(68):
        vec[b][0] = shape.part(b).x
        vec[b][1] = shape.part(b).y
    print(vec)
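As one possible metric from those landmarks (just a sketch, not necessarily the measure we would settle on), we could take a few distances between landmark pairs such as mouth width, mouth opening, and eye opening, normalized by the inter-ocular distance so they are scale-invariant. This continues from the snippet above (np and vec already defined); the indices follow the standard dlib 68-point layout.

def landmark_features(vec):
    # vec: 68x2 array of (x, y) landmark coordinates for one face
    vec = vec.astype(float)
    inter_ocular   = np.linalg.norm(vec[36] - vec[45])  # outer eye corners
    mouth_width    = np.linalg.norm(vec[48] - vec[54])  # mouth corners
    mouth_open     = np.linalg.norm(vec[62] - vec[66])  # inner lip, top vs. bottom
    left_eye_open  = np.linalg.norm(vec[37] - vec[41])  # upper vs. lower left eyelid
    right_eye_open = np.linalg.norm(vec[43] - vec[47])  # upper vs. lower right eyelid
    return np.array([mouth_width, mouth_open,
                     left_eye_open, right_eye_open]) / inter_ocular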
I've been giving some thought to how to fuse the images with the audio. I am not sure yet how we would fuse the images directly with the audio.
However, if we can compute some kind of metric from the images (i.e., face landmarks), we can use a linear math fusion technique that I developed last semester. It has shown really good results, but we would have to see how it works with this data.
Please share thoughts on this and/or other fusion techniques.
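To keep the discussion concrete, here is one generic baseline we could compare against (this is not the technique mentioned above, and the inputs are placeholders): weighted decision-level fusion, where each modality gets its own classifier and the per-class scores are combined with a weighted sum.

import numpy as np

def late_fusion(face_scores, audio_scores, alpha=0.5):
    # face_scores, audio_scores: arrays of shape (n_samples, n_classes),
    # e.g. class probabilities from separate face and audio classifiers.
    # alpha weights the face modality; (1 - alpha) weights the audio.
    fused = alpha * face_scores + (1.0 - alpha) * audio_scores
    return fused.argmax(axis=1)  # predicted emotion class per sample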