ai4r / Gesture-Generation-from-Trimodal-Context

Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)

Computing the Frechet Gesture Distance #22

Closed: UttaranB127 closed this issue 3 years ago

UttaranB127 commented 3 years ago

Hi,

I wrote code for computing the FGD based on the paper, but the result for the pretrained model comes out to be ~100, whereas according to Table 2 in the paper it should be around 5. I'm wondering if I missed anything in my code:

import numpy as np
import scipy.linalg as sl

def frechet_gesture_distance(features_target, features_predicted):
    '''
    :param features_target: numpy array of shape (N, 32), where N is the number of samples
    :param features_predicted: numpy array of shape (N, 32), where N is the number of samples
    :return: a non-negative number denoting the Frechet Gesture Distance
    '''
    # np.cov centers the data itself, so no need to subtract the mean first
    mean_features_target = np.mean(features_target, axis=0)
    cov_features_target = np.cov(features_target, rowvar=False)
    mean_features_predicted = np.mean(features_predicted, axis=0)
    cov_features_predicted = np.cov(features_predicted, rowvar=False)
    # matrix square root of the covariance product; sqrtm can return a complex
    # array with tiny imaginary parts from numerical error, so keep the real part
    cov_sqrt = sl.sqrtm(np.matmul(cov_features_target, cov_features_predicted))
    if np.iscomplexobj(cov_sqrt):
        cov_sqrt = cov_sqrt.real
    fgd = np.power(np.linalg.norm(mean_features_target - mean_features_predicted), 2.) +\
        np.trace(cov_features_target + cov_features_predicted - 2. * cov_sqrt)
    return fgd
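As a quick sanity check of the formula (a minimal sketch on synthetic Gaussian features, not the repo's evaluator), the distance between two samples drawn from the same distribution should be near zero, while a mean shift should dominate and produce a large value:

```python
import numpy as np
import scipy.linalg as sl

def frechet_gesture_distance(features_target, features_predicted):
    # Same computation as above, repeated so this snippet runs standalone.
    mu_t = np.mean(features_target, axis=0)
    mu_p = np.mean(features_predicted, axis=0)
    cov_t = np.cov(features_target, rowvar=False)
    cov_p = np.cov(features_predicted, rowvar=False)
    covmean = sl.sqrtm(cov_t @ cov_p)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return np.sum((mu_t - mu_p) ** 2) + np.trace(cov_t + cov_p - 2.0 * covmean)

rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(2000, 32))   # two samples from the same N(0, 1)
same_b = rng.normal(0.0, 1.0, size=(2000, 32))
shifted = rng.normal(3.0, 1.0, size=(2000, 32))  # mean shifted by 3 in every dimension

print(frechet_gesture_distance(same_a, same_b))   # small: sampling noise only
print(frechet_gesture_distance(same_a, shifted))  # large: roughly 32 * 3^2 from the mean term
```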

Also, is the FGD implemented somewhere in the codebase so that I can take a look?

youngwoo-yoon commented 3 years ago

Hi, here is the code for the FGD: https://github.com/ai4r/Gesture-Generation-from-Trimodal-Context/blob/master/scripts/model/embedding_space_evaluator.py Could you try this?

UttaranB127 commented 3 years ago

Hi,

Yes, it works, thanks! I realized there are some pre-processing steps involved before computing the FGD.