Rudrabha / LipGAN

This repository contains the code for LipGAN. LipGAN was published as part of the paper titled "Towards Automatic Face-to-Face Translation".
http://cvit.iiit.ac.in/research/projects/cvit-projects/facetoface-translation
MIT License

the MFCC feature size #25

Closed WeicongChen closed 4 years ago

WeicongChen commented 4 years ago

Hi, great work! I am a little confused with the MFCC feature size. In the paper, you said

We extract 13 MFCC features from each audio segment (T = 350, F = 100) and discard the first feature similar to Chung et al.

However, in audio_hparams.py, I found that num_mels equals 80, not 13, which differs from the paper's claim. Could you kindly explain the difference?

Rudrabha commented 4 years ago

In the fully_pythonic branch, we don't use MFCC features to represent the audio. Instead, we use Mel-spectrograms, which have 80 features at each timestep. Please use the master branch if you want to use the exact model used in the paper.
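To make the distinction concrete, here is a minimal NumPy sketch of both representations: an 80-band log-Mel-spectrogram (as in the fully_pythonic branch) and 13 MFCCs with the first coefficient discarded (as described in the paper). This is not the repository's actual pipeline; the sample rate (16 kHz), FFT size (512), and hop length (160) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def stft_mag(y, n_fft, hop):
    """Magnitude STFT with a Hann window, shape (frames, n_fft//2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def dct2(x, n_out):
    """DCT-II along the last axis, keeping the first n_out coefficients."""
    N = x.shape[-1]
    k = np.arange(n_out)[:, None]
    n = np.arange(N)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    return x @ basis.T

# 1 second of dummy audio at an assumed 16 kHz sample rate
y = np.random.randn(16000)
S = stft_mag(y, n_fft=512, hop=160)                 # (97, 257)

# fully_pythonic-style input: 80 log-Mel features per timestep
logmel = np.log(S @ mel_filterbank(80, 512, 16000).T + 1e-8)   # (97, 80)

# paper-style input: 13 MFCCs, with the 0th coefficient discarded -> 12
mfcc = dct2(logmel, 13)[:, 1:]                      # (97, 12)
```

So the two branches simply consume different audio features: 80 values per frame (Mel-spectrogram) versus 12 after dropping the first of 13 MFCCs.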