Open ninolyl opened 6 years ago
Also, I found `self.bn1 = nn.BatchNorm1d(64)` in network/mfcc_networks.py and modified it to `self.bn1 = nn.BatchNorm2d(64)`.
This inconsistency in the BatchNorm layer seems to be my mistake. However, once you change the layer, the pre-trained model will no longer match it.
Besides, different tools may produce different MFCC feature values. The one we used is provided by SyncNet, as noted in the README.
The MFCC features I generated use the Python version of SyncNet. If I don't modify the BatchNorm layer, the code doesn't run successfully. Could this be a difference in the Python implementation?
Are there any errors if the BatchNorm layer is not modified?
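For context, a likely cause of the error (my guess, not confirmed in this thread): PyTorch's `nn.BatchNorm1d` only accepts 2D `(N, C)` or 3D `(N, C, L)` input, while `nn.BatchNorm2d` expects 4D `(N, C, H, W)`, so feeding a 4D MFCC feature map to `BatchNorm1d` raises a shape error. A dependency-free sketch of those documented shape rules (illustrative only; the real check happens inside `torch.nn`):

```python
def batchnorm_accepts(layer_kind: str, shape: tuple) -> bool:
    """Return True if a BatchNorm layer of the given kind accepts `shape`.

    Per the PyTorch docs: BatchNorm1d expects (N, C) or (N, C, L);
    BatchNorm2d expects (N, C, H, W).
    """
    if layer_kind == "BatchNorm1d":
        return len(shape) in (2, 3)
    if layer_kind == "BatchNorm2d":
        return len(shape) == 4
    raise ValueError("unknown layer kind: %s" % layer_kind)

# An MFCC map fed as (batch, channels, coeffs, frames) is 4D:
mfcc_batch_shape = (16, 64, 12, 20)
print(batchnorm_accepts("BatchNorm1d", mfcc_batch_shape))  # False -> shape error
print(batchnorm_accepts("BatchNorm2d", mfcc_batch_shape))  # True
```

So whether the unmodified code errors out depends on the shape the MFCC tensor arrives in.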
Hi, could you please share a piece of code for calculating the bins? We tried both Python and MATLAB implementations with your params but did not get the same bins as yours.
@Hangz-nju-cuhk Sorry to bother you, but could you help with the MFCC features for getting the bins, please?
Sorry for the delay. I migrated all of my code to another computer in the past few weeks and could not find the MATLAB code, so I wrote a simple function based on the SyncNet-provided MATLAB code and added it to the preprocessing folder.
@Hangz-nju-cuhk Thank you very much!
@smolsnastya I used the following Python code for calculating bins from a wav file: https://github.com/natravedrova/Talking-Face-Generation-DAVS/blob/master/preprocess/savemfcc.py
It reproduces the steps from the respective MATLAB code. Please give it a try.
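For anyone comparing implementations, here is my reading of the bin layout the preprocessing produces (a sketch of the indexing only, not official code): MFCC frames come at 100 per second, the video runs at 25 fps (4 MFCC frames per video frame), and each .bin holds a 20-frame (0.2 s) window of 12 coefficients, i.e. 240 values, with consecutive bins stepping by 4 MFCC frames:

```python
def bin_window(bin_index: int, window: int = 20, hop: int = 4) -> range:
    """MFCC frame indices covered by a given bin (window slides by `hop`)."""
    start = bin_index * hop
    return range(start, start + window)

def num_bins(num_mfcc_frames: int, window: int = 20, hop: int = 4) -> int:
    """How many full 20-frame windows fit into the MFCC sequence."""
    if num_mfcc_frames < window:
        return 0
    return (num_mfcc_frames - window) // hop + 1

print(list(bin_window(0)))  # frames 0..19
print(list(bin_window(1)))  # frames 4..23
print(num_bins(100))        # 21 full windows in a 1-second clip
```

If two implementations disagree, checking whether they use the same window/hop and whether they drop the 0th coefficient is a good first step.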
@natravedrova were you able to obtain similar results (to the .mat code) with bin files generated using your implementation?
Yes, I am.
I tried different wav files and a photo and successfully generated talking heads. I did not get the artifacts reported here.
Thanks!! It worked for me as well :)
The link https://github.com/natravedrova/Talking-Face-Generation-DAVS/blob/master/preprocess/savemfcc.py is broken. Do you still have the file?
@mph1900, sorry for the confusion. The file got lost for some reason, maybe because my copy of the repo is in sync with the current one. Nevertheless, I've found the file on my computer. Please take a look: https://gist.github.com/natravedrova/52379259ddc17dfba5f68778f480c704
amazing. i'll give it a try
Hi, I ran the sample you offered and got an appropriate result.
But when I try to generate the MFCC bin files myself, I get the wrong result even with the same wav you use (0572_0019_0003.wav).
My Python code to generate the MFCC features (trying to get a 25 fps result) is as follows:
```python
import sys
import subprocess
import numpy as np
import python_speech_features
from scipy.io import wavfile

videofile = sys.argv[1]
base_dir = sys.argv[2]

# Extract 16 kHz mono PCM audio from the video.
audiotmp = 'tmp.wav'
command = ("ffmpeg -y -i %s -async 1 -ac 1 -vn -acodec pcm_s16le -ar 16000 %s"
           % (videofile, audiotmp))
subprocess.call(command, shell=True, stdout=None)

sample_rate, audio = wavfile.read(audiotmp)
mfcc = zip(*python_speech_features.mfcc(audio, sample_rate))
mfcc = np.stack([np.array(i) for i in mfcc])
mfcc = np.transpose(mfcc[1:], (1, 0))  # drop the 0th coefficient -> (T, 12)

lenn = mfcc.shape[0] // 4
for i in np.arange(lenn - 6):
    tmp_data = mfcc[i * 4:i * 4 + 20].reshape(240)  # 20 frames x 12 coeffs
    tmp_data.tofile('%s/%d.bin' % (base_dir, i))
print('over.....')
```
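One way to sanity-check a generated .bin against a reference one, without NumPy (a sketch; it assumes `tofile` wrote raw little-endian float64 values, the NumPy default on most machines, with no header):

```python
import os
import struct
import tempfile

def read_bin(path: str, dtype_size: int = 8) -> list:
    """Read a raw .bin of little-endian float64 values with stdlib only."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // dtype_size
    return list(struct.unpack("<%dd" % count, raw))

# Write a fake 240-value bin and read it back as a round-trip check.
values = [float(i) for i in range(240)]
path = os.path.join(tempfile.gettempdir(), "check_0.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<240d", *values))

data = read_bin(path)
print(len(data))  # expect 240 values per bin
```

Comparing the lengths and first few values of your bins against the reference bins this way can quickly show whether the mismatch is in the values or in the windowing.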
The results look like this:
The Python version is 3.6. Could you help me find where the bug is?