Open ninolyl opened 6 years ago
Also, I found `self.bn1 = nn.BatchNorm1d(64)` in network/mfcc_networks.py and modified it to `self.bn1 = nn.BatchNorm2d(64)`.
This inconsistency in the BatchNorm layer seems to be my mistake. However, once you change the layer, the pre-trained model will no longer match it.
Besides, different tools may produce different MFCC feature values. The one we used is provided by SyncNet, as noted in the README.
The MFCC features I generated use the Python version of SyncNet. If I don't modify the BatchNorm layer, the code doesn't run successfully. Could this be a difference in the Python implementation?
Are there any errors if the BatchNorm layer is not modified?
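For context, a likely cause of the error (my guess, not confirmed in this thread): PyTorch's `nn.BatchNorm1d` only accepts 2D `(N, C)` or 3D `(N, C, L)` input, while `nn.BatchNorm2d` expects 4D `(N, C, H, W)`, so feeding a 4D MFCC feature map to `BatchNorm1d` raises a shape error. A dependency-free sketch of those documented shape rules (illustrative only; the real check happens inside `torch.nn`):

```python
def batchnorm_accepts(layer_kind: str, shape: tuple) -> bool:
    """Return True if a BatchNorm layer of the given kind accepts `shape`.

    Per the PyTorch docs: BatchNorm1d expects (N, C) or (N, C, L);
    BatchNorm2d expects (N, C, H, W).
    """
    if layer_kind == "BatchNorm1d":
        return len(shape) in (2, 3)
    if layer_kind == "BatchNorm2d":
        return len(shape) == 4
    raise ValueError("unknown layer kind: %s" % layer_kind)

# An MFCC map fed as (batch, channels, coeffs, frames) is 4D:
mfcc_batch_shape = (16, 64, 12, 20)
print(batchnorm_accepts("BatchNorm1d", mfcc_batch_shape))  # False -> shape error
print(batchnorm_accepts("BatchNorm2d", mfcc_batch_shape))  # True
```

So whether the unmodified code errors out depends on the shape the MFCC tensor arrives in.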
Hi, could you please share a piece of code for calculating the bins? We tried both Python and MATLAB implementations with your params but did not get the same bins as yours.
@Hangz-nju-cuhk Sorry to bother you, but could you help with the MFCC features for getting the bins, please?
Sorry for the delay. I migrated all of my code to another computer in the past few weeks and could not find the MATLAB code, so I wrote a simple function based on the SyncNet-provided MATLAB code and added it to the preprocessing folder.
@Hangz-nju-cuhk Thank you very much!
@smolsnastya I used the following Python code for calculating bins from a wav file: https://github.com/natravedrova/Talking-Face-Generation-DAVS/blob/master/preprocess/savemfcc.py
It reproduces the steps from the respective MATLAB code. Please give it a try.
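For anyone comparing implementations, here is my reading of the bin layout the preprocessing produces (a sketch of the indexing only, not official code): MFCC frames come at 100 per second, the video runs at 25 fps (4 MFCC frames per video frame), and each .bin holds a 20-frame (0.2 s) window of 12 coefficients, i.e. 240 values, with consecutive bins stepping by 4 MFCC frames:

```python
def bin_window(bin_index: int, window: int = 20, hop: int = 4) -> range:
    """MFCC frame indices covered by a given bin (window slides by `hop`)."""
    start = bin_index * hop
    return range(start, start + window)

def num_bins(num_mfcc_frames: int, window: int = 20, hop: int = 4) -> int:
    """How many full 20-frame windows fit into the MFCC sequence."""
    if num_mfcc_frames < window:
        return 0
    return (num_mfcc_frames - window) // hop + 1

print(list(bin_window(0)))  # frames 0..19
print(list(bin_window(1)))  # frames 4..23
print(num_bins(100))        # 21 full windows in a 1-second clip
```

If two implementations disagree, checking whether they use the same window/hop and whether they drop the 0th coefficient is a good first step.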
@natravedrova were you able to obtain similar results (to the .mat code) with bin files generated using your implementation?
Yes, I am.
I tried different wav files and a photo and successfully generated talking heads. I did not get the artifacts reported here.
Thanks!! It worked for me as well :)
The link https://github.com/natravedrova/Talking-Face-Generation-DAVS/blob/master/preprocess/savemfcc.py is broken. Do you still have the file?
@mph1900, sorry for the confusion. The file got lost for some reason, maybe because my copy of the repo is in sync with the current one. Nevertheless, I've found the file on my computer. Please take a look: https://gist.github.com/natravedrova/52379259ddc17dfba5f68778f480c704
amazing. i'll give it a try
Hi, I ran the sample you offered and got an appropriate result.
But when I try to generate the MFCC bin files myself, I get the wrong result even with the same wav you use (0572_0019_0003.wav).
My Python code to generate the MFCC features (trying to get a 25 fps result) is as follows:
```python
import sys
import subprocess
import numpy as np
import python_speech_features
from scipy.io import wavfile

videofile = sys.argv[1]
base_dir = sys.argv[2]

# Extract 16 kHz mono PCM audio from the video.
audiotmp = 'tmp.wav'
command = ("ffmpeg -y -i %s -async 1 -ac 1 -vn -acodec pcm_s16le -ar 16000 %s"
           % (videofile, audiotmp))
subprocess.call(command, shell=True, stdout=None)

sample_rate, audio = wavfile.read(audiotmp)
mfcc = zip(*python_speech_features.mfcc(audio, sample_rate))
mfcc = np.stack([np.array(i) for i in mfcc])
mfcc = np.transpose(mfcc[1:], (1, 0))  # drop the 0th coefficient -> (T, 12)

lenn = mfcc.shape[0] // 4
for i in np.arange(lenn - 6):
    tmp_data = mfcc[i * 4:i * 4 + 20].reshape(240)  # 20 frames x 12 coeffs
    tmp_data.tofile('%s/%d.bin' % (base_dir, i))
print('over.....')
```
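One way to sanity-check a generated .bin against a reference one, without NumPy (a sketch; it assumes `tofile` wrote raw little-endian float64 values, the NumPy default on most machines, with no header):

```python
import os
import struct
import tempfile

def read_bin(path: str, dtype_size: int = 8) -> list:
    """Read a raw .bin of little-endian float64 values with stdlib only."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // dtype_size
    return list(struct.unpack("<%dd" % count, raw))

# Write a fake 240-value bin and read it back as a round-trip check.
values = [float(i) for i in range(240)]
path = os.path.join(tempfile.gettempdir(), "check_0.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<240d", *values))

data = read_bin(path)
print(len(data))  # expect 240 values per bin
```

Comparing the lengths and first few values of your bins against the reference bins this way can quickly show whether the mismatch is in the values or in the windowing.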
The results look like this:
The Python version is 3.6. Could you help me find where the bug is?