Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs:
https://synclabs.so

Question about get_segmented_mels #680

Open ainrichman opened 6 months ago

ainrichman commented 6 months ago
def get_segmented_mels(self, spec, start_frame):
    mels = []
    assert syncnet_T == 5
    start_frame_num = self.get_frame_id(start_frame) + 1 # 0-indexing ---> 1-indexing
    if start_frame_num - 2 < 0: return None
    for i in range(start_frame_num, start_frame_num + syncnet_T):
        m = self.crop_audio_window(spec, i - 2)
        if m.shape[0] != syncnet_mel_step_size:
            return None
        mels.append(m.T)
    mels = np.asarray(mels)
    return mels

Why do you use mels surrounding the center frame as the conditioning guidance for generating synced lips? It takes 5 windows, each of size 16. At inference time, you use only the center one of the 5 windows. Isn't there an inconsistency between training and inference? Why not use the exact mel window (start_frame + 16) corresponding to the target at training time?
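For reference, here is a minimal self-contained sketch of the indexing being asked about, assuming Wav2Lip's default hyperparameters (25 fps video, 80 mel frames per second of audio, 16 mel steps per window) and a `crop_audio_window` that maps a frame id onto the mel axis as in the repo's dataset class. The constants and the dummy spectrogram are illustrative assumptions, not the exact training data:

```python
import numpy as np

# Assumed Wav2Lip defaults: 5-frame clips, 16 mel steps per window, 25 fps.
syncnet_T = 5
syncnet_mel_step_size = 16
fps = 25.0

def crop_audio_window(spec, frame_id):
    # Map a (1-indexed) video frame id to its position on the mel time axis,
    # then take a 16-step window starting there (80 mel frames per second).
    start_idx = int(80.0 * (frame_id / fps))
    return spec[start_idx:start_idx + syncnet_mel_step_size, :]

def get_segmented_mels(spec, start_frame_num):
    # start_frame_num is the 1-indexed id of the first frame of the 5-frame
    # clip; each window is cropped at i - 2, i.e. it starts two frames
    # before its target frame, so the target sits near the window's center.
    if start_frame_num - 2 < 0:
        return None
    mels = []
    for i in range(start_frame_num, start_frame_num + syncnet_T):
        m = crop_audio_window(spec, i - 2)
        if m.shape[0] != syncnet_mel_step_size:
            return None
        mels.append(m.T)
    return np.asarray(mels)

# Dummy mel spectrogram: 200 mel frames x 80 mel bins.
spec = np.random.randn(200, 80)
mels = get_segmented_mels(spec, start_frame_num=10)
print(mels.shape)  # (5, 80, 16): one 16-step window per video frame
```

This makes the questioner's point concrete: training conditions the generator on five overlapping windows (one per frame, each shifted back by two frames), whereas inference as shipped feeds only a single window per frame.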