Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
https://synclabs.so

[BUG] mismatch when preparing training audio and frames #437

Open xjw00654 opened 1 year ago

xjw00654 commented 1 year ago

Hi @prajwalkr,

After checking the released code, I have a question. In wav2lip_train.py:L92:

def get_segmented_mels(self, spec, start_frame):
    ............
    for i in range(start_frame_num, start_frame_num + syncnet_T):
        m = self.crop_audio_window(spec, i - 2)

This gives 5 audio segments, indexed from the current frame_num - 2 to frame_num + 2.

But the image frames are loaded in wav2lip_train.py#L47-L57 as follows:

    def get_window(self, start_frame):
        start_id = self.get_frame_id(start_frame)
        vidname = dirname(start_frame)

        window_fnames = []
        for frame_id in range(start_id, start_id + syncnet_T):
            frame = join(vidname, '{}.jpg'.format(frame_id))
            if not isfile(frame):
                return None
            window_fnames.append(frame)
        return window_fnames

Here the image frames are collected from the current frame_num to frame_num + 4 (the end of the range, frame_num + 5, is exclusive).
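To make the comparison concrete, here is a minimal sketch that reproduces only the index arithmetic of the two loops quoted above. It assumes `syncnet_T = 5` and that both loops start from the same frame number, which the elided lines of `get_segmented_mels` may not guarantee. The helper names `audio_window_starts` and `image_frame_ids` are hypothetical and not part of the repository.

```python
# Illustrative sketch only: mirrors the index arithmetic of the two quoted loops.
# Assumptions (not confirmed by the snippets): syncnet_T == 5, and both loops
# receive the same starting frame number.
syncnet_T = 5


def audio_window_starts(start_frame_num):
    """Indices passed to crop_audio_window in get_segmented_mels (i - 2 for each i)."""
    return [i - 2 for i in range(start_frame_num, start_frame_num + syncnet_T)]


def image_frame_ids(start_id):
    """Frame ids collected by get_window."""
    return list(range(start_id, start_id + syncnet_T))


start = 10  # hypothetical frame number, chosen only for illustration
print("audio:", audio_window_starts(start))  # audio: [8, 9, 10, 11, 12]
print("image:", image_frame_ids(start))      # image: [10, 11, 12, 13, 14]
```

This compares only the indices handed to each function. How much audio `crop_audio_window` covers starting from each index is not shown in the snippets above, so the sketch cannot say whether the two windows end up spanning the same time interval.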

This looks like a mismatch, right? Or am I misunderstanding something?

Thanks again for your excellent work.

BR.

hnsywangxin commented 2 months ago

same question