[BUG] mismatch when preparing training audio and frames

Hi @prajwalkr,

After checking the released code, I have a question. In wav2lip_train.py:L92:

    def get_segmented_mels(self, spec, start_frame):
        ............
        for i in range(start_frame_num, start_frame_num + syncnet_T):
            m = self.crop_audio_window(spec, i - 2)

This gives 5 audio segments, one for each index from the current frame_num - 2 to frame_num + 2.

But the image frames are loaded differently in wav2lip_train.py#L47-L57:

    def get_window(self, start_frame):
        start_id = self.get_frame_id(start_frame)
        vidname = dirname(start_frame)

        window_fnames = []
        for frame_id in range(start_id, start_id + syncnet_T):
            frame = join(vidname, '{}.jpg'.format(frame_id))
            if not isfile(frame):
                return None
            window_fnames.append(frame)
        return window_fnames

Here the image frames are collected from the current frame_num up to (but not including) frame_num + 5.

Isn't this a mismatch? Or am I misunderstanding something?
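
To make the question concrete, here is a small standalone sketch (not the repository code) that only reproduces the index arithmetic of the two loops quoted above. It assumes syncnet_T = 5, and it assumes both loops start from the same frame number; the elided lines in get_segmented_mels may change that, so this is an illustration of my reading, not a statement of what the training code actually does. The helper names audio_window_starts and image_frame_ids are made up for this example.

```python
# Standalone sketch of the index arithmetic only; no audio or images are loaded.
syncnet_T = 5  # assumed window length, matching the 5 segments described above


def audio_window_starts(start_frame_num):
    """Indices passed to crop_audio_window in the quoted loop (i - 2)."""
    return [i - 2 for i in range(start_frame_num, start_frame_num + syncnet_T)]


def image_frame_ids(start_id):
    """Frame ids collected by the quoted get_window loop."""
    return list(range(start_id, start_id + syncnet_T))


start = 10  # arbitrary example frame number
print("audio window starts:", audio_window_starts(start))
print("image frame ids:    ", image_frame_ids(start))
```

Under these assumptions, the audio indices are shifted two frames earlier than the image frame ids, which is the offset I am asking about.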

Thanks again for your excellent work.

Best regards.

Rudrabha / Wav2Lip

[BUG] mismatch when preparing training audio and frames #437