Fewer face images are saved during training preprocessing than there are original frames

TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchorization with Latent Space Inpainting


Fewer face images are saved during training preprocessing than there are original frames #178

Open miumiuc opened 3 weeks ago

miumiuc commented 3 weeks ago

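When I run the data preprocessing before training, the number of face frames saved in images is much smaller than the number of frames in the original video. Is the code below overwriting images, or was it designed this way on purpose? If frames are being lost, won't the audio features and the images no longer line up?

    for i, (whisper_batch,crop_batch) in enumerate(tqdm(gen,total=int(np.ceil(float(video_num)/batch_size)))):
        crop_index=0
        for image,audio in zip(crop_batch,whisper_batch):
            cv2.imwrite(f"data/images/{folder_name}/{str(i+crop_index+total_image_index+1)}.png",image)
            crop_index+=1
            temp_image_index=i+crop_index+total_image_index+1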

Embracex1998 commented 2 weeks ago

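I ran into this too. Maybe the total frame count is not a multiple of batch_size, so the last part comes out short? I haven't read the code yet, so I don't know whether that part gets dropped.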

miumiuc commented 2 weeks ago

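It's a problem with this index: str(i+crop_index+total_image_index+1) produces duplicate indices. For example, if the first batch gets indices 0, 1, 2, 3, the second batch gets 1, 2, 3, 4, so 1, 2, 3 are overwritten. I also have some other questions about training. Could we exchange contact details?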
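To make the collision concrete, here is a minimal sketch that reproduces only the index arithmetic of the loop quoted in the question, with no video, Whisper features, or OpenCV involved. The helper name `original_indices` and the values `num_batches=3`, `batch_size=4`, and `total_image_index=0` are made up for illustration and are not taken from the repository.

```python
def original_indices(num_batches, batch_size, total_image_index=0):
    """Return the file indices the quoted loop would write, in write order.

    Hypothetical helper: it mirrors only the index expression
    i + crop_index + total_image_index + 1 from the snippet in this thread.
    """
    written = []
    for i in range(num_batches):
        crop_index = 0  # reset at the start of every batch, as in the quoted code
        for _ in range(batch_size):
            written.append(i + crop_index + total_image_index + 1)
            crop_index += 1
    return written


original = original_indices(num_batches=3, batch_size=4)
print(original)            # [1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6]
print(len(original))       # 12 writes are issued
print(len(set(original)))  # only 6 distinct file names survive
```

With full batches, this indexing leaves only `num_batches + batch_size - 1` distinct files, because each batch shifts the starting index by one instead of by `batch_size`. That is consistent with far fewer images being saved than there are frames.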

zhangyuzyy commented 5 days ago

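We ran into the same problem. It means the number of saved faces no longer matches the audio features. We're not sure whether this is a bug or was done on purpose.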

ShowLo commented 4 hours ago

Fix it by:

    # Initialize crop_index once, before the batch loop, and drop i from the index
    crop_index=0
    for i, (whisper_batch,crop_batch) in enumerate(tqdm(gen,total=int(np.ceil(float(video_num)/batch_size)))):
        for image,audio in zip(crop_batch,whisper_batch):
            cv2.imwrite(f"data/images/{folder_name}/{str(crop_index+total_image_index+1)}.png",image)
            crop_index+=1
            temp_image_index=crop_index+total_image_index+1
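As a quick sanity check of this fix, the sketch below runs only the index arithmetic with a single running counter. The helper name `fixed_indices` and the batch sizes `[4, 4, 2]` are hypothetical; the short last batch stands in for a video whose frame count is not a multiple of `batch_size`.

```python
def fixed_indices(batch_sizes, total_image_index=0):
    """Return the file indices written when crop_index is never reset.

    Hypothetical helper: batch_sizes lists how many frames each batch holds.
    """
    written = []
    crop_index = 0  # initialized once, before the batch loop
    for size in batch_sizes:
        for _ in range(size):
            written.append(crop_index + total_image_index + 1)
            crop_index += 1
    return written


fixed = fixed_indices([4, 4, 2])
print(fixed)                            # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(fixed) == len(set(fixed)))    # True: every frame gets its own file
```

If the audio features are saved under the same running index, each image should keep a matching feature file. The saving of audio features is not shown in the snippets in this thread, so treat that part as an assumption.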