acvictor / Obama-Lip-Sync

An implementation of ObamaNet: Photo-realistic lip-sync from text.

producing TestVideo/kp_test.pickle #1

Closed. duyvk closed this issue 5 years ago.

duyvk commented 5 years ago

Hi acvictor, could you please tell me how to produce the TestVideo/kp_test.pickle used in run.py? Thank you.

acvictor commented 5 years ago

You just need to run the entire pipeline (extract the frames, run dlib, get the keypoints, correct the tilt and size) on whatever video you want, and write the result to a pickle file.
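For reference, here is a minimal sketch of that process. This is not the repo's exact script: the input path, output path, and the normalization step are placeholders, and the same tilt/size normalization used when preparing the training data would still need to be applied before pickling.

```python
# Sketch: extract frames from a test video, run dlib's 68-point landmark
# predictor on each frame, and dump the keypoints to a pickle file.
import pickle
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# shape_predictor_68_face_landmarks.dat must be downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture("TestVideo/test.mp4")  # hypothetical input path
keypoints = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if len(faces) == 0:
        continue
    shape = predictor(gray, faces[0])
    pts = np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)])
    # TODO: apply the same tilt correction and size normalization here
    # that the training data preparation applies before PCA.
    keypoints.append(pts)
cap.release()

with open("TestVideo/kp_test.pickle", "wb") as f:
    pickle.dump(keypoints, f)
```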

kduy commented 5 years ago

Thanks @acvictor. By the way, when we normalize the data during the PCA step, the upsample factor is factor = int(np.ceil(100/25)). But when we run run.py, could you please explain why the factor is now 100/30 in subsample(y_pred, 100, 30)?

acvictor commented 5 years ago

In the beginning I sample the video at 25 frames per second using ffmpeg, so it upsamples cleanly to 100 fps (the audio features are sampled at 100 per second). To generate the final video after running the LSTM network, I subsample to 30 to produce a video at 30 fps. You could just as well change this to 25; it might give you better sync with the audio.
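To illustrate the relationship between the rates: a rough sketch of the subsampling step follows (the function name mirrors the discussion above, but the exact implementation in run.py may differ).

```python
import numpy as np

def subsample(y, src_fps=100, dst_fps=30):
    # Keep roughly every (src_fps / dst_fps)-th prediction, so a sequence
    # predicted at 100 per second becomes 30 fps (or 25 fps if dst_fps=25).
    step = src_fps / dst_fps
    idx = np.round(np.arange(0, len(y), step)).astype(int)
    idx = idx[idx < len(y)]
    return y[idx]

# Training side: video keypoints at 25 fps are upsampled by
# factor = int(np.ceil(100 / 25)) = 4 to align with the 100 Hz audio features.
```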

kduy commented 5 years ago

Yeah, that makes sense. Do you have any idea how we can reach the quality reported in the paper? I think they also use some 3D reconstruction techniques.

acvictor commented 5 years ago

In Supasorn Suwajanakorn's work they do use 3D reconstruction, and their way of synthesizing texture is very different. For ObamaNet they probably train on much more data. For me, the output of the LSTM network got strange with more training data; I'm still not sure why. You could try training Pix2Pix on more data!