15458wew closed this issue 3 years ago
Thanks for your reply. If I find a high-definition video dataset, I will share it with you. If I train on AVSpeech, does the current code need any changes? I also noticed that the LRS2 dataset contains many side-face (profile) videos. S3FD only extracts an approximate bounding box for the face and does not convert a side face to a frontal one. Will the side-face videos affect the network?
Hey there! Any success with the AVSpeech dataset or any other HD dataset? I'm trying to adapt AVSpeech but maybe there is a better way.
1. AVSpeech videos should be aligned again using the code in [this](https://github.com/joonson/syncnet_python) repo.
2. Please let us know if you find a higher-resolution dataset than AVSpeech. The AVSpeech videos were not collected specifically for lip-sync, so a dataset that specifically collects data from multiple speakers in 4K might be useful for us.
3. AVSpeech contains the coordinates of the active speaker. We use those, followed by another round of face detection, to minimize the amount of background. Our code takes only a single face and its corresponding audio at a time. Our architecture accepts one frame (with a face) at a time.
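To make point 3 concrete, here is a minimal sketch of the first cropping stage. It is not the authors' pipeline. It assumes the AVSpeech annotation gives the speaker's face center as normalized `(x, y)` values in `[0, 1]`; the function name `speaker_crop_box` and the `box_frac` parameter are hypothetical. The second stage, running a face detector on the cropped region, is not shown.

```python
def speaker_crop_box(x_norm, y_norm, width, height, box_frac=0.5):
    """Return a pixel box (left, top, right, bottom) around the speaker.

    x_norm, y_norm: assumed normalized face-center coordinates in [0, 1].
    box_frac: hypothetical knob; box side as a fraction of the shorter frame side.
    The box is clamped to the frame, so it may be smaller near the edges.
    """
    half = int(round(box_frac * min(width, height) / 2))
    cx = int(round(x_norm * width))
    cy = int(round(y_norm * height))
    left = max(0, cx - half)
    top = max(0, cy - half)
    right = min(width, cx + half)
    bottom = min(height, cy + half)
    return left, top, right, bottom


if __name__ == "__main__":
    # Speaker centered in a 1280x720 frame.
    print(speaker_crop_box(0.5, 0.5, 1280, 720))
```

With a frame loaded as a NumPy array, the crop would be `frame[top:bottom, left:right]`, and that region is what gets passed to the face detector.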
@Rudrabha Could you please explain what alignment you mean? Could you be more specific about which part of the code in the repo you linked does the alignment? Thanks
@Rudrabha Is this the alignment problem? I found this AVSpeech downloader repo: https://github.com/changil/avspeech-downloader
Known issue
FFmpeg uses keyframe seeking when stream copying, which happens with faster=2. When a cut does not start from a keyframe, which is the case most of the time, FFmpeg cuts the video at the closest preceding keyframe and sets a negative start time to compensate. Any subsequent tool that takes the cut clips as input should therefore take the start time into account. Most video players do, but if you process the clips programmatically, chances are you need to do it yourself and discard the first part of both the audio and video streams accordingly.
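A rough sketch of how that lead-in could be measured, assuming `ffprobe` is on `PATH` and reports a negative `start_time` for such clips as described above. The helper names (`probe_start_times`, `parse_start_times`, `lead_in_to_discard`, `units_to_drop`) are hypothetical and not part of either repo. If a file has several streams of the same type, this sketch keeps only the last one.

```python
import json
import subprocess


def parse_start_times(ffprobe_json):
    """Map codec_type -> start_time in seconds from ffprobe JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    times = {}
    for s in streams:
        try:
            times[s["codec_type"]] = float(s["start_time"])
        except (KeyError, ValueError):
            continue  # start_time missing or reported as "N/A"
    return times


def probe_start_times(path):
    """Run ffprobe on a clip and return its per-stream start times."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type,start_time",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_start_times(out)


def lead_in_to_discard(start_times):
    """Seconds before timestamp 0 that should be dropped, per stream."""
    return {kind: max(0.0, -t) for kind, t in start_times.items()}


def units_to_drop(seconds, rate):
    """Convert a lead-in to whole frames (rate=fps) or samples (rate=Hz)."""
    return int(round(seconds * rate))


if __name__ == "__main__":
    sample = ('{"streams": ['
              '{"codec_type": "video", "start_time": "-0.480000"},'
              '{"codec_type": "audio", "start_time": "0.000000"}]}')
    lead = lead_in_to_discard(parse_start_times(sample))
    print(lead, units_to_drop(lead["video"], 25))
```

When decoding frames yourself, you would skip `units_to_drop(lead["video"], fps)` frames and the corresponding number of audio samples before pairing frames with audio.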
Hello, thank you for open-sourcing such good code. I am trying to modify it and experiment with AVSpeech to generate higher-definition videos. I have run into a few problems and would like to discuss them with you.
Thank you