Closed: longyangqi closed this issue 3 weeks ago
Hi, we reconstructed the first 200 consecutive frames of each video and randomly sampled 50 of them for training. The processing procedure did take a long time; we used a multi-process script to run it on CPU and GPU clusters.
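For orientation, here is a minimal sketch of that selection and dispatch logic. The `VIDEO_ROOT` layout, the `reconstruct_frame` placeholder, and the use of a local process pool are assumptions for illustration, not the authors' actual preprocessing script (which follows the Portrait-4D pipeline linked in the question below).

```python
# Minimal sketch, assuming frames are pre-extracted as images under
# VIDEO_ROOT/<clip_id>/ and that reconstruct_frame() stands in for the
# per-frame FLAME fitting + segmentation step (both hypothetical here).
import os
import random
from multiprocessing import Pool

VIDEO_ROOT = "data/vfhq_frames"   # hypothetical directory layout
NUM_CONSECUTIVE = 200             # first 200 consecutive frames per clip
NUM_TRAIN = 50                    # frames randomly kept for training

def reconstruct_frame(clip_id, frame_idx):
    """Placeholder for per-frame FLAME reconstruction + segmentation."""
    pass

def process_clip(clip_id):
    frame_dir = os.path.join(VIDEO_ROOT, clip_id)
    # Keep the original fps: take the first 200 frames with no stride.
    frames = sorted(os.listdir(frame_dir))[:NUM_CONSECUTIVE]
    for idx in range(len(frames)):
        reconstruct_frame(clip_id, idx)
    # Randomly choose which reconstructed frames enter the training set.
    return sorted(random.sample(range(len(frames)), min(NUM_TRAIN, len(frames))))

if __name__ == "__main__":
    clips = sorted(os.listdir(VIDEO_ROOT))
    # Parallelize across clips; one worker per CPU core on this node.
    with Pool(processes=os.cpu_count()) as pool:
        train_indices = dict(zip(clips, pool.map(process_clip, clips)))
```

On a cluster, each node would presumably run this over its own shard of clips, with the GPU-bound reconstruction step handled inside `reconstruct_frame`.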
Thanks for your reply! As for the 200 frames, did you subsample the original video (e.g., selecting one out of every five frames) or keep the original fps? In my case, 200 consecutive frames may contain only small head motions and expression changes.
We did not subsample the frames but maintained the original framerate.
Great work! I have some questions about the data preprocessing of VFHQ. As stated in the paper, you sampled 50 frames per clip to train the model. Did you process all the frames of each clip, as in https://github.com/YuDeng/Portrait-4D?tab=readme-ov-file#data-preprocessing-for-custom-images, to obtain the reconstructed FLAME parameters and segmentations?
In my understanding, processing all the frames would take a lot of time, while reconstructing FLAME parameters from non-consecutive frames may produce inferior results. I wonder how you processed the data. Thanks!