YuDeng / Portrait-4D

Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data (CVPR 24); Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer (ECCV 2024)
MIT License

About data preprocess of VFHQ #13

Closed longyangqi closed 3 weeks ago

longyangqi commented 4 weeks ago

Great work! I have some questions about the data preprocessing for VFHQ. As stated in the paper, you sampled 50 frames per clip to train the model. Did you process all the frames of each clip, as in https://github.com/YuDeng/Portrait-4D?tab=readme-ov-file#data-preprocessing-for-custom-images, to get the reconstructed FLAME parameters and segmentations?

In my understanding, processing all the frames would take a lot of time, while reconstructing the FLAME parameters from non-consecutive frames may produce inferior results. I wonder how you processed the data. Thanks!

YuDeng commented 3 weeks ago

Hi, we reconstructed the first 200 consecutive frames per video and randomly sampled 50 of them for training. The processing did take a long time; we used a multi-process script to run it on CPU and GPU clusters.
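The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the repo's actual preprocessing script; the frame-path layout and the `seed` parameter are assumptions:

```python
import random

def sample_training_frames(frame_paths, n_consecutive=200, n_sample=50, seed=0):
    """Take the first `n_consecutive` frames of a clip, then randomly
    pick `n_sample` of them for training (hypothetical helper)."""
    consecutive = frame_paths[:n_consecutive]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    return sorted(rng.sample(consecutive, min(n_sample, len(consecutive))))

# Example: a clip with 500 extracted frames (zero-padded names sort correctly)
frames = [f"clip0001/{i:06d}.png" for i in range(500)]
picked = sample_training_frames(frames)
```

Only the first 200 frames are reconstructed, so the 50 training frames are always drawn from that consecutive prefix.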

longyangqi commented 3 weeks ago

> Hi, we reconstructed the first 200 consecutive frames per video and randomly sampled 50 of them for training. The processing did take a long time; we used a multi-process script to run it on CPU and GPU clusters.

Thanks for your reply! As for the 200 frames, did you subsample the original video (e.g., selecting one out of every 5 frames) or maintain the original FPS? In my case, 200 consecutive frames may contain only small head motion and few expression changes.

YuDeng commented 3 weeks ago

> Thanks for your reply! As for the 200 frames, did you subsample the original video (e.g., selecting one out of every 5 frames) or maintain the original FPS? In my case, 200 consecutive frames may contain only small head motion and few expression changes.

We did not subsample the frames but maintained the original framerate.