NVlabs / few-shot-vid2vid

PyTorch implementation for few-shot photorealistic video-to-video translation.

Rough estimation of training time required before training #19

Closed · AndroYD84 closed 4 years ago

AndroYD84 commented 4 years ago

Hello, in your paper you mention using an NVIDIA DGX-1 machine with 8 32GB V100 GPUs and 15,000 clips to train the pose model. Could you please be more specific about the number of images, their resolution, and how long training took? With something to compare against, I could at least roughly estimate whether training something comparable on a far less powerful machine is a suicide mission. I have a Titan RTX with 24GB of VRAM, and just preparing my data (643,630 Full HD images) with OpenPose at its maximum-precision configuration plus DensePose is likely going to take 5 to 7 days on its own.

Also, did your data require any cleaning or post-processing? Occasional errors are inevitable with data collected in the wild (incomplete hands/faces, wrong bone positioning, glitches, etc.). Or, as long as the data is reasonably good (not outright perfect), is the error margin big enough that such errors can be ignored?

Thanks for your amazing work!
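For anyone making the same kind of estimate, a back-of-envelope calculation along these lines can help decide whether preprocessing is feasible before committing to it. The sketch below is purely illustrative: the per-frame timings are hypothetical placeholders, not numbers from this thread or the paper, so time each tool on a small sample of your own frames and substitute real measurements.

```python
# Back-of-envelope estimate of pose-extraction time before committing to it.
# All timings below are placeholders; measure them on your own hardware first.

NUM_FRAMES = 643_630            # total Full HD frames to process (from the question)

# Seconds per frame, ideally measured by timing each tool on a small sample
# (e.g. 100 frames) with your actual configuration.
SEC_PER_FRAME_OPENPOSE = 0.45   # hypothetical: OpenPose at max-accuracy settings
SEC_PER_FRAME_DENSEPOSE = 0.30  # hypothetical: DensePose inference

total_seconds = NUM_FRAMES * (SEC_PER_FRAME_OPENPOSE + SEC_PER_FRAME_DENSEPOSE)
total_days = total_seconds / 86_400  # seconds per day

print(f"Estimated preprocessing time: {total_days:.1f} days")
# With the placeholder timings above this prints ~5.6 days, consistent with
# the 5-7 day estimate mentioned in the question.
```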

lukemelas commented 4 years ago

If you could provide this, it would be incredibly helpful. The information in the paper is not sufficient to fully reproduce the experiments. Thank you for your work!

tcwang0509 commented 4 years ago

For the pose dataset, training takes approximately a week using the default settings.

lukemelas commented 4 years ago

Thanks for the response! To confirm, is this a week using the default settings with 8 V100s, or with a single V100?

tcwang0509 commented 4 years ago

Using 8 GPUs.
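For rough planning on weaker hardware, the reported figure can be scaled naively, as in the sketch below. This is only a first-order approximation under a strong assumption (training time scales roughly linearly with aggregate GPU throughput, ignoring batch-size limits, memory pressure, and multi-GPU communication overhead), and the Titan RTX relative-throughput number is a guess, not a benchmark.

```python
# Very rough scaling of the reported training time (~1 week on 8x V100 32GB,
# per the answers above) to a single Titan RTX. Assumes near-linear scaling
# with aggregate GPU throughput, which ignores batch-size, memory, and
# communication effects.

REPORTED_DAYS = 7.0        # ~1 week on the reference setup
REFERENCE_GPUS = 8         # 8x V100 32GB (NVIDIA DGX-1)

# Hypothetical per-GPU throughput of a Titan RTX relative to a V100 for this
# workload; benchmark a few iterations of your own model to get a real number.
RELATIVE_THROUGHPUT = 0.9

estimated_days = REPORTED_DAYS * REFERENCE_GPUS / (1 * RELATIVE_THROUGHPUT)
print(f"Naive single-GPU estimate: ~{estimated_days:.0f} days")
# Prints ~62 days with these assumptions; likely optimistic, since the 24GB
# card may also force a smaller batch size than the default settings use.
```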