NVlabs / few-shot-vid2vid

PyTorch implementation of few-shot photorealistic video-to-video translation.

GPU requirements? #14

Open JohnHammell opened 4 years ago

JohnHammell commented 4 years ago

What sort of GPU would be required to run few-shot-vid2vid? Would a GeForce 1050 or 1080 be sufficient?

mpottinger commented 4 years ago

I am not 100% certain because I have not tried training a model yet, but I am pretty sure the requirements are similar to the original vid2vid, which means pretty hefty. A consumer graphics card may not suffice: you need a lot of VRAM (15GB or so) and ideally at least two GPUs. Training on one GPU is supported, but not recommended.
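
In any case, if you want to check what your own cards actually have before committing to a run, plain PyTorch will tell you (this is standard PyTorch, nothing specific to this repo):

```python
import torch

# Print the name and total VRAM of every visible CUDA device
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
```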

I never trained a model for the original vid2vid because I determined it would be too expensive.

JohnHammell commented 4 years ago

Thanks a lot for the information. I looked over the vid2vid details more closely and you're right that it has some quite hefty GPU requirements. Since few-shot-vid2vid is also based on pix2pixHD, I quickly looked over their page as well: they require a minimum of 11GB of video RAM, which would likely rule out the GeForce 1050 through 1080 Ti for this repo (though I'm still unsure whether those GPU requirements apply here exactly).

If anyone reading this has already trained few-shot-vid2vid, please mention how many GPUs you used and which model GPU it was. Thanks in advance for any additional info regarding this.

AaronWong commented 4 years ago

You can try:

  1. setting `--batchSize 1`
  2. adding `--debug` and changing the debug options in `./options/base_options.py` (see the example command below)

But if you want a good model result, this requires a long long long training time.
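
For example, a minimal low-memory invocation could look something like this. The script name and the `--name`/`--dataset_mode` values below are placeholders; only `--batchSize` and `--debug` are the options discussed above, so check the repo's scripts for the exact command for your dataset:

```
# Sketch only -- train.py and the --name/--dataset_mode values are
# placeholders; --batchSize 1 and --debug are the memory-saving options
# suggested above.
python train.py --name pose_debug --dataset_mode fewshot_pose \
    --batchSize 1 --debug
```
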
JohnHammell commented 4 years ago

Hi Aaron, thanks for the info.

By 'long long long', would that be maybe 2 weeks of training on a GeForce 1080? Or more like 2 months?

k4rth33k commented 4 years ago

I'm training on a GTX 1060 and it takes 2 hours per epoch, so to get anywhere near those results (roughly 200 epochs) I guess I need about 400 hours of training. It might be a little less if you are using a GTX 1080.

pythagoras000 commented 4 years ago

@k4rth33k how can I find GTX 1060 pricing on AWS? I'd like to estimate the cost of training to achieve results similar to those in the paper.

k4rth33k commented 4 years ago

GTX 1060 is a consumer-grade card, mostly used for content creation and games, so you won't find it on AWS. If you are willing to go with AWS, the options you have (as far as my knowledge goes) are instances with K80, M60 or V100 cards, which are more efficient. You can find the details at https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html. A very vague and rough estimate is that it will cost you around $300-350 if you are using a p3.8xlarge instance. I may be wrong about the estimate.

Edit: The estimate is for training on the pose data that comes with the repo.
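
To unpack that estimate a bit: at an assumed on-demand p3.8xlarge rate of about $12.24/hr (the us-east-1 rate at the time; verify against the current AWS pricing page), $300-350 corresponds to roughly 25-29 hours of training:

```python
# Back-of-envelope: what training time does a $300-350 bill imply?
# The hourly rate is an assumption -- check the AWS pricing page.
rate_per_hour = 12.24                      # USD/hr for p3.8xlarge, assumed
for cost in (300, 350):
    print(f"${cost} -> {cost / rate_per_hour:.0f} h")
# $300 -> 25 h
# $350 -> 29 h
```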

pythagoras000 commented 4 years ago

Thanks @k4rth33k. Can you please confirm whether the data that comes with the repo (not just for poses, but also for face and street) is enough to replicate the same results? I thought the data included in the repo was just for demo purposes and was not complete.

For example, here they mention the size of the FaceForensics dataset; can you please confirm which of the sizes we should consider for training (38.5, 10GB, 2TB)?

AaronWong commented 4 years ago

Hi JohnHammell & k4rth33k, the training has 3 parts:

  1. niter_single: # of iter for single frame training
  2. niter: # of iter at starting learning rate
  3. niter_decay: # of iter to linearly decay learning rate to zero

Part 1 (the few-shot, single-frame training) is fast, in line with

> training on a GTX 1060 and it takes 2 hours per epoch

But parts 2 & 3 (the vid2vid training) take much more time, which depends on your script options (niter_step, n_frames_total, max_dataset_size, ...). Part 2 takes 1.808s per step on two V100s (1.808s × 10000 steps / 3600 ≈ 5 hours per epoch). See the example command below.
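
For concreteness, these show up as command-line options, roughly like the sketch below. The script name and the `--name`/`--dataset_mode` values are placeholders and the numbers are made up; only the option names come from the training options above:

```
# Placeholder values -- train.py and --name/--dataset_mode are
# assumptions; the three niter options map to parts 1-3 above.
python train.py --name pose --dataset_mode fewshot_pose \
    --niter_single 50 --niter 100 --niter_decay 100 \
    --n_frames_total 4 --max_dataset_size 1000
```
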
ndyashas commented 4 years ago

@AaronWong Thank you for the details! Could you please share the model that you have trained?

AaronWong commented 4 years ago

Hi @yashasbharadwaj111, I'm sorry, I can't share our model, because our dataset is divided into two parts:

  1. data collected from YouTube, for which we have not obtained the consent of the hosts
  2. video data of our lab, which I don't have the right to share

danny-wu commented 4 years ago

Has anyone been able to get the model training with 6GB of VRAM? I understand performance would suffer, but that is the card I have.