dvlab-research / Video-P2P

Video-P2P: Video Editing with Cross-attention Control
https://video-p2p.github.io/

What is the VRAM requirement? #1

Closed: ninjasaid2k closed this issue 1 year ago

ninjasaid2k commented 1 year ago

What is the VRAM requirement on a 3090 or other consumer GPUs?

ShaoTengLiu commented 1 year ago

At least 20 GB of VRAM is currently required.

The P2P process (the second stage) only supports fp32 for now. We will try to develop an fp16 version; any PRs toward it are welcome.
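
In the meantime, the generic diffusers memory savers can take the edge off. The sketch below is only an illustration, assuming a standard diffusers pipeline layout; the checkpoint name is a placeholder, not Video-P2P's actual loading code:

```
# Sketch only: generic diffusers memory savers, assuming a standard
# diffusers pipeline. The checkpoint below is a placeholder, not the
# repo's default model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # placeholder checkpoint
    torch_dtype=torch.float16,        # fp16 weights roughly halve weight VRAM
).to("cuda")

# Compute attention in slices instead of one large matmul: slower,
# but peak activation memory drops substantially.
pipe.enable_attention_slicing()
```

Note that fp16 weights are exactly what the P2P stage cannot handle yet, which is the mismatch reported in the next comment.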

shliu0 commented 1 year ago

What is the VRAM requirement for the null-text version of Video-P2P in fp32? It seems to need more than 20 GB, since I could not run it on a 24 GB GPU. I also tried setting mix_precision=fp16 in p2p.yaml to reduce VRAM usage, but it failed with the following error:

```
  File "Video-P2P/run_videop2p.py", line 664, in <module>
    main(**OmegaConf.load(args.config), fast=args.fast)
  File "Video-P2P/run_videop2p.py", line 619, in main
    (image_gt, image_enc), x_t, uncond_embeddings = null_inversion.invert(image_path, prompt, offsets=(0,0,0,0), verbose=True)
  File "Video-P2P/run_videop2p.py", line 583, in invert
    image_rec, ddim_latents = self.ddim_inversion(image_gt)
  File  "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/utils//_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "Video-P2P/run_videop2p.py", line 540, in ddim_inversion
    ddim_latents = self.ddim_loop(latent)
  File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "Video-P2P/run_videop2p.py", line 527, in ddim_loop
    noise_pred = self.get_noise_pred_single(latent, t, cond_embeddings)
  File "Video-P2P/run_videop2p.py", line 440, in get_noise_pred_single
    noise_pred = self.model.unet(latents, t, encoder_hidden_states=context)["sample"]
  File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "Video-P2P/tuneavideo/models/unet.py", line 359, in forward
    sample = self.conv_in(sample)
  File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "Video-P2P/tuneavideo/models/resnet.py", line 16, in forward
    x = super().forward(x)
  File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
```
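
The error itself is a plain PyTorch dtype mismatch: with mixed precision enabled, the latents reach conv_in as fp16 while the convolution's weights and bias are still fp32. The snippet below is a minimal, self-contained reproduction (none of it is Video-P2P code) together with the two usual fixes:

```
# Minimal repro of the RuntimeError above, independent of Video-P2P.
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, kernel_size=3).cuda()                  # fp32 weights/bias
x = torch.randn(1, 4, 8, 8, device="cuda", dtype=torch.half)  # fp16 input

try:
    conv(x)  # fp16 activations hit an fp32 conv
except RuntimeError as e:
    print(e)  # Input type (c10::Half) and bias type (float) should be the same

# Fix 1: cast the module so its parameters match the input dtype.
out = conv.half()(x)

# Fix 2: keep fp32 parameters and let autocast insert per-op casts.
conv.float()
with torch.autocast("cuda", dtype=torch.float16):
    out = conv(x)
print(out.dtype)  # torch.float16
```

Setting mixed precision in the config presumably halves the latents without casting the UNet's parameters, which is why the second stage falls over at conv_in; until the repo ships an fp16 path, running that stage in fp32 is the safe option.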