Closed: ninjasaid2k closed this issue 1 year ago
At least 20 GB of VRAM is required now.
The P2P process (the second stage) currently only supports fp32. We will try to develop an fp16 version; any PRs on this are welcome.
What is the VRAM requirement for the null-text version of Video-P2P in fp32? It seems to need more than 20 GB, since I could not run it on a 24 GB GPU. I also tried setting `mixed_precision: fp16` in `p2p.yaml` to reduce the VRAM usage, but it failed with the following error:
```
File "Video-P2P/run_videop2p.py", line 664, in <module>
  main(**OmegaConf.load(args.config), fast=args.fast)
File "Video-P2P/run_videop2p.py", line 619, in main
  (image_gt, image_enc), x_t, uncond_embeddings = null_inversion.invert(image_path, prompt, offsets=(0,0,0,0), verbose=True)
File "Video-P2P/run_videop2p.py", line 583, in invert
  image_rec, ddim_latents = self.ddim_inversion(image_gt)
File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
  return func(*args, **kwargs)
File "Video-P2P/run_videop2p.py", line 540, in ddim_inversion
  ddim_latents = self.ddim_loop(latent)
File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
  return func(*args, **kwargs)
File "Video-P2P/run_videop2p.py", line 527, in ddim_loop
  noise_pred = self.get_noise_pred_single(latent, t, cond_embeddings)
File "Video-P2P/run_videop2p.py", line 440, in get_noise_pred_single
  noise_pred = self.model.unet(latents, t, encoder_hidden_states=context)["sample"]
File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
  return forward_call(*args, **kwargs)
File "Video-P2P/tuneavideo/models/unet.py", line 359, in forward
  sample = self.conv_in(sample)
File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
  return forward_call(*args, **kwargs)
File "Video-P2P/tuneavideo/models/resnet.py", line 16, in forward
  x = super().forward(x)
File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
  return self._conv_forward(input, self.weight, self.bias)
File "miniconda3/envs/videop2p/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
  return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
```
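For what it's worth, the crash is a plain dtype mismatch: with `mixed_precision: fp16` the latents arrive at `conv_in` in half precision, but the UNet's parameters are still fp32, so `F.conv2d` refuses the call. A minimal sketch reproducing this (shapes here are illustrative, not the actual UNet):

```python
import torch
import torch.nn as nn

# Parameters are created in fp32, but mixed_precision=fp16 feeds the
# model half-precision latents, so F.conv2d sees mismatched dtypes.
conv = nn.Conv2d(4, 320, kernel_size=3, padding=1)        # fp32 weight/bias
latent = torch.randn(1, 4, 64, 64, dtype=torch.float16)   # fp16 input

try:
    conv(latent)  # raises the same dtype-mismatch RuntimeError
except RuntimeError as e:
    print(type(e).__name__, e)

# One workaround: cast the module so its parameters match the input dtype.
conv_fp16 = conv.half()
print(conv_fp16.weight.dtype)  # torch.float16
```

So merely flipping the config flag is not enough; the model weights themselves would also need to be cast (or the forward pass run under `torch.autocast`), which is presumably part of the fp16 support the maintainers said is still in development.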
What is the VRAM requirement on a 3090 or other consumer GPUs?