head-iie-vnr opened this issue 4 days ago
pip install accelerate
pip install torch torchvision
pip install diffusers
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image
pipeline = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
pipeline.enable_model_cpu_offload()
image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
image = load_image(image_url).convert("RGB")
prompt = "Papers were floating in the air on a table in the library"
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)
frames = pipeline(
prompt=prompt,
image=image,
num_inference_steps=50,
negative_prompt=negative_prompt,
guidance_scale=9.0,
generator=generator
).frames[0]
export_to_gif(frames, "/content/drive/i2v.gif")
from IPython.display import Image
display(Image(filename="/content/drive/i2v.gif"))
Running this on Colab, I got CUDA out-of-memory errors:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 440.00 MiB. GPU

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.72 GiB. GPU

I also got this message:

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache()

When I used a TPU, it gave me a result even with only 2 inference steps.
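One way to work around the out-of-memory errors above is to offload more aggressively and reduce the work per run. This is a sketch, not a guaranteed fix: enable_sequential_cpu_offload() exists on diffusers pipelines generally, but whether enable_vae_slicing() is available on this particular pipeline depends on your diffusers version, so it is guarded with hasattr:

```python
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import load_image

pipeline = I2VGenXLPipeline.from_pretrained(
    "ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16"
)

# Most aggressive offloading: each submodule is moved to the GPU only while
# it runs. Slower than enable_model_cpu_offload(), but uses far less VRAM.
pipeline.enable_sequential_cpu_offload()

# Decode the VAE in slices instead of all frames at once, if supported.
if hasattr(pipeline, "enable_vae_slicing"):
    pipeline.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
).convert("RGB")

frames = pipeline(
    prompt="Papers were floating in the air on a table in the library",
    image=image,
    num_inference_steps=25,  # fewer steps: faster, though peak memory is similar
    guidance_scale=9.0,
    generator=torch.manual_seed(8888),
).frames[0]
```

If it still does not fit, the remaining levers are a larger GPU (A100/L4 in Colab) or a smaller model.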
Import Libraries:
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

torch: The PyTorch library, used for deep learning and tensor computations.
I2VGenXLPipeline: A class from the diffusers library for the image-to-video generation pipeline.
export_to_gif and load_image: Utility functions from the diffusers library for exporting frames to a GIF and loading images, respectively.

Load the Pre-trained Model:
pipeline = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
pipeline.enable_model_cpu_offload()

from_pretrained("ali-vilab/i2vgen-xl"): Loads the pre-trained image-to-video generation model from Hugging Face's model hub.
torch_dtype=torch.float16: Specifies that the model should use 16-bit floating-point precision, which reduces memory usage.
variant="fp16": Indicates that the half-precision variant of the model weights should be loaded.
enable_model_cpu_offload(): Offloads parts of the model to the CPU to save GPU memory.

Load an Image:
image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
image = load_image(image_url).convert("RGB")

image_url: URL of the image to be loaded.
load_image(image_url): Downloads and loads the image from the specified URL.
.convert("RGB"): Converts the image to RGB format.

Define Prompts:
prompt = "Papers were floating in the air on a table in the library"
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"

prompt: The text prompt describing the desired scene to generate in the video.
negative_prompt: A text prompt specifying what should be avoided in the generated video.

Set the Random Seed:
generator = torch.manual_seed(8888)

torch.manual_seed(8888): Sets the seed for random number generation to ensure reproducibility.

Generate Video Frames:
frames = pipeline(
prompt=prompt,
image=image,
num_inference_steps=50,
negative_prompt=negative_prompt,
guidance_scale=9.0,
generator=generator
).frames[0]
pipeline(...): Runs the image-to-video generation pipeline with the specified parameters.
prompt: The text prompt for the desired scene.
image: The input image used as a reference for video generation.
num_inference_steps=50: The number of denoising steps used to generate the video frames.
negative_prompt: The negative prompt to avoid certain undesired features.
guidance_scale=9.0: A scaling factor that controls how strongly the text prompt influences the generated video.
generator: The random number generator for reproducibility.
.frames[0]: Retrieves the first set of frames from the generated output.

Export Frames to GIF:
export_to_gif(frames, "/content/drive/i2v.gif")

export_to_gif(frames, "/content/drive/i2v.gif"): Converts the generated frames to a GIF file and saves it to the specified path.

Display the GIF:
from IPython.display import Image
display(Image(filename="/content/drive/i2v.gif"))

from IPython.display import Image: Imports the Image class from IPython for displaying images.
display(Image(filename="/content/drive/i2v.gif")): Displays the generated GIF in the notebook.

This code loads a pre-trained image-to-video generation model, sets up a text prompt and a negative prompt, generates a video based on the input image and prompts, exports the video frames to a GIF, and then displays the GIF. The use of mixed precision and CPU offloading helps manage GPU memory usage.
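The reproducibility that seeding gives you can be illustrated with Python's standard random module; this is an analogy for how torch.manual_seed behaves, not the torch API itself, and runs without a GPU:

```python
import random

def draw(seed, n=3):
    # Seeding the generator fixes the entire sequence of "random" numbers.
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(n)]

a = draw(8888)
b = draw(8888)  # same seed -> identical sequence
c = draw(1234)  # different seed -> (almost certainly) a different sequence

assert a == b
```

The same principle is why re-running the pipeline with torch.manual_seed(8888) reproduces the same video, while changing the seed produces a different one.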
Deep Dive on a few critical parameters used above.

num_inference_steps=50
Setting num_inference_steps to 50 means the model will iterate 50 times over the generation process, progressively refining the video frames. Increasing this number may improve output quality but requires more computation and time. Decreasing it can speed up generation but may produce less detailed or lower-quality results.

negative_prompt
While the prompt guides the model towards the desired content, the negative_prompt acts as a filter to steer the model away from generating certain features or artifacts. It can include descriptions of visual elements, styles, or other characteristics that are undesirable in the output. The negative_prompt helps refine the generated content by providing a counterbalance to the main prompt. For example, if you want a clear, aesthetic video but notice the model often generates blurry or distorted images, you can include terms like "blurry" or "distorted" in the negative prompt to reduce their occurrence.

guidance_scale=9.0
Setting guidance_scale to 9.0 means that the text prompt heavily influences the generated video. If the scale is too high, the model might overfit to the prompt and ignore other important aspects of the generation process, potentially leading to unnatural results. Conversely, if the scale is too low, the generated content might not align well with the prompt. Adjusting this scale balances the influence of the prompt against overall output quality.

generator
The generator is used to control randomness in the generation process. By setting a specific seed with torch.manual_seed(8888), you ensure that every run with the same seed produces the same sequence of random numbers, leading to identical outputs.
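Under the hood, guidance_scale drives classifier-free guidance: the model predicts noise twice per step, once conditioned on the prompt and once on the negative/empty prompt, and blends the two. A minimal numeric sketch (real predictions are large tensors; the scalars here are stand-ins):

```python
def cfg(uncond, cond, guidance_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional output and towards the prompt-conditioned one.
    return uncond + guidance_scale * (cond - uncond)

uncond, cond = 0.2, 0.8

print(cfg(uncond, cond, 1.0))  # scale 1.0 -> just the conditional prediction (~0.8)
print(cfg(uncond, cond, 9.0))  # scale 9.0 -> strongly prompt-driven (~5.6)
```

This is why a very large scale can overshoot into unnatural results: the blended prediction is extrapolated far beyond what the model produced for either prompt.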
Options

Is the model stored in an in-memory database? What is the PPT file?
The compute power comes from either the CPU or the GPU.
Colab: a 16 GB CPU machine is about 70k INR; a 16 GB GPU is
For compute-heavy generation like this we need a GPU. Comparing CPU vs GPU, the GPU finishes the run far faster; the numerical results are essentially the same.
Hardware accelerator options: CPU, T4 GPU, A100 GPU, L4 GPU, TPU v2. Choose T4 GPU.
Downloaded model files are cached under ~/.cache/huggingface.
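The ~/.cache/huggingface path is the default location for downloaded weights; it can be redirected with the HF_HOME environment variable, which must be set before the Hugging Face libraries are imported. A small sketch (the /content/hf_cache target is just an example path):

```python
import os

# Default Hugging Face cache location (used when HF_HOME is not set).
default_cache = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
print(default_cache)

# Redirect the cache, e.g. onto a bigger disk. Set this before importing
# transformers/diffusers, or they will have already picked up the default.
os.environ["HF_HOME"] = "/content/hf_cache"
```

This is also where the one-time "migrating your old cache" message above does its work.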