Vignana-Jyothi / kp-gen-ai

MIT License

[Exercise] Building Model #26

Open head-iie-vnr opened 4 days ago

head-iie-vnr commented 4 days ago

Options

Is the model stored in an in-memory database? And what is the PPT file?

The compute power comes from either the CPU or the GPU.

Colab: 16GB CPU is ~70k INR; 16GB GPU is

For this kind of compute we need a GPU. Given the same time budget, a GPU allows far more inference steps than a CPU, so in practice the quality of results will be higher with a GPU.

Hardware accelerator options: CPU, T4 GPU, A100 GPU, L4 GPU, TPU v2. Choose T4 GPU.
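A quick way to confirm which accelerator the runtime actually assigned; a minimal sketch using standard PyTorch calls (the printed strings are illustrative):

import torch

# Report whether a CUDA GPU is visible and, if so, its name and total memory.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)
else:
    print("No CUDA GPU available; running on CPU.")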

Downloaded model weights are cached under ~/.cache/huggingface.
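If disk space gets tight, the cache can be redirected before any model is loaded; a minimal sketch, assuming your huggingface_hub version honors the standard HF_HOME environment variable (the target path here is hypothetical):

import os

# Must be set before importing diffusers/transformers so the hub picks it up.
os.environ["HF_HOME"] = "/content/hf_cache"  # hypothetical alternative cache location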

head-iie-vnr commented 4 days ago

PIP installs

pip install torch torchvision
pip install diffusers
pip install accelerate

Code


import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

# Load the pre-trained image-to-video model in half precision to reduce memory usage.
pipeline = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
# Offload idle sub-models to the CPU to save GPU memory.
pipeline.enable_model_cpu_offload()

# Download the conditioning image and normalize it to RGB.
image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
image = load_image(image_url).convert("RGB")

# Text prompts: what the video should show, and what it should avoid.
prompt = "Papers were floating in the air on a table in the library"
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
# Fixed seed so runs are reproducible.
generator = torch.manual_seed(8888)

# Generate the video frames conditioned on the image and the prompts.
frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    guidance_scale=9.0,
    generator=generator
).frames[0]

# Save the frames as an animated GIF.
export_to_gif(frames, "/content/drive/i2v.gif")

# Display the GIF inline in the notebook.
from IPython.display import Image
display(Image(filename="/content/drive/i2v.gif"))
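Note that writing to /content/drive assumes Google Drive is already mounted in the Colab session; a minimal sketch of the usual mount step:

# On Colab, mount Google Drive before writing under /content/drive.
from google.colab import drive
drive.mount("/content/drive")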

Running on different systems

On a local system with a 6GB GPU

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 440.00 MiB. GPU

On Google Colab

Got this error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.72 GiB. GPU

Got this message: "The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache()."

When I used the TPU, it gave me a result even with just 2 iterations.
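For the CUDA OOM errors above, diffusers exposes a couple of memory-reduction knobs beyond enable_model_cpu_offload(); a minimal sketch, assuming these helpers are available in your installed diffusers version (sequential offload is much slower but needs far less VRAM):

import torch
from diffusers import I2VGenXLPipeline

pipeline = I2VGenXLPipeline.from_pretrained(
    "ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16"
)

# Most aggressive offload: streams submodules to the GPU one at a time.
pipeline.enable_sequential_cpu_offload()

# Compute attention in slices to cut peak memory, at some speed cost.
pipeline.enable_attention_slicing()

Reducing num_inference_steps (and, if your diffusers version supports it, the num_frames argument) also lowers memory and time per run.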

head-iie-vnr commented 4 days ago

Code: Step-by-Step Explanation

  1. Import Libraries:

    import torch
    from diffusers import I2VGenXLPipeline
    from diffusers.utils import export_to_gif, load_image
    • torch: The PyTorch library is used for deep learning and tensor computations.
    • I2VGenXLPipeline: A class from the diffusers library for the image-to-video generation pipeline.
    • export_to_gif and load_image: Utility functions from the diffusers library for exporting frames to a GIF and loading images, respectively.
  2. Load the Pre-trained Model:

    pipeline = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
    pipeline.enable_model_cpu_offload()
    • from_pretrained("ali-vilab/i2vgen-xl"): Loads the pre-trained image-to-video generation model from Hugging Face's model hub.
    • torch_dtype=torch.float16: Specifies that the model should use 16-bit floating-point precision, which reduces memory usage.
    • variant="fp16": Indicates that the model variant using half-precision should be loaded.
    • enable_model_cpu_offload(): Offloads parts of the model to the CPU to save GPU memory.
  3. Load an Image:

    image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
    image = load_image(image_url).convert("RGB")
    • image_url: URL of the image to be loaded.
    • load_image(image_url): Downloads and loads the image from the specified URL.
    • .convert("RGB"): Converts the image to RGB format.
  4. Define Prompts:

    prompt = "Papers were floating in the air on a table in the library"
    negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
    • prompt: The text prompt describing the desired scene to generate in the video.
    • negative_prompt: A text prompt specifying what should be avoided in the generated video.
  5. Set the Random Seed:

    generator = torch.manual_seed(8888)
    • torch.manual_seed(8888): Sets the seed for random number generation to ensure reproducibility (a device-bound alternative is sketched after this list).
  6. Generate Video Frames:

    frames = pipeline(
       prompt=prompt,
       image=image,
       num_inference_steps=50,
       negative_prompt=negative_prompt,
       guidance_scale=9.0,
       generator=generator
    ).frames[0]
    • pipeline(...): Runs the image-to-video generation pipeline with the specified parameters.
      • prompt: The text prompt for the desired scene.
      • image: The input image used as a reference for video generation.
      • num_inference_steps=50: The number of inference steps for generating the video frames.
      • negative_prompt: The negative prompt to avoid certain undesired features.
      • guidance_scale=9.0: A scaling factor that controls the influence of the text prompt on the generated video.
      • generator: The random number generator for reproducibility.
    • .frames[0]: Retrieves the frames of the first (and only) video in the output batch.
  7. Export Frames to GIF:

    export_to_gif(frames, "/content/drive/i2v.gif")
    • export_to_gif(frames, "/content/drive/i2v.gif"): Converts the generated frames to a GIF file and saves it to the specified path.
  8. Display the GIF:

    from IPython.display import Image
    display(Image(filename="/content/drive/i2v.gif"))
    • from IPython.display import Image: Imports the Image class from IPython for displaying images.
    • display(Image(filename="/content/drive/i2v.gif")): Displays the generated GIF in the notebook.
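As flagged in step 5, torch.manual_seed(8888) seeds the global CPU RNG. For reproducibility pinned to the sampling device, a device-bound generator can be passed instead; a minimal sketch:

import torch

# A CUDA-bound generator; pass it as `generator=` in the pipeline call.
generator = torch.Generator(device="cuda").manual_seed(8888)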

Summary

This code loads a pre-trained image-to-video generation model, sets up a text prompt and negative prompt, generates a video based on the input image and prompts, exports the video frames to a GIF, and then displays the GIF. The use of mixed precision and CPU offloading helps manage GPU memory usage.
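To quantify what fp16 and CPU offloading actually save, peak GPU usage can be measured around the pipeline call with standard torch.cuda counters; a minimal sketch:

import torch

torch.cuda.reset_peak_memory_stats()

# ... run the pipeline(...) call from the code above ...

print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")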

head-iie-vnr commented 4 days ago

Deep dive into a few critical parameters used above.

1. num_inference_steps=50: the number of denoising steps; more steps generally improve detail and coherence at a proportional cost in compute time.

2. negative_prompt: steers generation away from the listed artifacts (distortion, blur, disfigured anatomy, static output).

3. guidance_scale=9.0: how strongly the sampler follows the text prompt; higher values adhere more literally but can introduce over-saturation and artifacts.

4. generator: the seeded random number generator that makes a run reproducible; the same seed with the same parameters reproduces the same video.
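One practical way to build intuition for these parameters is a small sweep with the seed held fixed, so only the parameter under study varies; a minimal sketch reusing the pipeline, image, and prompts defined above (output file names are illustrative):

import torch
from diffusers.utils import export_to_gif

# Compare guidance scales at a fixed seed: higher values follow the prompt
# more literally but can over-saturate or distort motion.
for gs in (5.0, 9.0, 13.0):
    generator = torch.manual_seed(8888)  # re-seed so only guidance_scale changes
    frames = pipeline(
        prompt=prompt,
        image=image,
        num_inference_steps=25,  # fewer steps keeps the sweep cheap
        negative_prompt=negative_prompt,
        guidance_scale=gs,
        generator=generator,
    ).frames[0]
    export_to_gif(frames, f"/content/i2v_gs{gs}.gif")  # hypothetical output names

The same loop works for num_inference_steps: fix guidance_scale and vary the step count to see the quality/time trade-off.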