head-iie-vnr opened this issue 4 days ago
pip install accelerate
pip install torch torchvision
pip install diffusers
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image
pipeline = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
pipeline.enable_model_cpu_offload()
image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
image = load_image(image_url).convert("RGB")
prompt = "Papers were floating in the air on a table in the library"
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)
frames = pipeline(
prompt=prompt,
image=image,
num_inference_steps=50,
negative_prompt=negative_prompt,
guidance_scale=9.0,
generator=generator
).frames[0]
export_to_gif(frames, "/content/drive/i2v.gif")
from IPython.display import Image
display(Image(filename="/content/drive/i2v.gif"))
Running this on Colab, I got CUDA out-of-memory errors:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 440.00 MiB. GPU

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.72 GiB. GPU

I also got this message:

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache()

When I used a TPU, it gave me a result even with only 2 inference steps.
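One way to work around the out-of-memory errors above is to offload more aggressively and reduce the work per run. This is a sketch, not a guaranteed fix: enable_sequential_cpu_offload() exists on diffusers pipelines generally, but whether enable_vae_slicing() is available on this particular pipeline depends on your diffusers version, so it is guarded with hasattr:

```python
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import load_image

pipeline = I2VGenXLPipeline.from_pretrained(
    "ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16"
)

# Most aggressive offloading: each submodule is moved to the GPU only while
# it runs. Slower than enable_model_cpu_offload(), but uses far less VRAM.
pipeline.enable_sequential_cpu_offload()

# Decode the VAE in slices instead of all frames at once, if supported.
if hasattr(pipeline, "enable_vae_slicing"):
    pipeline.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
).convert("RGB")

frames = pipeline(
    prompt="Papers were floating in the air on a table in the library",
    image=image,
    num_inference_steps=25,  # fewer steps: faster, though peak memory is similar
    guidance_scale=9.0,
    generator=torch.manual_seed(8888),
).frames[0]
```

If it still does not fit, the remaining levers are a larger GPU (A100/L4 in Colab) or a smaller model.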
Import Libraries:
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

torch: The PyTorch library, used for deep learning and tensor computations.
I2VGenXLPipeline: A class from the diffusers library for the image-to-video generation pipeline.
export_to_gif and load_image: Utility functions from the diffusers library for exporting frames to a GIF and loading images, respectively.

Load the Pre-trained Model:
pipeline = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
pipeline.enable_model_cpu_offload()

from_pretrained("ali-vilab/i2vgen-xl"): Loads the pre-trained image-to-video generation model from Hugging Face's model hub.
torch_dtype=torch.float16: Specifies that the model should use 16-bit floating-point precision, which reduces memory usage.
variant="fp16": Indicates that the half-precision variant of the model weights should be loaded.
enable_model_cpu_offload(): Offloads parts of the model to the CPU to save GPU memory.

Load an Image:
image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
image = load_image(image_url).convert("RGB")

image_url: URL of the image to be loaded.
load_image(image_url): Downloads and loads the image from the specified URL.
.convert("RGB"): Converts the image to RGB format.

Define Prompts:
prompt = "Papers were floating in the air on a table in the library"
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"

prompt: The text prompt describing the desired scene to generate in the video.
negative_prompt: A text prompt specifying what should be avoided in the generated video.

Set the Random Seed:
generator = torch.manual_seed(8888)

torch.manual_seed(8888): Sets the seed for random number generation to ensure reproducibility.

Generate Video Frames:
frames = pipeline(
prompt=prompt,
image=image,
num_inference_steps=50,
negative_prompt=negative_prompt,
guidance_scale=9.0,
generator=generator
).frames[0]
pipeline(...): Runs the image-to-video generation pipeline with the specified parameters.
prompt: The text prompt for the desired scene.
image: The input image used as a reference for video generation.
num_inference_steps=50: The number of denoising steps used to generate the video frames.
negative_prompt: The negative prompt to avoid certain undesired features.
guidance_scale=9.0: A scaling factor that controls how strongly the text prompt influences the generated video.
generator: The random number generator for reproducibility.
.frames[0]: Retrieves the first set of frames from the generated output.

Export Frames to GIF:
export_to_gif(frames, "/content/drive/i2v.gif")

export_to_gif(frames, "/content/drive/i2v.gif"): Converts the generated frames to a GIF file and saves it to the specified path.

Display the GIF:
from IPython.display import Image
display(Image(filename="/content/drive/i2v.gif"))

from IPython.display import Image: Imports the Image class from IPython for displaying images.
display(Image(filename="/content/drive/i2v.gif")): Displays the generated GIF in the notebook.

This code loads a pre-trained image-to-video generation model, sets up a text prompt and a negative prompt, generates a video based on the input image and prompts, exports the video frames to a GIF, and then displays the GIF. The use of mixed precision and CPU offloading helps manage GPU memory usage.
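The reproducibility that seeding gives you can be illustrated with Python's standard random module; this is an analogy for how torch.manual_seed behaves, not the torch API itself, and runs without a GPU:

```python
import random

def draw(seed, n=3):
    # Seeding the generator fixes the entire sequence of "random" numbers.
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(n)]

a = draw(8888)
b = draw(8888)  # same seed -> identical sequence
c = draw(1234)  # different seed -> (almost certainly) a different sequence

assert a == b
```

The same principle is why re-running the pipeline with torch.manual_seed(8888) reproduces the same video, while changing the seed produces a different one.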
Deep Dive on a few critical parameters used above.

num_inference_steps=50
Setting num_inference_steps to 50 means the model will iterate 50 times over the generation process, progressively refining the video frames. Increasing this number may improve output quality but requires more computation and time. Decreasing it can speed up generation but may produce less detailed or lower-quality results.

negative_prompt
While the prompt guides the model towards the desired content, the negative_prompt acts as a filter to steer the model away from generating certain features or artifacts. It can include descriptions of visual elements, styles, or other characteristics that are undesirable in the output. The negative_prompt helps refine the generated content by providing a counterbalance to the main prompt. For example, if you want a clear, aesthetic video but notice the model often generates blurry or distorted images, you can include terms like "blurry" or "distorted" in the negative prompt to reduce their occurrence.

guidance_scale=9.0
Setting guidance_scale to 9.0 means that the text prompt heavily influences the generated video. If the scale is too high, the model might overfit to the prompt and ignore other important aspects of the generation process, potentially leading to unnatural results. Conversely, if the scale is too low, the generated content might not align well with the prompt. Adjusting this scale balances the influence of the prompt against overall output quality.

generator
The generator is used to control randomness in the generation process. By setting a specific seed with torch.manual_seed(8888), you ensure that every run with the same seed produces the same sequence of random numbers, leading to identical outputs.
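Under the hood, guidance_scale drives classifier-free guidance: the model predicts noise twice per step, once conditioned on the prompt and once on the negative/empty prompt, and blends the two. A minimal numeric sketch (real predictions are large tensors; the scalars here are stand-ins):

```python
def cfg(uncond, cond, guidance_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional output and towards the prompt-conditioned one.
    return uncond + guidance_scale * (cond - uncond)

uncond, cond = 0.2, 0.8

print(cfg(uncond, cond, 1.0))  # scale 1.0 -> just the conditional prediction (~0.8)
print(cfg(uncond, cond, 9.0))  # scale 9.0 -> strongly prompt-driven (~5.6)
```

This is why a very large scale can overshoot into unnatural results: the blended prediction is extrapolated far beyond what the model produced for either prompt.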
Options

Is the model stored in an in-memory database? What is the PPT file?
The compute power comes from either the CPU or the GPU.
Colab: a 16 GB CPU machine is about 70k INR; a 16 GB GPU is
For compute-heavy generation like this we need a GPU. Comparing CPU vs GPU, the GPU finishes the run far faster; the numerical results are essentially the same.
Hardware accelerator options: CPU, T4 GPU, A100 GPU, L4 GPU, TPU v2. Choose T4 GPU.
Downloaded model files are cached under ~/.cache/huggingface.
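The ~/.cache/huggingface path is the default location for downloaded weights; it can be redirected with the HF_HOME environment variable, which must be set before the Hugging Face libraries are imported. A small sketch (the /content/hf_cache target is just an example path):

```python
import os

# Default Hugging Face cache location (used when HF_HOME is not set).
default_cache = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
print(default_cache)

# Redirect the cache, e.g. onto a bigger disk. Set this before importing
# transformers/diffusers, or they will have already picked up the default.
os.environ["HF_HOME"] = "/content/hf_cache"
```

This is also where the one-time "migrating your old cache" message above does its work.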