deep-floyd / IF


Question: Custom Resolution or Aspect Ratios #82

Open · Gitterman69 opened 1 year ago

Gitterman69 commented 1 year ago

I love the new model, but I certainly miss custom resolutions and aspect ratios... any way to do it yet?

tildebyte commented 1 year ago

This isn't available when using 🤗Diffusers pipelines; you have to run it as indicated in "Run The Code Locally". Notice the args to dream() in the "I. Dream" section: you can add, e.g., aspect_ratio='3:2' there.

IIUC, there's no way to specify custom resolutions per se - the resolutions are hard-coded (maybe?) - but the aspect ratio can be varied.
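
For illustration, a minimal sketch of that call, assuming the stages and T5 embedder are already set up as in the repo README (note that aspect_ratio is passed to dream() itself here, as suggested above, not inside if_I_kwargs):

from deepfloyd_if.pipelines import dream

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=['a watercolor painting of a lighthouse'],
    seed=42,
    aspect_ratio='3:2',  # top-level argument, not an if_I_kwargs entry
)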

Gitterman69 commented 1 year ago

I tried to generate a custom aspect ratio with my code below... stage 3 gets me OOM even though I'm using a 3090... maybe you guys can try to run it as well? It would be great to find out how exactly to get the custom aspect ratio running.

from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder
from deepfloyd_if.pipelines import dream

device = 'cuda:0'

print("Starting IF Stage I...")
if_I = IFStageI('IF-I-XL-v1.0', device=device)
print("IF Stage I completed.")

print("Starting IF Stage II...")
if_II = IFStageII('IF-II-L-v1.0', device=device)
print("IF Stage II completed.")

print("Starting Stable Stage III...")
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
print("Stable Stage III completed.")

print("Initializing T5 Embedder...")
t5 = T5Embedder(device="cpu")
print("T5 Embedder initialized.")

prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
count = 1

print("Starting dream pipeline...")
result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
        "aspect_ratio": "3:2",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
        #"aspect_ratio": "3:2",
    },
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
        #"aspect_ratio": "3:2",
    },
)
print("Dream pipeline completed.")

if_III.show(result['III'], size=14)

tildebyte commented 1 year ago

#66 claims that it's possible to run inference with IF using only 6 GB of VRAM, but I have not tested it myself.

waffletower commented 1 year ago

I haven't tried the dream pipeline yet. I have been able to provide a width argument to a stage I pipeline (using both the base DiffusionPipeline and the IFPipeline classes) successfully, but haven't succeeded for subsequent stage pipelines.

waffletower commented 1 year ago

> I tried to generate a custom aspect ratio with my code below... stage 3 gets me OOM even though I'm using a 3090... maybe you guys can try to run it as well? It would be great to find out how exactly to get the custom aspect ratio running.

I also have a 3090 and have been testing with an identical model configuration, albeit using the DiffusionPipeline APIs. With the default resolution, the maximum GPU memory utilized during processing is over 20 GB, so it seems very possible that an aspect ratio change (particularly 3:2) would put it over the 24 GB available to the 3090. You can try 4:3, 5:4, etc. and see if that squeaks by.
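
For reference, peak usage can be measured with PyTorch's built-in counters (standard torch.cuda API; nvidia-smi will typically report somewhat more, since it includes CUDA context overhead):

import torch

torch.cuda.reset_peak_memory_stats()

# ... run the pipeline stages here ...

peak_mib = torch.cuda.max_memory_allocated() / 2**20
print(f"peak GPU memory allocated by torch: {peak_mib:.0f} MiB")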

waffletower commented 1 year ago

> #66 claims that it's possible to run inference with IF using only 6 GB of VRAM, but I have not tested it myself.

You might be able to run a two-stage pipeline on 6 GB using the IF-I-M and IF-II-M models. The poster is using IF-I-XL and IF-II-L plus a third upscaling stage instead. They could certainly try again with the smaller models.
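
An untested sketch of such a two-stage, low-VRAM setup via Diffusers (I am assuming the DeepFloyd/IF-I-M-v1.0 and DeepFloyd/IF-II-M-v1.0 Hub names by analogy with the larger checkpoints below; enable_sequential_cpu_offload() trades speed for memory):

import torch
from diffusers import IFPipeline, IFSuperResolutionPipeline

# M-sized checkpoints instead of XL/L
stage_1 = IFPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0",
                                     variant="fp16", torch_dtype=torch.float16)
stage_1.enable_sequential_cpu_offload()  # aggressive offload for small GPUs

stage_2 = IFSuperResolutionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0",
                                                    text_encoder=None, variant="fp16",
                                                    torch_dtype=torch.float16)
stage_2.enable_sequential_cpu_offload()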

waffletower commented 1 year ago

Good news -- I was able to render a 1536x1024 image via Diffusers and the IF-I-XL -> IF-II-L -> 4x upscaler pipeline configuration. This was done on an RTX 3090 -- memory usage just squeaked by at 23031 MiB during the final upscaling phase. I needed to make the following simple change to diffusers:

https://github.com/waffletower/diffusers/commit/035b010fa9e696ad35ccd54b2576571fefed39b8

and provide correct dimension values (width in my case) for the first two stages:

width=96
width=384

respectively. (Stage 1's base height is 64 px, so a 3:2 frame is 96x64; stage 2 upscales 4x to 384x256, and the x4 upscaler then yields 1536x1024.)

The pipeline configuration:

import sys
from diffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline
from diffusers.utils import pt_to_pil
import torch
import numpy as np

# stage 1
stage_1 = IFPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = IFSuperResolutionPipeline.from_pretrained("DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16",
                                            torch_dtype=torch.float16)
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor,
                  "safety_checker": stage_1.safety_checker,
                  "watermarker": stage_1.watermarker}

stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler",
                                            **safety_modules, torch_dtype=torch.float16)
stage_3.enable_model_cpu_offload()

And the invocation:

prompt = 'Jennifer Aniston throwing her shoe at Tucker Carlson'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

base_seed = np.random.randint(0, sys.maxsize)
for x in range(1):
    generator = torch.manual_seed(base_seed + x)

    image = stage_1(prompt_embeds=prompt_embeds,
                    negative_prompt_embeds=negative_embeds,
                    generator=generator,
                    output_type="pt",
                    width=96).images
    pt_to_pil(image)[0].save("./if_stage_I.png")

    image = stage_2(image=image,
                    prompt_embeds=prompt_embeds,
                    negative_prompt_embeds=negative_embeds,
                    generator=generator,
                    output_type="pt",
                    width=384).images
    pt_to_pil(image)[0].save("./if_stage_II.png")

    image = stage_3(prompt=prompt,
                    image=image,
                    generator=generator,
                    noise_level=100).images
    image[0].save(f"{base_seed + x}.png")

I think an aspect ratio argument would be preferable, but that can easily be built in the calling code, which can coordinate the dimension differences between the pipeline stages -- see the sketch below.
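
For illustration, a hypothetical helper along those lines (stage_widths is my own name; the 64-pixel stage-1 base height and the 4x factor per upscale match the configuration above):

def stage_widths(aspect_w, aspect_h, base_height=64, upscale=4):
    # stage 1 renders at base_height; width follows the aspect ratio
    stage_1_width = base_height * aspect_w // aspect_h
    # stage 2 upscales 4x; the x4 upscaler then handles the final step
    return stage_1_width, stage_1_width * upscale

w1, w2 = stage_widths(3, 2)  # -> (96, 384), the values used above

For 3:2 this reproduces the width=96 / width=384 pair, and the final x4 stage then yields 1536x1024.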

Bigfield77 commented 1 year ago

Hello, I was able to replicate the change in aspect ratio for stage_1, but stage_2 complains about an unknown argument width:

IFSuperResolutionPipeline.__call__() got an unexpected keyword argument 'width'

I have deepfloyd_if 1.0.1.

If I don't specify width for stage_2, I get an image with the correct aspect ratio from stage 1, but stage 2 squishes everything into a 256x256 image.

Edit: My bad, I just noticed the link to the modification in src/diffusers/pipelines/deepfloyd_if/pipeline_if_superresolution.py.

Will give this a try!

Edit 2: Works fine! :)

cheers!

Gitterman69 commented 1 year ago

What custom resolutions are currently supported? 1920x1024 works whereas 1920x1080 doesn't... super strange? Any ideas?

Bigfield77 commented 1 year ago

I only do the first 2 stages as the SD upscaler doesn't work on my install right now.

For the first stage, anything in the range of 80x80 pixels and above starts to generate strange images, so that would be like 1280x1280 after 4x * 4x.
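
A possible explanation for the 1920x1024 vs. 1920x1080 question above (my guess, not confirmed in this thread): with a 4x stage 2 and a 4x stage 3, the final image is 16x the stage-1 resolution, so both target dimensions must be divisible by 16. A quick check:

def stage_1_dims(width, height, total_upscale=16):
    # final = stage_1 * 4 (stage 2) * 4 (stage 3) = stage_1 * 16
    if width % total_upscale or height % total_upscale:
        raise ValueError(f"{width}x{height} is not divisible by {total_upscale}")
    return width // total_upscale, height // total_upscale

print(stage_1_dims(1920, 1024))  # (120, 64) -- fine
# stage_1_dims(1920, 1080) would raise: 1080 / 16 = 67.5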