deep-floyd / IF


Cannot seem to get style transfer to work super well (though the python is working) -- suggestions? #103

Open klei22 opened 1 year ago

klei22 commented 1 year ago

Can't seem to get the same great results as shown in the README.md.

Would there be a better strategy, or higher-parameter-count models, I should try out?

Currently I'm resizing images to 768x512 (which for the most part appears to work); however, it seems that the output ends up pretty far from the original image (though I've gotten some good images with the "cardboard" transfer).

Wondering if there's a hyperparameter I could modify (like the temperature or repeat penalty for LLMs) which might decrease the distance of the img2img-transferred image from the original?

(Or would there be prompts/styles perhaps which others have found the model to map with higher precision?)
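For what it's worth, the two knobs in the script below that seem most relevant are `guidance_scale` (pulls the result toward the prompt) and `support_noise_less_qsample_steps` (if it behaves as the name suggests, it reduces how much the support image is noised, so raising it should keep the output closer to the original). A sketch of the direction I'd try -- these values and directions are assumptions, not verified:

```python
# Hypothetical tweaks -- the directions here are assumptions, not verified:
if_I_kwargs = {
    "guidance_scale": 7.0,  # lower than 10.0: weaker prompt pull, closer to source
    "sample_timestep_respacing": "10,10,10,10,10,10,10,10,0,0",
    "support_noise_less_qsample_steps": 15,  # higher than 5: noise the support image less
}
if_II_kwargs = {
    "guidance_scale": 4.0,
    "sample_timestep_respacing": "smart50",
    "support_noise_less_qsample_steps": 15,
}
```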

Also sharing the python script below, which has been working for me so far (though far from perfect) and might be useful to others as a starting point.

Keep in mind that input images are currently pre-cropped into 768x512 RGB PNGs:
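In case it's useful, here's the kind of pre-crop I mean -- a minimal sketch using Pillow's `ImageOps.fit`, which center-crops to the target aspect ratio before resizing (the 768x512 target matches what I'm feeding the script):

```python
from PIL import Image, ImageOps

def precrop(path_or_img, size=(768, 512)):
    """Center-crop and resize an image to `size`, returned as RGB."""
    img = path_or_img if isinstance(path_or_img, Image.Image) else Image.open(path_or_img)
    # ImageOps.fit crops to the target aspect ratio, then resizes
    return ImageOps.fit(img.convert('RGB'), size, method=Image.LANCZOS)
```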

import argparse

from PIL import Image

from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder
from deepfloyd_if.pipelines import style_transfer

hf_token = 'token'  # replace with your Hugging Face access token

def parse_args():
    p = argparse.ArgumentParser(
            description='img2img transfer on input',
            formatter_class=argparse.RawDescriptionHelpFormatter
    )

    p.add_argument(
        "-i",
        "--input-file",
        type=str,
        required=True,
        help="input file for img2img",
    )

    return p.parse_args()

def main():
    args = parse_args()
    device = 'cuda:0'

    # Cascaded IF stages: 64px base, 256px super-res, then the SD x4 upscaler
    if_I = IFStageI('IF-I-XL-v1.0', device=device, hf_token=hf_token)
    if_II = IFStageII('IF-II-L-v1.0', device=device, hf_token=hf_token)
    if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)

    # Keep the T5 text encoder on CPU to save GPU memory
    t5 = T5Embedder(device="cpu")

    result = style_transfer(
        t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
        support_pil_img=Image.open(args.input_file).convert('RGB'),
        style_prompt=[
            'in the style of picasso',
            'in the style of origami',
            'in the style of 3d',
            'in the style of monet',
        ],
        seed=4,
        if_I_kwargs={
            "guidance_scale": 10.0,
            "sample_timestep_respacing": "10,10,10,10,10,10,10,10,0,0",
            'support_noise_less_qsample_steps': 5,
        },
        if_II_kwargs={
            "guidance_scale": 4.0,
            "sample_timestep_respacing": 'smart50',
            "support_noise_less_qsample_steps": 5,
        },
    )
    if_I.show(result['III'], 1, 20)

if __name__ == "__main__":
    main()