chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

IP-Adapter does not work with xformers enabled #81

Closed huo-ju closed 6 months ago

huo-ju commented 7 months ago

Hello,

First, @chengzeyi, thanks for this project! This is the best optimization framework I have tested by far.

I have a customized pipeline with ip_adapter support (from the diffusers main branch). The ip_adapter does not work with config.enable_xformers = True, but it works well once xformers is disabled.

I'm not entirely sure, but I guess there is some conflict between memory_efficient_attention and ip_adapter's AttnProcessor. Any thoughts?
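
One quick way to test this guess (a minimal sketch, assuming a diffusers pipeline pipe on which .load_ip_adapter() has already been called) is to inspect the UNet's attention processors before and after enabling xformers:

from collections import Counter

# IP-Adapter installs IPAdapterAttnProcessor(2_0) on the cross-attention
# layers; count the processor types before enabling xformers.
print(Counter(type(p).__name__ for p in pipe.unet.attn_processors.values()))

# Enabling xformers swaps in XFormersAttnProcessor, which may replace
# the IP-Adapter processors and silently drop the adapter path.
pipe.enable_xformers_memory_efficient_attention()
print(Counter(type(p).__name__ for p in pipe.unet.attn_processors.values()))

If the second print no longer shows the IP-Adapter processors, that would confirm the conflict.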

PS: It's still fast enough even with xformers disabled. Amazing work!

Thanks!

ldtgodlike commented 7 months ago

You reminded me that the Diffusers team has officially said that torch 2.0 does not require config.enable_xformers = True.

huo-ju commented 7 months ago

@ldtgodlike Sure, it may not need xformers, because most attention processors in diffusers have a 2_0 version that uses SDPA. But stable-fast seems to wrap xformers for other reasons, so I'm a bit confused here...

ldtgodlike commented 7 months ago

Well, when I tried using plain diffusers without stable-fast (ip-adapter_sd15 + canny_controlnet in StableDiffusionControlNetPipeline) with torch==2.1.0, diffusers==0.24.0, and xformers==0.0.22.post7, the result looked wrong when I used pipe.enable_xformers_memory_efficient_attention(). @huo-ju

fkjkey commented 7 months ago

@huo-ju Hello, may I ask how you use stable-fast to accelerate the pipe after .load_ip_adapter()? I've also had some problems doing this before.

huo-ju commented 6 months ago

@ldtgodlike It will be switched to the 2_0 version if you have torch >= 2.0 installed. https://github.com/huggingface/diffusers/blob/f5942649f522f0ae87bcb659735b9fd894349ca8/src/diffusers/models/attention_processor.py#L164
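
The logic at that line is roughly the following (paraphrased, not an exact copy of the diffusers source):

import torch.nn.functional as F
from diffusers.models.attention_processor import AttnProcessor, AttnProcessor2_0

# torch >= 2.0 exposes F.scaled_dot_product_attention, so diffusers
# defaults to the SDPA-based processor; otherwise it falls back to the
# plain implementation.
if hasattr(F, "scaled_dot_product_attention"):
    processor = AttnProcessor2_0()
else:
    processor = AttnProcessor()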

huo-ju commented 6 months ago

@fkjkey No magic in my code; I just disabled xformers (config.enable_xformers = False) after .load_ip_adapter().
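
For reference, the whole workaround is just this (a sketch using the same sfast API as the script below):

from sfast.compilers.diffusion_pipeline_compiler import compile, CompilationConfig

# Load the adapter first, then compile with xformers disabled so the
# IP-Adapter attention processors are left intact.
model.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                      weight_name="ip-adapter_sd15.safetensors")

config = CompilationConfig.Default()
config.enable_xformers = False
config.enable_cuda_graph = True
model = compile(model, config)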

ldtgodlike commented 6 months ago

@huo-ju I understand that. What I mean is that pure diffusers causes this problem, not stable-fast, unless the author of this repo specifically modified it to address the issue.

chengzeyi commented 6 months ago

@huo-ju Can you share your full script so that I can reproduce that?

ldtgodlike commented 6 months ago

@chengzeyi Maybe this could help you. I'm also curious whether enabling xformers in stable-fast actually accelerates anything (see the timing sketch after the script).

import torch
from diffusers import AutoPipelineForText2Image, EulerDiscreteScheduler, ControlNetModel
from sfast.compilers.diffusion_pipeline_compiler import (
    compile, CompilationConfig)
import numpy as np
import cv2
from PIL import Image

CUDA_DEVICE = "cuda:0"
def canny_process(image, width, height):
    # Resize, run Canny edge detection, and stack the single channel
    # into a 3-channel PIL image for the ControlNet input.
    np_image = cv2.resize(image, (width, height))
    np_image = cv2.Canny(np_image, 100, 200)
    np_image = np_image[:, :, None]
    np_image = np.concatenate([np_image, np_image, np_image], axis=2)
    return Image.fromarray(np_image)

def reference_process(image, width, height):
    # Resize the style reference image for the IP-Adapter input.
    np_image = cv2.resize(image, (width, height))
    return Image.fromarray(np_image)

def load_model():
    extra_kwargs = {}

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/control_v11p_sd15_canny",
        torch_dtype=torch.float16, variant="fp16",
        use_safetensors=True)
    extra_kwargs['controlnet'] = controlnet
    model = AutoPipelineForText2Image.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
        **extra_kwargs)
    model.scheduler = EulerDiscreteScheduler.from_config(model.scheduler.config)
    model.safety_checker = None
    # Load the IP-Adapter weights before compiling the pipeline.
    model.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                          weight_name="ip-adapter_sd15.safetensors")
    model.to(torch.device(CUDA_DEVICE))
    return model

def compile_model(model):
    config = CompilationConfig.Default()
    try:
        import xformers
        config.enable_xformers = True
    except ImportError:
        print('xformers not installed, skip')
    try:
        import triton
        config.enable_triton = True
    except ImportError:
        print('Triton not installed, skip')
    config.enable_cuda_graph = True

    model = compile(model, config)

    # Warm-up call with dummy inputs so tracing / CUDA graph capture
    # happens here instead of during the first real generation.
    # Note: numpy image arrays are (height, width, channels).
    dummy_image = np.ones((512, 768, 3))
    kwarg_inputs = dict(prompt=["aa"] * 4,
                        negative_prompt=[""] * 4,
                        width=768,
                        height=512,
                        num_inference_steps=20,
                        num_images_per_prompt=1,
                        guidance_scale=7.5,
                        ip_adapter_image=[dummy_image] * 4,
                        image=[dummy_image] * 4,
                        controlnet_conditioning_scale=1.0,
                        )
    for _ in range(1):
        output_image = model(**kwarg_inputs).images[0]

    return model

if __name__ == "__main__":
    control_img = 'control.png'
    reference_img = 'reference.png'

    width = 768
    height = 512

    # cv2 loads BGR; [:, :, ::-1] converts to RGB.
    canny_image = cv2.imread(control_img)[:, :, ::-1]
    reference_img = cv2.imread(reference_img)[:, :, ::-1]
    canny_image = canny_process(canny_image, width, height)
    reference_img = reference_process(reference_img, width, height)

    model = load_model()
    model = compile_model(model)
    seed = -1
    batch_size = 4
    generator = torch.Generator(device=CUDA_DEVICE).manual_seed(seed)
    prompt = "dog"
    negative_prompt = ""
    num_inference_steps = 20
    guidance_scale = 7.5
    controlnet_conditioning_scale = 1.0
    images = model(prompt=[prompt] * batch_size,
                   negative_prompt=[negative_prompt] * batch_size,
                   width=width,
                   height=height,
                   num_inference_steps=num_inference_steps,
                   num_images_per_prompt=1,
                   guidance_scale=guidance_scale,
                   ip_adapter_image=[reference_img] * batch_size,
                   # Pass the processed canny image, not the filename string.
                   image=[canny_image] * batch_size,
                   generator=generator,
                   controlnet_conditioning_scale=float(controlnet_conditioning_scale),
                   ).images
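
As for the speed question above: a rough way to check whether enabling xformers actually accelerates the compiled pipeline is to time it with both settings (a sketch; warm up first so compilation and CUDA graph capture are excluded from the measurement):

import time
import torch

def benchmark(model, kwarg_inputs, n=5):
    model(**kwarg_inputs)          # warm-up: tracing / graph capture
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n):
        model(**kwarg_inputs)
    torch.cuda.synchronize()
    return (time.time() - start) / n

# Compile the pipeline twice, once with config.enable_xformers = True and
# once with False, and compare the per-call times returned by benchmark().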

huo-ju commented 6 months ago

@chengzeyi Yes, my code is very similar to ldtgodlike's, but it sits inside a bigger pipeline. I can make a minimal test script if his code doesn't reproduce the issue.

chengzeyi commented 6 months ago

@huo-ju @ldtgodlike I ran this script without any problem. So what are the details of the exception?

See https://github.com/chengzeyi/stable-fast/blob/9fd07ce57e0cedd62ee59cd78774a577e4f2967b/community/optimize_sd15_with_controlnet_and_ip_adapter.py

ldtgodlike commented 6 months ago

@chengzeyi It seems that it can't reference the style if config.enable_triton = True. [attached: reference image and control image; prompt = "a bedroom"]

If config.enable_triton = True, the result looks like it was generated with the ControlNet only. [attached: output image]

But if I comment out that line of code (# config.enable_triton = True), it can reference the style. [attached: output image]
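
One way to quantify this (a sketch; run_with_triton is a hypothetical helper that rebuilds and runs the script above with config.enable_triton set to the given value and returns the first output image):

import numpy as np

# run_with_triton is hypothetical: same seed and inputs both times,
# only the triton flag differs.
img_on = np.asarray(run_with_triton(True), dtype=np.float32)
img_off = np.asarray(run_with_triton(False), dtype=np.float32)

# If triton silently drops the IP-Adapter path, the two outputs differ
# substantially wherever the reference style would have been applied.
print("mean abs diff:", np.abs(img_on - img_off).mean())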

huo-ju commented 6 months ago

The IP-Adapter has no effect on the result in my testing: no error, just the same output as a pipeline without the IP-Adapter.

> @huo-ju @ldtgodlike I ran this script without any problem. So what are the details of the exception?
>
> See https://github.com/chengzeyi/stable-fast/blob/9fd07ce57e0cedd62ee59cd78774a577e4f2967b/community/optimize_sd15_with_controlnet_and_ip_adapter.py

Suprhimp commented 1 week ago

Same for me. Has this issue been solved?