[X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui
What happened?
Thank you for this project.
The results I get from my own diffusers implementation of IPAdapter are much worse than the results from calling ControlNet IPAdapter as you described. I'm not sure why. Is there some special processing here that I have missed?
I'd also like to know how to accelerate IPAdapter, for example with TensorRT or Stable-Fast. Since each face ID is different, does that mean frameworks that require compilation (like TensorRT) cannot be used for acceleration? Calling ControlNet IPAdapter feels too slow, and the ControlNet cache does not seem to be working: inference using diffusers takes only 5 seconds, while ControlNet IPAdapter takes 16 seconds. Do you have any suggestions for speeding up ControlNet IPAdapter? I would like the speedup to apply when calling through the API, because my pipeline is fixed.
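To make the 5 second versus 16 second comparison easier to check, here is a minimal sketch of how one-time setup and per-request inference could be timed separately. The `timed` helper and the `timings` dictionary are hypothetical names introduced only for this illustration; they are not part of diffusers or the ControlNet extension.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Accumulated wall-clock seconds per label (hypothetical helper state).
timings: Dict[str, float] = {}


@contextmanager
def timed(label: str) -> Iterator[None]:
    """Add the elapsed wall-clock time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


if __name__ == "__main__":
    # Stand-ins for the real stages; in the script these would wrap
    # adapter/LoRA loading and the pipeline(...) call respectively.
    with timed("setup"):
        time.sleep(0.01)
    with timed("inference"):
        time.sleep(0.01)
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.3f}s")
```

With something like this, the adapter loading, LoRA fusing, and face analysis would go under one label and the `pipeline(...)` call under another, so only the inference part is compared against the ControlNet API call.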
Steps to reproduce the problem
diffusers code:
import time

import cv2
import numpy as np
import torch
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from diffusers.utils import load_image
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from transformers import CLIPVisionModelWithProjection

# Image encoder that produces the image embeddings
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "/ssd/xiedong/stable-fast/CLIP-ViT-H-14-laion2B-s32B-b79K",
    torch_dtype=torch.float16,
)
noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
# Load the pipeline
pipeline = AutoPipelineForText2Image.from_pretrained(
    "/ssd/xiedong/stable-fast/portrait_sdxl1.0_finetune-000029",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
    scheduler=noise_scheduler,
    safety_checker=None
).to("cuda")
# Timing starts here, so the measurement covers adapter loading, LoRA fusing,
# face analysis, and inference
tim1 = time.time()
pipeline.load_ip_adapter("/ssd/xiedong/stable-fast/IP-Adapter/IP-Adapter-FaceID",
                         subfolder=None,
                         weight_name="ip-adapter-faceid-plusv2_sdxl.bin",
                         image_encoder_folder=None)
pipeline.set_ip_adapter_scale(1)
pipeline.load_lora_weights("/ssd/xiedong/stable-fast/IP-Adapter/IP-Adapter-FaceID",
                           weight_name="ip-adapter-faceid-plusv2_sdxl_lora.safetensors")
pipeline.fuse_lora(lora_scale=0.5)
image = load_image("./huge.jpg")
num_images = 1
ref_images_embeds = []
ip_adapter_images = []
app = FaceAnalysis(root="/ssd/xiedong/stable-fast/insightface", name="buffalo_l",
                   providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
faces = app.get(image)
ip_adapter_images.append(face_align.norm_crop(image, landmark=faces[0].kps, image_size=224))
image = torch.from_numpy(faces[0].normed_embedding)
ref_images_embeds.append(image.unsqueeze(0))
ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")
clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
    [ip_adapter_images], None, torch.device("cuda"), num_images, True
)[0]
# clip_embeds shape
print(f"clip_embeds shape: {clip_embeds.shape}")
# id_embeds shape
print(f"id_embeds shape: {id_embeds.shape}")
pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = True # True if Plus v2
generator = torch.Generator(device="cpu").manual_seed(42)
images = pipeline(
    prompt="In a snowy mountain range, the young man is dressed in winter attire, facing the camera with a determined gaze. He sports a thick wool coat, knit hat, and gloves to keep warm in the frigid temperatures. His eyes, piercing and resolute, reflect the strength and resolve needed to conquer the elements and the challenging terrain.",
    ip_adapter_image_embeds=[id_embeds],
    negative_prompt="paintings, sketches, worst quality, low quality, normal quality, lowres, blurry, text, logo, monochrome, grayscale, skin spots, acnes, skin blemishes, age spot, strabismus, wrong finger, bad anatomy, bad hands, error, missing fingers, cropped, jpeg artifacts, signature, watermark, username, dark skin, fused girls, fushion, bad feet, ugly, pregnant, vore, duplicate, morbid, mutilated, transexual, hermaphrodite, long neck, mutated hands, poorly drawn face, mutation, deformed, bad proportions, malformed limbs, extra limbs, cloned face, disfigured, gross proportions, missing arms, missing legs, extra arms, extra legs, plump, open mouth, tooth, teeth, nsfw,",
    num_inference_steps=30,
    num_images_per_prompt=1,
    width=1024,
    height=1024,
    generator=generator
).images
tim2 = time.time()
print(tim2 - tim1)
images[0].save("output1.png")
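On the caching question: in the script above, the face embeddings depend only on the reference image, so one possible approach is to compute them once per face and reuse them across API calls. The following is a minimal sketch under that assumption. `EmbeddingCache` and its `compute` callback are hypothetical names for illustration, not an existing diffusers or ControlNet API, and the callback stands in for the insightface and CLIP steps.

```python
import hashlib
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class EmbeddingCache(Generic[T]):
    """Cache per-face results keyed by the SHA-256 of the raw image bytes."""

    def __init__(self, compute: Callable[[bytes], T]) -> None:
        self._compute = compute          # hypothetical: runs face analysis + encoding
        self._store: Dict[str, T] = {}
        self.misses = 0                  # how many times compute() actually ran

    def get(self, image_bytes: bytes) -> T:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(image_bytes)
        return self._store[key]


if __name__ == "__main__":
    # Stand-in compute function; a real one would return the embedding tensors.
    cache = EmbeddingCache(lambda data: len(data))
    cache.get(b"face-a")
    cache.get(b"face-a")   # served from the cache, compute() is not called again
    cache.get(b"face-b")
    print("compute calls:", cache.misses)
```

Keying on the image bytes means a repeated reference face skips face detection and encoding entirely, while a new face only pays that cost once.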
Is there an existing issue for this?
What should have happened?
As mentioned above.
Commit where the problem happens
webui: controlnet:
What browsers do you use to access the UI ?
No response
Command Line Arguments
List of enabled extensions
As mentioned above.
Console logs
Additional information
No response