Open MaxTran96 opened 1 year ago
For a style, you can get good results with a dataset starting from 5 images; the dataset must be consistent in style, though.
What about for faces? Are 5 images good enough? What training steps do you recommend for 5 training images, for both style and face? Thank you!
Also, I got training to run! Is there code somewhere that I can use to load the safetensors file generated by training for inference?
I only found this resource: https://ngwaifoong92.medium.com/how-to-fine-tune-sdxl-0-9-using-dreambooth-lora-2011571cf157 which loads the pytorch_lora_weights.bin instead of the .safetensors file, like ComfyUI does. Do you have the inference code available in your repo?
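As a side note, recent versions of diffusers can load a Kohya-style `.safetensors` LoRA directly via `pipe.load_lora_weights()`, so a manual loader may not be needed. A minimal sketch, assuming a recent diffusers release with SDXL LoRA support; the model ID and file path are placeholders:

```python
def load_sdxl_with_lora(lora_path, base_model="stabilityai/stable-diffusion-xl-base-0.9"):
    """Load an SDXL pipeline and attach a .safetensors LoRA.

    Imports are kept inside the function so the sketch can be defined
    even when diffusers/torch are not installed.
    """
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        base_model,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("cuda")
    # load_lora_weights accepts a path to a .safetensors file directly
    pipe.load_lora_weights(lora_path)
    return pipe

# usage (downloads the base model and needs a GPU):
# pipe = load_sdxl_with_lora("pytorch_lora_weights.safetensors")
# image = pipe("a photo of the trained subject").images[0]
```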
Hi, do you know if we should use the latest version of accelerate or 0.12? I got this error:

```
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/utils/versions.py", line 44, in _compare_versions
    raise ImportError(
ImportError: accelerate>=0.20.3 is required for a normal functioning of this module, but found accelerate==0.12.0.
```
Install transformers version 4.25.1.
Yeah, I got the training to run, and ComfyUI was able to generate images with the result, but when I tried running inference using the code below, it doesn't seem to work. Do you by any chance know how I can load the LoRA .safetensors weights into the pipeline to run inference?
I tried:

```python
from collections import defaultdict

import torch
from safetensors.torch import load_file


def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"

    # load the LoRA weights from the .safetensors file
    state_dict = load_file(checkpoint_path, device=device)

    # group the flat state dict by layer; keys look like
    # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
    updates = defaultdict(dict)
    for key, value in state_dict.items():
        layer, elem = key.split(".", 1)
        updates[layer][elem] = value

    # directly update the weights in the diffusers model
    for layer, elems in updates.items():
        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # walk down to the target sub-module; module names may themselves
        # contain underscores, so re-join pieces until getattr succeeds
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                else:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get the LoRA factors for this layer
        weight_up = elems["lora_up.weight"].to(dtype)
        weight_down = elems["lora_down.weight"].to(dtype)
        alpha = elems.get("alpha")
        if alpha is not None:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0

        # merge: W += multiplier * (alpha / rank) * (up @ down)
        if len(weight_up.shape) == 4:
            # 1x1 conv kernels: squeeze to 2D, matmul, unsqueeze back
            curr_layer.weight.data += multiplier * alpha * torch.mm(
                weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)
            ).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline
```
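For reference, the key-grouping step at the top of the loader can be checked in isolation (pure Python, using the example key format from the comment; no model required):

```python
from collections import defaultdict

# example LoRA state-dict keys in the Kohya-style naming scheme,
# taken from the comment in the loader above
keys = [
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight",
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight",
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.alpha",
]

updates = defaultdict(dict)
for key in keys:
    # split once at the first '.': the left part names the layer,
    # the right part names the element (lora_up/lora_down/alpha)
    layer, elem = key.split(".", 1)
    updates[layer][elem] = None  # the real loader stores the tensor here

layer = next(iter(updates))
print(layer)                   # lora_te_text_model_encoder_layers_0_self_attn_k_proj
print(sorted(updates[layer]))  # ['alpha', 'lora_down.weight', 'lora_up.weight']
```

This is why the loader indexes `elems["lora_up.weight"]` and `elems["lora_down.weight"]`: everything after the first dot stays together as the element name.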
```python
import torch
from diffusers import DiffusionPipeline

dtype = torch.float16
variant = "fp16"
STABLE_DIFFUSION_SDXL = "stabilityai/stable-diffusion-xl-base-0.9"

pipe = DiffusionPipeline.from_pretrained(
    STABLE_DIFFUSION_SDXL,
    torch_dtype=dtype,
    use_safetensors=True,
    safety_checker=None,
    variant=variant,
).to("cuda")

pipe = load_lora_weights(pipe, lora_path, 1.0, "cuda", torch.float16)
```
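For what it's worth, the actual update the loader applies per layer is just W += multiplier * (alpha / rank) * (up @ down). A minimal numpy sketch with toy shapes (no diffusers or GPU needed; dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

out_dim, in_dim, rank = 8, 6, 4                # toy sizes; real layers are much larger
W = rng.standard_normal((out_dim, in_dim))     # base linear weight
up = rng.standard_normal((out_dim, rank))      # lora_up.weight
down = rng.standard_normal((rank, in_dim))     # lora_down.weight
alpha, multiplier = 4.0, 1.0

# same scaling as the loader: alpha divided by the rank (up.shape[1])
scale = multiplier * alpha / up.shape[1]
W_merged = W + scale * (up @ down)

print(W_merged.shape)  # (8, 6)
```

If the merged pipeline still ignores the subject, the mismatch is usually in the key naming (the layer-walking step fails silently for unmatched modules), not in this arithmetic.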
It doesn't generate an image of the subject that I fine-tuned the model with.
Hi, I tried the RunPod, but it doesn't recommend the number of training images I should use to get a decent result with LoRA SDXL. Do you have any suggestions on the recommended number of training images and training steps?