Open MaxTran96 opened 1 year ago
For a style, you can get good results with a dataset starting from 5 images; the dataset must be consistent in style, though.
What about for faces? Are 5 images good enough? What training steps do you recommend for 5 training images, for both style and face? Thank you!
Also, I got training to run! Is there code somewhere that I can use to load the safetensors file generated by training for inference?
I only found this resource: https://ngwaifoong92.medium.com/how-to-fine-tune-sdxl-0-9-using-dreambooth-lora-2011571cf157 which loads the pytorch_lora_weights.bin instead of the .safetensors file, like ComfyUI does. Do you have the inference code available in your repo?
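As a side note, recent versions of diffusers can load a Kohya-style `.safetensors` LoRA directly via `pipe.load_lora_weights()`, so a manual loader may not be needed. A minimal sketch, assuming a recent diffusers release with SDXL LoRA support; the model ID and file path are placeholders:

```python
def load_sdxl_with_lora(lora_path, base_model="stabilityai/stable-diffusion-xl-base-0.9"):
    """Load an SDXL pipeline and attach a .safetensors LoRA.

    Imports are kept inside the function so the sketch can be defined
    even when diffusers/torch are not installed.
    """
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        base_model,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("cuda")
    # load_lora_weights accepts a path to a .safetensors file directly
    pipe.load_lora_weights(lora_path)
    return pipe

# usage (downloads the base model and needs a GPU):
# pipe = load_sdxl_with_lora("pytorch_lora_weights.safetensors")
# image = pipe("a photo of the trained subject").images[0]
```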
Hi, do you know if we should use the latest version of accelerate or 0.12? I got this error:

```
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/utils/versions.py", line 44, in _compare_versions
    raise ImportError(
ImportError: accelerate>=0.20.3 is required for a normal functioning of this module, but found accelerate==0.12.0.
```
Install transformers version 4.25.1.
Yeah, I got the training to run, and ComfyUI was able to generate images with the result, but when I tried running inference using the code below, it doesn't seem to work. Do you by any chance know how I can load the LoRA .safetensors weights into the pipeline to run inference?
I tried:

```python
from collections import defaultdict

import torch
from safetensors.torch import load_file


def load_lora_weights(pipeline, checkpoint_path, multiplier, device, dtype):
    LORA_PREFIX_UNET = "lora_unet"
    LORA_PREFIX_TEXT_ENCODER = "lora_te"

    # load the LoRA weights from the .safetensors file
    state_dict = load_file(checkpoint_path, device=device)

    # group the flat state dict by layer; keys look like
    # "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight"
    updates = defaultdict(dict)
    for key, value in state_dict.items():
        layer, elem = key.split(".", 1)
        updates[layer][elem] = value

    # directly update the weights in the diffusers model
    for layer, elems in updates.items():
        if "text" in layer:
            layer_infos = layer.split(LORA_PREFIX_TEXT_ENCODER + "_")[-1].split("_")
            curr_layer = pipeline.text_encoder
        else:
            layer_infos = layer.split(LORA_PREFIX_UNET + "_")[-1].split("_")
            curr_layer = pipeline.unet

        # walk down to the target sub-module; module names may themselves
        # contain underscores, so re-join pieces until getattr succeeds
        temp_name = layer_infos.pop(0)
        while True:
            try:
                curr_layer = curr_layer.__getattr__(temp_name)
                if len(layer_infos) > 0:
                    temp_name = layer_infos.pop(0)
                else:
                    break
            except Exception:
                if len(temp_name) > 0:
                    temp_name += "_" + layer_infos.pop(0)
                else:
                    temp_name = layer_infos.pop(0)

        # get the LoRA factors for this layer
        weight_up = elems["lora_up.weight"].to(dtype)
        weight_down = elems["lora_down.weight"].to(dtype)
        alpha = elems.get("alpha")
        if alpha is not None:
            alpha = alpha.item() / weight_up.shape[1]
        else:
            alpha = 1.0

        # merge: W += multiplier * (alpha / rank) * (up @ down)
        if len(weight_up.shape) == 4:
            # 1x1 conv kernels: squeeze to 2D, matmul, unsqueeze back
            curr_layer.weight.data += multiplier * alpha * torch.mm(
                weight_up.squeeze(3).squeeze(2), weight_down.squeeze(3).squeeze(2)
            ).unsqueeze(2).unsqueeze(3)
        else:
            curr_layer.weight.data += multiplier * alpha * torch.mm(weight_up, weight_down)

    return pipeline
```
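For reference, the key-grouping step at the top of the loader can be checked in isolation (pure Python, using the example key format from the comment; no model required):

```python
from collections import defaultdict

# example LoRA state-dict keys in the Kohya-style naming scheme,
# taken from the comment in the loader above
keys = [
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight",
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight",
    "lora_te_text_model_encoder_layers_0_self_attn_k_proj.alpha",
]

updates = defaultdict(dict)
for key in keys:
    # split once at the first '.': the left part names the layer,
    # the right part names the element (lora_up/lora_down/alpha)
    layer, elem = key.split(".", 1)
    updates[layer][elem] = None  # the real loader stores the tensor here

layer = next(iter(updates))
print(layer)                   # lora_te_text_model_encoder_layers_0_self_attn_k_proj
print(sorted(updates[layer]))  # ['alpha', 'lora_down.weight', 'lora_up.weight']
```

This is why the loader indexes `elems["lora_up.weight"]` and `elems["lora_down.weight"]`: everything after the first dot stays together as the element name.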
```python
import torch
from diffusers import DiffusionPipeline

dtype = torch.float16
variant = "fp16"
STABLE_DIFFUSION_SDXL = "stabilityai/stable-diffusion-xl-base-0.9"

pipe = DiffusionPipeline.from_pretrained(
    STABLE_DIFFUSION_SDXL,
    torch_dtype=dtype,
    use_safetensors=True,
    safety_checker=None,
    variant=variant,
).to("cuda")

pipe = load_lora_weights(pipe, lora_path, 1.0, "cuda", torch.float16)
```
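For what it's worth, the actual update the loader applies per layer is just W += multiplier * (alpha / rank) * (up @ down). A minimal numpy sketch with toy shapes (no diffusers or GPU needed; dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

out_dim, in_dim, rank = 8, 6, 4                # toy sizes; real layers are much larger
W = rng.standard_normal((out_dim, in_dim))     # base linear weight
up = rng.standard_normal((out_dim, rank))      # lora_up.weight
down = rng.standard_normal((rank, in_dim))     # lora_down.weight
alpha, multiplier = 4.0, 1.0

# same scaling as the loader: alpha divided by the rank (up.shape[1])
scale = multiplier * alpha / up.shape[1]
W_merged = W + scale * (up @ down)

print(W_merged.shape)  # (8, 6)
```

If the merged pipeline still ignores the subject, the mismatch is usually in the key naming (the layer-walking step fails silently for unmatched modules), not in this arithmetic.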
It doesn't generate an image of the subject that I fine-tuned the model with.
Hi, I tried the RunPod, but it doesn't recommend the number of training images I should use to get a decent result with LoRA SDXL. Do you have any suggestions on the recommended number of training images and training steps?