ignis-sec closed this 12 months ago
Actually, I just checked, and `patch_pipe` is used by the image builder, which handles sample generation but is not used during training. However, it checks `maybe_unet_path.endswith(".text_encoder.pt")`, which follows the `_text_lora_path` naming, while the save side was using `_text_lora_path_ui`, which would correspond to `maybe_unet_path.endswith("_txt.pt")`.
I'm a bit confused about whether I should revert that and instead modify `save_pipe` to use `_text_lora_path_ui` (so it can still work with `patch_pipe`/`merge_loras_to_pipe`). In one of these cases, the LoRA text encoder weights will not be added to the diffusion network when generating samples with ImageBuilder, and I'm not exactly sure whether this change fixes that as well or breaks it. Let me know and I'll modify.
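Whichever direction is chosen, the invariant is that the suffix `save_pipe` writes must be the same one `patch_pipe` checks. A toy sketch of that invariant (hypothetical simplified helpers, not the real functions):

```python
# Hypothetical, simplified versions of the save/load pairing described above.
# Key invariant: whatever suffix the save side writes, the load side checks.
TEXT_ENCODER_SUFFIX = ".text_encoder.pt"  # the _text_lora_path convention

def save_text_lora(unet_path: str, weights: bytes, store: dict) -> None:
    # Derive the text encoder filename from the unet filename and save it.
    store[unet_path[:-len(".pt")] + TEXT_ENCODER_SUFFIX] = weights

def find_text_lora(maybe_unet_path: str, store: dict):
    # Look up the text encoder weights using the SAME naming convention.
    text_path = maybe_unet_path[:-len(".pt")] + TEXT_ENCODER_SUFFIX
    return store.get(text_path)

store = {}
save_text_lora("lora_weight.pt", b"weights", store)
print(find_text_lora("lora_weight.pt", store))  # b'weights'
```

If the two sides use different conventions (as in the mismatch described above), the lookup silently returns nothing and the text encoder LoRA is never applied.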
Bumping as the issue is not yet fixed
Disclaimer:
I'm not at all familiar with diffusers, or with this codebase. There is probably a faster approach to this, but hey, it's only sampling and it works, so better than nothing.
Issue
Since late March, sample image generation during training does not work if a LoRA network is being trained; instead, it outputs random noise. More discussion is available in the issue tracker, at issue #1273.
The problem, in depth.
For sample generation, `s_pipeline` is saved to a temporary directory and a new `DiffusionPipeline` is instantiated from that checkpoint (I'm assuming it's just a quick way of releasing references to the unet and text_encoder being trained). However, these models are structurally different from `UNet2DConditionModel` and `CLIPTextModel` because of the following code:

When a `DiffusionPipeline` is created via the `from_pretrained` method with the `low_cpu_mem_usage=False` and `device_map=None` parameters, it randomly initializes any required state keys that are missing from the checkpoint. Because of this, inference outputs random noise.
The solution, in depth.
The solution in this case contains the following steps:

1. Before saving the existing `DiffusionPipeline` to disk, use the `save_pipe` function to save the LoRA weights to disk.
2. If a LoRA network is being trained, ignore `weight_dtype` when constructing the `DiffusionPipeline` at first.^1
3. If a LoRA network is being trained, once the `DiffusionPipeline` has been constructed again, re-inject the trainable LoRA weights into the unet and text encoder.

Other fixes included, and hardest "Where is Waldo" game ever
I've also taken the liberty of tracking down an extra comma that had broken the `save_pipe` function (which was not being used before; now it is). The extra comma turned the LoRA txt filename into a tuple, which threw an exception when it was passed to `torch.save`/`safetensors.torch.save`. Additionally, I've modified the LoRA text encoder file name search in `patch_pipe` to use `_text_lora_path(unet_path)` instead of `_text_lora_path_ui(unet_path)`. I couldn't see this symbol used anywhere else either, so I figured it would be nice to use the same naming convention as `save_pipe` here. Let me know if I missed something and broke something else.

Checklist before requesting a review