Open magickaito opened 1 year ago
Similar problem here. When running PTI, the loss becomes NaN before training.
Same issue here, but I use the SD v2.1 model. The weird thing is that when I call inject_trainable_lora on the model with target_replace_module={"CrossAttention"}, it returns no parameters. For more detail, my code is as follows:
from diffusers import UNet2DConditionModel
from lora_diffusion import inject_trainable_lora
unet2d = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
params_1, name = inject_trainable_lora(unet2d, {"CrossAttention"}, verbose=True, r=4, scale=1.0)
print(params_1)
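A possible explanation for the empty parameter list (an assumption on my part, not confirmed in this thread): newer diffusers releases renamed the CrossAttention module class to Attention, so {"CrossAttention"} may simply match nothing in the SD 2.1 UNet. A quick way to check which class names the loaded UNet actually contains, reusing the unet2d from the snippet above:

from collections import Counter

# count the class names of all submodules in the UNet; if "CrossAttention" is
# absent but "Attention" is present, pass {"Attention"} as the target instead
class_names = Counter(type(m).__name__ for m in unet2d.modules())
print(class_names.most_common(20))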
Has anyone met the same problem?
Hi, everyone!
I just found that my issue was caused by the data type!!!
My code is below, and I hope it can help anyone who runs into the same problem.
For convenience, I rewrote the function inject_trainable_lora as:
import torch
import torch.nn as nn
from typing import Set

# these helpers live in lora_diffusion/lora.py in cloneofsimo/lora
from lora_diffusion.lora import (
    DEFAULT_TARGET_REPLACE,
    LoraInjectedLinear,
    _find_modules,
)


def inject_trainable_lora(
    model: nn.Module,
    target_replace_module: Set[str] = DEFAULT_TARGET_REPLACE,
    r: int = 4,
    loras=None,  # path to lora .pt
    verbose: bool = False,
    dropout_p: float = 0.0,
    scale: float = 1.0,
):
    """
    Inject LoRA into the model and return the LoRA parameter groups.
    """
    # 👉 store parameters in a ModuleList
    require_grad_params = torch.nn.ModuleList()

    if loras is not None:
        loras = torch.load(loras)

    for _module, name, _child_module in _find_modules(
        model, target_replace_module, search_class=[nn.Linear]
    ):
        weight = _child_module.weight
        bias = _child_module.bias

        if verbose:
            print("LoRA Injection : injecting lora into ", name)
            print("LoRA Injection : weight shape", weight.shape)

        _tmp = LoraInjectedLinear(
            _child_module.in_features,
            _child_module.out_features,
            _child_module.bias is not None,
            r=r,
            dropout_p=dropout_p,
            scale=scale,
        )
        _tmp.linear.weight = weight
        if bias is not None:
            _tmp.linear.bias = bias

        # switch the module
        _tmp.to(_child_module.weight.device).to(_child_module.weight.dtype)
        _module._modules[name] = _tmp

        # 👉 append the lora layers
        require_grad_params.append(_module._modules[name].lora_up)
        require_grad_params.append(_module._modules[name].lora_down)

        if loras is not None:
            _module._modules[name].lora_up.weight = loras.pop(0)
            _module._modules[name].lora_down.weight = loras.pop(0)

        _module._modules[name].lora_up.weight.requires_grad = True
        _module._modules[name].lora_down.weight.requires_grad = True

    return require_grad_params
This way, we can pass the LoRA parameters to the optimizer more easily:
from diffusers import UNet2DConditionModel

# use the rewritten inject_trainable_lora above, not the one from lora_diffusion
unet2d = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet"
)
params_1 = inject_trainable_lora(unet2d, {"UNet2DConditionModel"}, verbose=True, r=4, scale=1.0)
optim = torch.optim.AdamW(params_1.parameters(), lr=0.0001)
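Before building the optimizer, it may also be worth a quick sanity check that the injection actually found layers to train; a small sketch, reusing the params_1 from above:

# fail fast if no LoRA layers were injected (e.g. a target_replace_module
# name that matches nothing), instead of silently training nothing
n_trainable = sum(p.numel() for p in params_1.parameters() if p.requires_grad)
print(f"trainable LoRA parameters: {n_trainable}")
assert n_trainable > 0, "no LoRA layers injected; check target_replace_module"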
If you have an issue like loss = nan, please check the data types: there may be a mixture of torch.float32 and torch.float16 in use. You need to set the data type to torch.float32!
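As a minimal sketch of that advice (assuming the unet2d and params_1 from the snippet above), one way to make sure the frozen weights and the injected LoRA weights share a single dtype before training:

# cast the UNet (which now contains the injected LoraInjectedLinear modules)
# to one dtype so fp16 and fp32 are not mixed during training
unet2d.to(dtype=torch.float32)
assert all(p.dtype == torch.float32 for p in params_1.parameters())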
Hi guys, I am using this Colab notebook by pedrogengo.
For unknown reasons, I keep getting NaN loss during training. This happens whenever the number of training steps is higher than 500. With 500 steps it appears OK (but that is too low to be usable).
This happened both on my own copy of the Google Colab and on a hosted RunPod PyTorch container with 24 GB of graphics memory.
These are the configurations:
Nothing much changed.
This is the output showing the loss becoming NaN during training:
What could be wrong here?
And if it helps, this is the output from the first installation step: