cloneofsimo / lora

Using Low-rank adaptation to quickly fine-tune diffusion models.
https://arxiv.org/abs/2106.09685
Apache License 2.0

Fail to resume when training with float16 #205

Open CrazyBoyM opened 1 year ago

CrazyBoyM commented 1 year ago

with the option:

--mixed_precision="fp16" \

training is faster, but when I try to resume training with

--resume_unet="/epoch_41_step_0/lora_weights/lora_e41_s0.pt" \

I get this error:

TypeError: cannot assign 'torch.HalfTensor' as parameter 'weight' 

Is there any way to solve this problem?
I tried to fix it with:

if loras is not None:
    print("########## inject from checkpoint ###########")
    _module._modules[name].lora_up.weight = torch.nn.Parameter(torch.tensor(loras.pop(0)).float().detach())
    _module._modules[name].lora_down.weight = torch.nn.Parameter(torch.tensor(loras.pop(0)).float().detach())

but it fails.
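For reference, here is a minimal standalone sketch of what seems to trigger this TypeError (plain PyTorch, not the training script itself): assigning a raw fp16 tensor to an attribute that is registered as an nn.Parameter is rejected by nn.Module.

    import torch
    import torch.nn as nn

    layer = nn.Linear(4, 4)  # 'weight' is registered as an nn.Parameter
    fp16_weight = torch.zeros(4, 4, dtype=torch.float16)

    # Raises: TypeError: cannot assign 'torch.HalfTensor' as parameter 'weight'
    # (torch.nn.Parameter or None expected)
    layer.weight = fp16_weight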

danieltanhx commented 1 year ago

I think it's related to a PyTorch version issue; modifying this section of the code should do it:

    ...
    require_grad_params.append(_module._modules[name].lora_up.parameters())
    require_grad_params.append(_module._modules[name].lora_down.parameters())
    wt_tensor_type = _module._modules[name].lora_up.weight.dtype
    if loras != None:
        _module._modules[name].lora_up.weight.data = loras.pop(0).to(wt_tensor_type)
        _module._modules[name].lora_down.weight.data = loras.pop(0).to(wt_tensor_type)
    ...
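In case it helps, a self-contained sketch of that approach with hypothetical stand-in layers (names and shapes are illustrative only, not the repo's actual injection code): writing into .weight.data and casting to the existing parameter's dtype sidesteps the parameter-assignment check, whether the checkpoint was saved in fp16 or fp32.

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for an injected LoRA pair; shapes are for illustration only.
    lora_up = nn.Linear(4, 16, bias=False)    # weight shape (16, 4)
    lora_down = nn.Linear(16, 4, bias=False)  # weight shape (4, 16)

    # Simulate fp16 weights as loaded from a checkpoint trained with --mixed_precision="fp16".
    loras = [
        torch.randn(16, 4, dtype=torch.float16),
        torch.randn(4, 16, dtype=torch.float16),
    ]

    wt_tensor_type = lora_up.weight.dtype  # dtype the module currently uses
    lora_up.weight.data = loras.pop(0).to(wt_tensor_type)
    lora_down.weight.data = loras.pop(0).to(wt_tensor_type)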