cloneofsimo / lora

Using Low-rank adaptation to quickly fine-tune diffusion models.
https://arxiv.org/abs/2106.09685
Apache License 2.0

Fail to resume when training with float16 #205

Open CrazyBoyM opened 1 year ago

CrazyBoyM commented 1 year ago

with the option:

--mixed_precision="fp16" \

training is faster, but when I try to resume training with

--resume_unet="/epoch_41_step_0/lora_weights/lora_e41_s0.pt" \

I get this error:

TypeError: cannot assign 'torch.HalfTensor' as parameter 'weight' 

Is there any way to solve this problem?
I tried to fix it with:

if loras is not None:
    print("########## inject from checkpoint ###########")
    _module._modules[name].lora_up.weight = torch.nn.Parameter(torch.tensor(loras.pop(0)).float().detach())
    _module._modules[name].lora_down.weight = torch.nn.Parameter(torch.tensor(loras.pop(0)).float().detach())

but it fails.
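For reference, here is a minimal standalone sketch of what seems to trigger this TypeError (plain PyTorch, not the training script itself): assigning a raw fp16 tensor to an attribute that is registered as an nn.Parameter is rejected by nn.Module.

    import torch
    import torch.nn as nn

    layer = nn.Linear(4, 4)  # 'weight' is registered as an nn.Parameter
    fp16_weight = torch.zeros(4, 4, dtype=torch.float16)

    # Raises: TypeError: cannot assign 'torch.HalfTensor' as parameter 'weight'
    # (torch.nn.Parameter or None expected)
    layer.weight = fp16_weight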

danieltanhx commented 1 year ago

I think it's related to a PyTorch version issue; modifying this section of the code should do it:

    ...
    require_grad_params.append(_module._modules[name].lora_up.parameters())
    require_grad_params.append(_module._modules[name].lora_down.parameters())
    wt_tensor_type = _module._modules[name].lora_up.weight.dtype
    if loras != None:
        _module._modules[name].lora_up.weight.data = loras.pop(0).to(wt_tensor_type)
        _module._modules[name].lora_down.weight.data = loras.pop(0).to(wt_tensor_type)
    ...
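In case it helps, a self-contained sketch of that approach with hypothetical stand-in layers (names and shapes are illustrative only, not the repo's actual injection code): writing into .weight.data and casting to the existing parameter's dtype sidesteps the parameter-assignment check, whether the checkpoint was saved in fp16 or fp32.

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins for an injected LoRA pair; shapes are for illustration only.
    lora_up = nn.Linear(4, 16, bias=False)    # weight shape (16, 4)
    lora_down = nn.Linear(16, 4, bias=False)  # weight shape (4, 16)

    # Simulate fp16 weights as loaded from a checkpoint trained with --mixed_precision="fp16".
    loras = [
        torch.randn(16, 4, dtype=torch.float16),
        torch.randn(4, 16, dtype=torch.float16),
    ]

    wt_tensor_type = lora_up.weight.dtype  # dtype the module currently uses
    lora_up.weight.data = loras.pop(0).to(wt_tensor_type)
    lora_down.weight.data = loras.pop(0).to(wt_tensor_type)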