Revisit model inference

Karonar1 / lora_viewer

MIT License


Revisit model inference #6

Closed Karonar1 closed 1 month ago

Karonar1 commented 1 month ago

Currently we distinguish SD 1.5 and SDXL LoRA UNets based on whether tensors are named down* or input*, but this is actually a characteristic of the LoRA version, not of the base model. In particular, the current check incorrectly identifies XL models trained with OneTrainer as 1.5. The two need to be distinguished by a more detailed examination of layer names and/or shapes.

Karonar1 commented 1 month ago

SD and SDXL UNets can be distinguished by tensor shape: SD has 320 channels for its early attention layers, while SDXL has 640 channels. I see no easy way to distinguish SD 1.5 from SD 2.1 LoRAs, either for the UNet or for the text encoder. Although SD 2 was supposed to have a different text encoder, the layers affected by LoRA seem to have the same shapes.
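
A rough sketch of the shape check described above, in plain Python. It assumes the tensor shapes have already been read into a dict, that UNet tensor names contain "unet" and "attn1_to_q" (a kohya-style naming assumption, not something confirmed in this thread), and that the second dimension of a lora_down.weight tensor is the layer's input width. The function name guess_unet_family is made up for illustration. A LoRA that skips the early blocks would fool this check, so treat the result as a hint only.

```python
from typing import Mapping, Optional, Tuple


def guess_unet_family(shapes: Mapping[str, Tuple[int, ...]]) -> Optional[str]:
    """Guess "SD" or "SDXL" from the narrowest self-attention input width.

    `shapes` maps tensor names to shape tuples. The name fragments checked
    here ("unet", "attn1_to_q") are assumptions about the file's naming.
    """
    widths = [
        shape[1]  # lora_down.weight is assumed to be (rank, in_features)
        for name, shape in shapes.items()
        if "unet" in name
        and "attn1_to_q" in name
        and name.endswith("lora_down.weight")
        and len(shape) == 2
    ]
    if not widths:
        return None  # no usable attention tensors, so no guess
    # 320 -> SD, 640 -> SDXL; any other width yields None rather than a guess.
    return {320: "SD", 640: "SDXL"}.get(min(widths))
```

Returning None for anything unexpected, rather than a default, is deliberate: no answer seems better than a wrong one here.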

Karonar1 commented 1 month ago

Some types of adapter can be distinguished by tensor name alone, looking at the end of the name:

LoRA names end in lora_down.weight and lora_up.weight
DoRA has the same tensor names as LoRA, but also adds dora_scale
LoHa names end in hada_w1_a, hada_w1_b, hada_w2_a, hada_w2_b
LoKr names end in lokr_w1 and lokr_w2

LyCORIS is more of a framework for training low-rank models than an adapter type. It may be able to train additional layers, but it has no consistent, clear differences from the actual adapter types, and a LyCORIS file may contain any of them.
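
The suffix rules above could be sketched roughly like this. It works only on a list of tensor names, so it does not depend on how the file is loaded. The function name detect_adapter_types is hypothetical, and the suffixes are exactly the ones listed in this comment; real files may use variants not covered here.

```python
from typing import Iterable, Set

# Suffixes taken from the list above.
LORA_SUFFIXES = ("lora_down.weight", "lora_up.weight")
OTHER_SUFFIXES = {
    "LoHa": ("hada_w1_a", "hada_w1_b", "hada_w2_a", "hada_w2_b"),
    "LoKr": ("lokr_w1", "lokr_w2"),
}


def detect_adapter_types(tensor_names: Iterable[str]) -> Set[str]:
    """Return the set of adapter types suggested by tensor name endings."""
    names = list(tensor_names)
    found: Set[str] = set()

    has_lora = any(n.endswith(LORA_SUFFIXES) for n in names)
    has_dora_scale = any(n.endswith("dora_scale") for n in names)
    if has_lora:
        # DoRA uses the LoRA tensor names plus an extra dora_scale tensor.
        found.add("DoRA" if has_dora_scale else "LoRA")

    for kind, suffixes in OTHER_SUFFIXES.items():
        if any(n.endswith(suffixes) for n in names):
            found.add(kind)
    return found
```

It returns a set rather than a single label because one file may mix several adapter types.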

Karonar1 commented 1 month ago

We can now detect a number of LoRA subtypes. The distinction between SD and SDXL UNet LoRAs has been removed: it never worked very well, and giving an incorrect result is worse than giving none. The two can still be easily distinguished by the text encoder part.