StylusEcho opened 9 months ago
So I followed the traceback and found that the script was looking for /extensions-builtin/Lora/network_lora.py.
I don't see it in Forge... so I go ahead and copy/paste it from A1111. I restart Forge, and next it's missing /extensions-builtin/Lora/lyco_helpers.py, so again I copy/paste it from A1111.
EDIT: Never mind; there are no errors on launch, but the extension is not actually working.
I would also love to see this working in Forge, and with adetailer as well, since adetailer doesn't seem to recognize the loractl syntax.
Unfortunately, loractl is very intimately tied to the innards of how A1111 implements LoRAs. The concept could certainly be applied in other tools like Forge or ComfyUI, but the actual implementation is likely to be wildly different depending on the innards of their lora apply code. It would be better for a loractl-like extension for each project to be its own distinct project, just because there would be very little actual shared code between them. In many ways, this is less of a pure "extension" and more of an ugly monkeypatch on top of A1111's specific codebase.
A1111 makes this fairly easy because it already has support for variant lora weights (as do most tools), and it runs the lora network weight setup process on each step -- the trick is that if the loras being applied haven't changed since the previous step, it doesn't have to do the expensive work of reverting to the base model weights and recomputing the lora weights to apply.
loractl hijacks that process: it invalidates A1111's cache and "lies" about what the weight should be by dynamically computing it for each step. The native A1111 lora application code then does the rest.
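To make that concrete, the per-step weight boils down to linearly interpolating between user-specified (position, weight) keyframes. A minimal sketch of the idea in Python (function and variable names are hypothetical, not loractl's actual internals):

```python
# Minimal sketch of per-step weight interpolation; names are hypothetical,
# not loractl's actual internals.
def weight_at_step(keyframes, step, total_steps):
    """keyframes: list of (position, weight) pairs, position in [0, 1]."""
    points = sorted(keyframes)
    progress = step / max(total_steps - 1, 1)
    # Clamp to the endpoints outside the defined range.
    if progress <= points[0][0]:
        return points[0][1]
    if progress >= points[-1][0]:
        return points[-1][1]
    # Linear interpolation between the two surrounding keyframes.
    for (p0, w0), (p1, w1) in zip(points, points[1:]):
        if p0 <= progress <= p1:
            t = (progress - p0) / (p1 - p0)
            return w0 + t * (w1 - w0)
```

So e.g. `weight_at_step([(0, 0.0), (0.5, 1.0)], 10, 20)` would report full strength for a lora that ramps in over the first half of sampling.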
A1111 also supports a generalized "extra networks syntax", which is what allows this extension to cleanly replace the existing lora syntax handler with its own.
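For reference, loractl's extended syntax replaces the single lora weight with a comma-separated list of weight@step pairs, along the lines of (see the loractl README for the exact rules):

```
<lora:my_lora:0@0,1@0.5>
```

which would start `my_lora` at weight 0 and ramp it up to full strength by the halfway point.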
The extension just mutates A1111's internal mechanisms and relies heavily on the specifics of their implementation. This is why it's not particularly viable to just tweak it for other platforms - the extension would have to be fundamentally rewritten for each of them.
It's been a while since I've looked at this stuff, but glancing over Forge's code, it looks like it's using something closer to the pre-1.5 A1111 implementation of loras - which this extension will specifically not work with, since it depends on the 1.5 rewrite of lora handling. You might try the composable lora extension, which this one takes its inspiration from, and which worked with the pre-1.5 A1111 webui.
That said, I highly welcome and encourage forks or derivations of this project for platforms like Forge or ComfyUI. I haven't done much at all with Stable Diffusion in the last few months and am somewhat out of the loop regarding developments in the space over that period, and I don't have a lot of time right now to put towards similar new projects, but I certainly hope that someone else will take the idea and run with it.
@cheald What changed in the rewrite, and did Forge truly stick with the older version? What exactly are the implementation-specific details?
I wrote up more detail here: https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/68#issuecomment-1945491969
tl;dr Forge appears to set up lora weights once at the start of rendering (which is intelligent, since it's slow and doesn't usually change mid-render), but since it doesn't have hooks for recomputing lora weights per step, loractl's approach can't work.
loractl works by invalidating A1111's lora cache, which causes the A1111 lora code to reapply new lora weights on each step. This involves restoring the base weights and then recomputing the new weights by applying the lora weights times some weight multiplier. This is why loractl slows down rendering as much as it does - it isn't a quick process, though I'm sure the Forge authors could find ways to make it a bit more optimized.
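Roughly, the loop being exploited looks like this (a simplified, self-contained sketch with illustrative names, not the actual webui code):

```python
import torch

# Simplified sketch of A1111-style per-step network application; names are
# illustrative, not the real webui API.
def apply_networks_for_step(weights, base_backup, loras, step, prev_scales):
    """weights/base_backup: dicts of tensors; loras: {name: (delta_dict, schedule_fn)}."""
    scales = {name: schedule(step) for name, (_, schedule) in loras.items()}
    if scales != prev_scales:  # loractl reports a different weight each step,
                               # so this cache check fails every step
        for key, base in base_backup.items():
            weights[key].copy_(base)               # expensive: restore backup
        for name, (deltas, _) in loras.items():
            for key, delta in deltas.items():
                weights[key].add_(delta, alpha=scales[name])
    return scales
```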
PLS
@cheald It's been a long time... I'm probably slightly in over my head, but I'm on a mission to try implementing loractl as a Forge builtin extension, meaning that no monkey patching would be needed, and Forge code may be adjusted to accommodate it.
I believe that lllyasviel would approve such a PR for Forge, so long as its current logic for LoRA handling is applied when the box is un-checked - and if the box is checked, it then performs per-step calculation / application.
I've read this comment of yours (lllyasviel/stable-diffusion-webui-forge/issues/68) probably 100 times today while looking at the relevant code in Forge, A1111, and your extension, and although I have a pounding headache, from what I can tell you were indeed very concise and accurate in everything you wrote.
https://github.com/altoiddealer/stable-diffusion-webui-forge-altoids/tree/integrate_loractl

- Copied the extension into `extensions-builtin` and renamed it to be uniform with Forge (`lora_ctl_network.py`).
- Adjusted `lora_path` to account for `extensions-builtin.Lora` being `extensions-builtin.sd_forge_lora` in Forge.
- Removed the `network_lora` import which, as far as I can tell, was not even being used by loractl in A1111?

I can tell already that the code is executing and calculating weights for every step during inference, exactly as it does in A1111.
So now, I believe it's just a matter of getting Forge to re-load the networks every iteration?
(Edit: After looking at this, it looks like Forge only attempts to set up networks once per image, rather than once per step, so loractl as it's implemented in A1111 would not work)
I have not been able to pinpoint exactly where this logic diverges between A1111 and Forge. Any further assistance you could provide would be immensely appreciated.
Thanks!
Some notes...
- From a quick gander at the Forge network handling, te/unet weights are calculated during network activation, but rather than being taken from overridable properties, they're taken directly from the `params` parameter passed to `activate()` (see the sketch after these notes).
- The `extra_networks_lora` module is identical between A1111 and Forge - including `activate()`, and subsequently the time of execution for `networks.load_networks()`.
- I can see that `networks.load_networks()` is drastically different between A1111 and Forge.
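To illustrate why that first point matters for a monkeypatch (hypothetical names only, not the real A1111/Forge signatures): a value read through an overridable property can be intercepted on every access, while a value passed in as plain data is captured once and can't be:

```python
# Hypothetical contrast, not the real A1111/Forge API.
class PatchableNetwork:
    def __init__(self, multiplier: float):
        self._multiplier = multiplier

    @property
    def unet_multiplier(self) -> float:
        # An extension can replace this property with a per-step computation,
        # and every consumer automatically sees the new value.
        return self._multiplier


def activate_from_params(params: dict) -> float:
    # Forge-style: the multiplier arrives as plain data from the parsed
    # prompt, so patching a property on the network object changes nothing.
    return params["unet_multiplier"]
```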
I haven't looked at your implementation yet (or at forge recently), but if I were implementing loractl as a native function, I'd do it by tracking and modifying network weight deltas rather than just blowing it all away and reloading it per step.
That is, right now the A1111 pseudocode is:

```
for each step:
    if network weights have changed:
        restore base weights from a backup copy
        add each network's weights times its application weight to the base weights
    run inference
```
The network weight backup/restore is a lot of data moved around. Instead, if you just did something like:
```
for each step:
    if network weights have changed:
        subtract each network's *previous* effective weights from the accumulated weights
        add each network's *current* effective weights to the accumulated weights
    run inference
```
This should run quite a lot faster, just because the amount of data shuttled back and forth should be substantially lower.
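As a concrete sketch of that delta update in PyTorch (hypothetical names; the merged low-rank product of a real lora is collapsed into a single `lora_delta` tensor here):

```python
import torch

# Delta-based update: rather than restoring a backup and re-merging every
# network, subtract the old contribution and add the new one in one step.
@torch.no_grad()
def update_lora_scale(param, lora_delta, old_scale, new_scale):
    if new_scale != old_scale:
        # param currently holds base + old_scale * lora_delta; after this it
        # holds base + new_scale * lora_delta, without touching any backup.
        param.add_(lora_delta, alpha=new_scale - old_scale)
```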
Or, as an alternative, rather than merging network weights into base weights, you could just build up a chain of networks and have each lora's `forward()` method run with (frozen) base weights + dynamic lora weights, just like during training. This would incur extra processing overhead (since you'd be doing extra matmuls per inference step), but it would come with the benefit of avoiding a bunch of memory shuffling, which could end up being faster on net overall.
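A rough sketch of that alternative in PyTorch, assuming the standard low-rank A/B decomposition (class and attribute names are hypothetical):

```python
import torch
import torch.nn as nn

class LoraWrappedLinear(nn.Module):
    """Frozen base layer plus a dynamically scaled lora branch."""
    def __init__(self, base: nn.Linear, lora_a: torch.Tensor, lora_b: torch.Tensor):
        super().__init__()
        self.base = base        # frozen base weights, never modified
        self.lora_a = lora_a    # shape (rank, in_features)
        self.lora_b = lora_b    # shape (out_features, rank)
        self.scale = 0.0        # updated per step by the weight scheduler

    def forward(self, x):
        # Two extra matmuls per call, but zero weight copying or merging.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```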
When I last looked at Forge, `load_networks` happened before entering the step loop, whereas it happens inside the step loop for A1111. That was why it wasn't easy to implement in Forge - without a hook to easily cause recomputation of the unet/te weights per step, it's a non-starter. However, if Forge now has some capacity to alter the weights during each step, then the technique should be viable.
Thank you very much for the reply. "My implementation" is basically nothing, so I chuckled a bit at that :)
I'm going to read this reply 100 times or so while contemplating Forge code, and see if I can make heads or tails of it.
@altoiddealer is it now possible for the Dynamic Weights Controller extension to work on Forge? I wanna try it so bad in Forge cause it's a game changer for loras tbh, especially as some of us work a lot with multiple loras. I hope for the good news :)
I’m trying, but I’m probably not the guy who can make it happen. It’s better than no one trying, so wish me luck!
Thanks for your hard work and your determination, buddy. I hope it can be integrated into Forge ASAP, cause I'm sure with this extension many lora problems will be solved, especially when using an over-trained lora or multiple ones!
Yes, I'm in the same boat, been praying for this to be implemented for well over a year now.
Don't hold your breath; if I can actually get this to work, it likely won't be in just a few days.
Hi, lllyasviel released a version of Automatic1111 called Forge. Currently, loractl doesn't work with it. I get the following:
However, lllyasviel claims Forge makes extension development easier and more efficient. Could you please check out Forge and see what you can do? Thank you.