AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0
143.63k stars 27.03k forks source link

extension installer using threadpoolexecutor #16548

Open wkpark opened 1 month ago

wkpark commented 1 month ago

Description

Notes:

Checklist:

w-e-w commented 1 month ago

feels like a bad idea I believe issues would occur if two extensions are trying to install or modify the same files

wkpark commented 1 month ago

thank you for your comment

there could be such race conditions e.g.)

so I made a simple test program:

from concurrent.futures import ThreadPoolExecutor, as_completed
import subprocess

def run_extension_installer(r):
    try:
        ret = subprocess.run(["pip", "install", "-r", r], shell=False,)
        if ret.returncode != 0:
            if ret.stdout:
                print(ret.stdout)
            if ret.stderr:
                print(ret.stderr)
    except Exception as e:
        print("Error:", e)

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(run_extension_installer, r) for r in ["requirements.txt", "requirements2.txt"]]

even If requrirements.txt and requirements2.txt have same package list, pip install.. started at the same time and

so, there could be some pip install... fails.

but, if requirements.txt and requirements2.txt have different packages it will run normally.

so, I choose max_workers=2 to minimize possible race conditions.

wkpark commented 1 month ago

I just woke up so my writing here is a bit scatterbrained

still think it's mostly asking for trouble with little return

from my understanding pretty much the only reason that one would wish to parallelize install.py is to speed up load time base on my system one install.py typically takes around 0.2 seconds if it's just checking if the requirements are met so at best with 2 works you cut the time in half

note if an installer of extension is taking longer than this it's likely means that it's not optimized

while install.py is mostly used for pip, (and assuming that can handle this situation perfectly without issues which I still have my doubts), some extension installers does perform other actions, it is not possible to anticipate what they might do as there's no restrictions, it is totally possible that two extensions might decide to read or write to the same file using other methods that pip

also consider another case I have two extensions specify two different version of a package, the later installer would basically override what has been done by the earlier installers when this happened you would have specific packages to be re-install on every launch, this is not ideal but webui might still work and more importantly the behavior is consistent as the install the execution order is consistent

yes. right but that is extension's fault with or without this PR I think. as you mentioned install.py and scripts/*.py have same issue. to reduce/minimize these problem we have to minimize loading time executed code.

but if you parallelize them then there's no guarantee on which will be executed first, and so and so which package gets installed is basically leave to random chance

when there is a package conflict the most likely is it conflict of the packaged "tree" (not just one package), my guess is if two installers is executed at the same time, the best case is one completely overrides the other, but which one has priority is up to a random chance, the worst case I can think of is that, they partially override each other, and this result in a higher chance of things breaking

package dependency problem frequently occured in normal use case, anyway.


Let's be honest, I was surprised to see multiple pip instances working in the same venv environment at the same time. Normally yum/rpm/apt/etc don't work at the same time. especially pip takes quite a long of time to download packages, and I think it would speed up the installation a lot if we could scan all the extensions' requirements.txt and run pip download separately in parallel. (see also https://github.com/pypa/pip/pull/12923)

Thanks for the long answer. It gives me a lot of things to think about even if PR not accepted.

w-e-w commented 1 month ago

I recall someone mentioning in Discord that we should revamp how extension installs dependencies in the first place

basically allowing extension to state their requirements and let web UI installs it dependencies as opposed to each extension installing them individually

but for obvious reasons that would require lots of work and as far as I am aware no one has put in any effort on making that happen

and even if someone could manage to create such a system what about old extensions all those have to be updated so it's work plus more work plus more work for pretty much everyone involved


this and the other PR I don't see this being a thing it's just too much complication for too less of a reward