Open wkpark opened 1 month ago
feels like a bad idea I believe issues would occur if two extensions are trying to install or modify the same files
thank you for your comment
there could be such race conditions e.g.)
so I made a simple test program:
from concurrent.futures import ThreadPoolExecutor, as_completed
import subprocess
def run_extension_installer(r):
try:
ret = subprocess.run(["pip", "install", "-r", r], shell=False,)
if ret.returncode != 0:
if ret.stdout:
print(ret.stdout)
if ret.stderr:
print(ret.stderr)
except Exception as e:
print("Error:", e)
with ThreadPoolExecutor(max_workers=2) as executor:
futures = [executor.submit(run_extension_installer, r) for r in ["requirements.txt", "requirements2.txt"]]
even If requrirements.txt
and requirements2.txt
have same package list, pip install..
started at the same time and
pip install
started norally but other pip install
exit with OSError
and it means pip have locking mechanism to prevent overwrite the same file.so, there could be some pip install...
fails.
but, if requirements.txt
and requirements2.txt
have different packages it will run normally.
so, I choose max_workers=2
to minimize possible race conditions.
I just woke up so my writing here is a bit scatterbrained
still think it's mostly asking for trouble with little return
from my understanding pretty much the only reason that one would wish to parallelize install.py is to speed up load time base on my system one install.py typically takes around 0.2 seconds if it's just checking if the requirements are met so at best with 2 works you cut the time in half
note if an installer of extension is taking longer than this it's likely means that it's not optimized
while install.py is mostly used for pip, (and assuming that can handle this situation perfectly without issues which I still have my doubts), some extension installers does perform other actions, it is not possible to anticipate what they might do as there's no restrictions, it is totally possible that two extensions might decide to read or write to the same file using other methods that pip
also consider another case I have two extensions specify two different version of a package, the later installer would basically override what has been done by the earlier installers when this happened you would have specific packages to be re-install on every launch, this is not ideal but webui might still work and more importantly the behavior is consistent as the install the execution order is consistent
yes. right but that is extension's fault with or without this PR I think.
as you mentioned install.py
and scripts/*.py
have same issue. to reduce/minimize these problem we have to minimize loading time executed code.
but if you parallelize them then there's no guarantee on which will be executed first, and so and so which package gets installed is basically leave to random chance
when there is a package conflict the most likely is it conflict of the packaged "tree" (not just one package), my guess is if two installers is executed at the same time, the best case is one completely overrides the other, but which one has priority is up to a random chance, the worst case I can think of is that, they partially override each other, and this result in a higher chance of things breaking
package dependency problem frequently occured in normal use case, anyway.
Let's be honest, I was surprised to see multiple pip
instances working in the same venv
environment at the same time. Normally yum/rpm/apt/etc don't work at the same time.
especially pip takes quite a long of time to download packages, and I think it would speed up the installation a lot if we could scan all the extensions' requirements.txt
and run pip download
separately in parallel. (see also https://github.com/pypa/pip/pull/12923)
Thanks for the long answer. It gives me a lot of things to think about even if PR not accepted.
I recall someone mentioning in Discord that we should revamp how extension installs dependencies in the first place
basically allowing extension to state their requirements and let web UI installs it dependencies as opposed to each extension installing them individually
but for obvious reasons that would require lots of work and as far as I am aware no one has put in any effort on making that happen
and even if someone could manage to create such a system what about old extensions all those have to be updated so it's work plus more work plus more work for pretty much everyone involved
this and the other PR I don't see this being a thing it's just too much complication for too less of a reward
Description
run_extension_installer()
usingThreadPoolExecutor()
Notes:
args.skip_prepare_environment
option. so,install.py
should have independent behavior of its execution order.max_workers=1
to use normal ordered install.Checklist: