rhpvorderman closed this 4 years ago
I also tackled the very inefficient search mechanism. It now builds a set of installed repos and checks each requested repo against that set, instead of looping over the list of installed tools. This should give some speedup on a server with thousands of tools installed.
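A minimal sketch of the difference between the two lookups. It assumes each repository is identified by a (tool shed, owner, name, revision) tuple. The `Repo` type, the field names and the function names are hypothetical and are not taken from the actual PR code:

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Repo(NamedTuple):
    # Hypothetical identity of one installed repository revision.
    tool_shed: str
    owner: str
    name: str
    revision: str


def already_installed_list(repo: Repo, installed: list[Repo]) -> bool:
    # Old approach: a linear scan of the installed list for every requested repo,
    # so checking n requested repos against m installed ones costs O(n * m).
    for inst in installed:
        if inst == repo:
            return True
    return False


def already_installed_set(repo: Repo, installed: set[Repo]) -> bool:
    # New approach: an average O(1) hash lookup per requested repo.
    return repo in installed


def filter_new(requested: Iterable[Repo], installed: Iterable[Repo]) -> list[Repo]:
    # Build the set once (O(m)), then keep only repos that are not in it.
    installed_set = set(installed)
    return [r for r in requested if r not in installed_set]
```

With `installed = [Repo("ts", "devteam", "bowtie2", "r1")]`, calling `filter_new` on a request list containing that repo plus `Repo("ts", "iuc", "multiqc", "r2")` keeps only the multiqc entry. Note that this only speeds up the local comparison; it does not reduce the number of Galaxy API calls.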
Absolutely, I'll try to do that this evening! Thanks so much for tackling this.
We live dangerously so it's running in prod. https://build.galaxyproject.eu/job/usegalaxy-eu/job/install-tools/117/console Hopefully it works!
It still has to call the Galaxy API quite a lot, so if it is slow I can think of another way to speed it up, but that will require some more work. I hope this is sufficient.
Hmm, still running after an hour... It's a bit annoying that it is not more verbose.
This seems to have taken even longer, but I'm not sure whether that's something on our end. We can keep trying. The build took 12 hours and was killed (ish) at midnight by an automated process.
@erasche, there was no way to check whether this workaround was faster other than putting it to a real production test. I guess the API call for each of the installed tools takes far too much time. That call is the only way to determine whether Galaxy is actually going to skip the tool, but the check is probably just as slow as actually trying to install a new tool. I am afraid it is not really possible to work around the slowness of the API in that case.
I removed the API check. Using sets to determine whether a tool is already installed should still speed up the process somewhat, given the large number of tools on the Galaxy server.
There was no way to check whether this workaround was faster other than putting it to a real production test.
Oh, all good. We're happy to be that test :)
@erasche, I think this problem is not solvable unless the Galaxy API is fixed. Deducing whether Galaxy is going to install a tool or not requires an API call for each tool on the Galaxy server, which is extremely slow. Alternatively, the list of all available repos on the Tool Shed can be downloaded to build a list of installable revisions. If an installed Galaxy tool is not at an installable revision, the API call method can be used. This might save some time, but I doubt it. It also introduces the overhead of downloading the repo lists from each Tool Shed (~10 MB of data) for each tool YAML being installed, which is not worth it.
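A rough sketch of the alternative described above, to make the trade-off concrete. It assumes the downloaded repo list of one Tool Shed can be reduced to a mapping from (owner, name) to installable changeset revisions; that shape, the `RepoKey` tuple and both function names are hypothetical, and no real Tool Shed or Galaxy API call is shown:

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping

# Hypothetical key: (tool_shed, owner, name, revision).
RepoKey = tuple[str, str, str, str]


def build_installable(
    tool_shed: str, revisions_by_repo: Mapping[tuple[str, str], Iterable[str]]
) -> set[RepoKey]:
    # revisions_by_repo stands in for the downloaded repo list of one Tool Shed
    # (assumed shape: (owner, name) -> installable changeset revisions).
    return {
        (tool_shed, owner, name, rev)
        for (owner, name), revs in revisions_by_repo.items()
        for rev in revs
    }


def partition_installed(
    installed: Iterable[RepoKey], installable: set[RepoKey]
) -> tuple[list[RepoKey], list[RepoKey]]:
    # Repos already at a known installable revision can be decided locally;
    # everything else would still need one (slow) Galaxy API call each.
    local: list[RepoKey] = []
    needs_api: list[RepoKey] = []
    for repo in installed:
        (local if repo in installable else needs_api).append(repo)
    return local, needs_api
```

The number of API calls saved is only `len(local)`, while the cost of fetching the repo lists is paid up front for every run, which is why the approach is unlikely to pay off.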
Nevertheless, comparing with sets should save some time during the install, so that change from this PR can be kept.
@mvdbeek, can you have a look at this PR? It does a set comparison to check whether repos are already installed, so it should speed up the install scripts on large Galaxy instances with many tools installed.
Well, this doesn't optimise the bottleneck and introduces more code. If we do want to optimise the code (I don't think we should) I'd just build sets of tuples up front and use the asymmetric difference.
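A minimal sketch of the "sets of tuples up front" variant. The dictionary keys are illustrative assumptions, not necessarily the exact fields of the Galaxy API response or of the tool-list YAML, and the function names are hypothetical:

```python
from __future__ import annotations

# Hypothetical key: (tool_shed, owner, name, revision).
Key = tuple[str, str, str, str]


def repo_key(repo: dict[str, str]) -> Key:
    # Assumed field names; adapt to whatever the tool list or API actually returns.
    return (repo["tool_shed"], repo["owner"], repo["name"], repo["revision"])


def repos_to_install(
    requested: list[dict[str, str]], installed: list[dict[str, str]]
) -> list[Key]:
    # Build both sets once, up front.
    wanted = {repo_key(r) for r in requested}
    present = {repo_key(r) for r in installed}
    # Plain (asymmetric) set difference: requested but not yet installed.
    # Unlike the symmetric difference (wanted ^ present), repos that are
    # installed but not requested are ignored. Sorted for a stable order.
    return sorted(wanted - present)
```

This keeps the change to a couple of set expressions rather than extra lookup helpers, but, as noted, the per-repo API calls remain the real bottleneck.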
Closing this since, as @mvdbeek stated, it does not solve the bottleneck.
Works around #140. @erasche, can you run this on the EU Galaxy tool lists and report your findings? (Maybe make an image of the server first... all the tests pass, but you never know.)