Improving startup time of the full-stack image

danielhollas commented 6 months ago

Issue for @unkcpz's investigation into the startup time of the AiiDAlab QEApp container.

I was curious and ran a few tests on my machine with the latest aiidalab/full-stack:edge image, based on aiida-core 2.5.1:

$ docker pull docker.io/aiidalab/full-stack:edge
$ time docker run --rm docker.io/aiidalab/full-stack:edge
... Ctrl-C
real    0m34.920s
user    0m0.022s
sys     0m0.018s

So on my machine the whole startup takes around 30s. Not great, not terrible, but I can imagine that on slower machines this might take significantly longer. I also tried running just the PGSQL setup script:

$ time docker run --rm  docker.io/aiidalab/full-stack:edge  bash -c "bash /usr/local/bin/before-notebook.d/20_start-postgresql.sh"
...
real    0m9.120s
user    0m0.019s
sys     0m0.014s

and the same script followed by the prepare-aiida script:

$ time docker run --rm  docker.io/aiidalab/full-stack:edge  bash -c "bash /usr/local/bin/before-notebook.d/20_start-postgresql.sh ; bash /usr/local/bin/before-notebook.d/40_prepare-aiida.sh"
real    0m24.679s
user    0m0.026s
sys     0m0.014s

So it seems that the majority of the time is spent in these two scripts. The pgsql setup itself takes around a third (~10s) of the total time; I'm not sure how much we can do about that. Roughly another 15s is spent in 40_prepare-aiida.sh.
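To make the breakdown explicit, the three wall-clock measurements can simply be subtracted from each other. This is only a rough sketch: every docker run also pays the container start-up cost, and the full run includes the delay before pressing Ctrl-C, so the numbers are approximate. The function and key names are made up for illustration.

```python
# Wall-clock ("real") times from the three runs above, in seconds.
FULL_STARTUP = 34.920
POSTGRES_ONLY = 9.120
POSTGRES_PLUS_PREPARE = 24.679


def breakdown(full, pg_only, pg_plus_prepare):
    """Split the total startup time into three rough parts."""
    return {
        "postgresql": pg_only,
        "prepare_aiida": pg_plus_prepare - pg_only,
        "everything_else": full - pg_plus_prepare,
    }


if __name__ == "__main__":
    parts = breakdown(FULL_STARTUP, POSTGRES_ONLY, POSTGRES_PLUS_PREPARE)
    for name, seconds in parts.items():
        share = 100 * seconds / FULL_STARTUP
        print(f"{name:16s} {seconds:6.2f}s  ({share:4.1f}%)")
```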

@unkcpz could you run the same on the demo server? Would be good to see how this depends on the machine.

cc @giovannipizzi

giovannipizzi commented 6 months ago

Thanks! What does the pgsql setup do? Maybe the following would work: (1) do the setup once, (2) turn off psql, (3) tar the psql internal folder once, and then at startup you just untar it?
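A minimal sketch of that idea, assuming the database has been initialized and cleanly stopped before the snapshot is taken. The function names are hypothetical, and the details PostgreSQL cares about (ownership, permissions of the data directory) are not handled here.

```python
import tarfile
from pathlib import Path


def snapshot_dir(data_dir, archive):
    """Pack an initialized (and stopped!) data directory into a tar file."""
    with tarfile.open(archive, "w") as tar:
        # arcname="." stores paths relative to the data directory itself.
        tar.add(data_dir, arcname=".")


def restore_dir(archive, target_dir):
    """Unpack the snapshot into target_dir; returns False if it already exists."""
    target = Path(target_dir)
    if target.exists():
        return False  # never overwrite an existing database
    target.mkdir(parents=True)
    with tarfile.open(archive, "r") as tar:
        tar.extractall(target)
    return True
```

The snapshot would be created once at image build time, and restore_dir would replace the initdb call at container startup. The archive is deliberately left uncompressed, since it is not clear that compression would pay off.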

danielhollas commented 6 months ago

The script is here: https://github.com/aiidalab/aiidalab-docker-stack/blob/main/stack/base-with-services/before-notebook.d/20_start-postgresql.sh

As far as I can see, it runs two commands:

initdb -D /home/${NB_USER}/.postgresql
pg_ctl --timeout=180 -w -D /home/${NB_USER}/.postgresql -l /home/${NB_USER}/.postgresql/logfile start

I don't know their relative timing, but the second is the startup command, right? So we can't get rid of that one for sure. But I'd really like to see the timings from the DEMO server first before we jump to solutions here.
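One way to get the relative timing would be to run the two commands separately under a small timer. The sketch below is only an illustration: the helper names are made up, and it assumes it is executed inside the container where initdb and pg_ctl are on the PATH.

```python
import subprocess
import sys
import time


def postgres_commands(data_dir):
    """The two commands from the startup script, as argument lists."""
    return [
        ["initdb", "-D", data_dir],
        ["pg_ctl", "--timeout=180", "-w", "-D", data_dir,
         "-l", f"{data_dir}/logfile", "start"],
    ]


def time_command(argv):
    """Run one command and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, completed.returncode
```

Looping over postgres_commands(...) with the real data directory and calling time_command on each entry would then show how the ~9s splits between initialization and server start.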

unkcpz commented 6 months ago

I'll test this with aiida-core==2.5.0 with @danielhollas's improvement on verdi commands.

unkcpz commented 6 months ago

The full-stack image already takes ~30s on a local machine, which is not fast enough for the MC aiida archive inspect. For the QeApp the problem is much more serious, although it only happens the first time, when the user's persistent volume is created. The slow steps include:

1. Initialize the DB and prepare the aiida profile.
2. Stop and start the daemon, and run the DB migration if needed.
3. Install QE from conda-forge. (This is already optimized: QE is installed in the build phase and moved to the home folder at startup, so this part takes a minimal amount of time.)
4. Set up the 15 QE codes, which takes ~20s on my machine. In principle this should be faster with aiida-core>=2.5, but it seems not. See my profiling of this part at https://github.com/aiidalab/aiidalab-docker-stack/pull/424#issuecomment-2082312817
5. Download and set up the pseudos using aiida-pseudo. This is already optimized to avoid downloading from the internet: it uses the tar file pre-downloaded from the MC archive and runs aiida-pseudo install sssp --from-download. It is still slow, though, since unzipping and importing 100 * 4 pseudos into four groups is not fast.

(@danielhollas, I didn't do detailed profiling on the demo server: since it is deployed directly with k8s, I have nowhere to run docker run. From what happens with docker run --rm aiidalab/qe:latest, you can get a clear feel for how much time is spent on the steps above.)

Even if a 30s start of full-stack is not slow in your view, it is a problem for the QeApp from my point of view. The solution I was thinking of is to do the preparation in the build phase of the image, i.e. install and write the files to $HOME, so that using the image with docker (i.e. aiidalab-launch) requires no waiting. The tricky part is the k8s deployment, which will not mirror $HOME into the persistent volume, so we need to prepare $HOME beforehand and use https://z2jh.jupyter.org/en/2.0.0/jupyterhub/customizing/user-environment.html#about-user-storage-and-adding-files-to-it to do it explicitly for the k8s spawner. Depending on how large the home folder is after the build phase, we will see whether we should compress it to avoid making the image too large.

@danielhollas @giovannipizzi do you see any problem with the plan or have better solution for it?

giovannipizzi commented 6 months ago

Hi, one option (maybe it is the same one you are thinking about) is:

1. Create the image.
2. Run the startup scripts once, somewhere. This might take minutes, but will end up with an aiida profile (db+repo) which includes codes, pseudos, ...
3. Run the new verdi backup command (or equivalent commands; in reality, if the daemon is shut down, you don't need the complex logic in there, but can just dump the psql DB and copy the repo folder and the config.json).
4. In the jupyter hooks from the link that Jason sent, first check in the fastest way possible whether this is the first startup (simplest approach: at the end of a successful startup as described here, you also touch a file). If it is, run a script to recover the profile (restore the psql DB, put the disk-objectstore repo in place, adapt the config.json accordingly, and finally touch the file I mention above). Otherwise, do nothing.

It would be good to check how long step 4 takes (and whether zipping helps; I am not convinced, maybe tarring is enough). Maybe there are still some steps that cannot be skipped with this approach?
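The first-startup check with a touched marker file could look roughly like this. It is a sketch under assumptions: the marker file name and the function name are hypothetical, and the actual restore logic (DB dump, repo, config.json) is passed in as a callable rather than implemented.

```python
from pathlib import Path

# Hypothetical marker file name; anything inside the persistent home works.
MARKER_NAME = ".profile_restored"


def restore_on_first_startup(home, restore):
    """Call restore(home) only if the marker file is missing.

    Returns True if the restore ran, False if it was skipped.
    """
    marker = Path(home) / MARKER_NAME
    if marker.exists():
        return False  # not the first startup: do nothing
    restore(Path(home))  # e.g. load the DB dump, copy repo and config.json
    marker.touch()  # only reached if restore() did not raise
    return True
```

Touching the marker only after a successful restore means that a failed first startup is retried on the next one instead of being silently skipped.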

danielhollas commented 6 months ago

The full-stack image already takes ~30s on a local machine, which is not fast enough for the MC aiida archive inspect. For the QeApp the problem is much more serious, although it only happens the first time, when the user's persistent volume is created.

I agree that 30s is not ideal, I am just really worried about introducing a lot of complexity to this repo (which is already complex).

If I understand correctly (@giovannipizzi correct me if I am wrong), the main concern for now is the demo server, we shouldn't worry about aiidalab-launch users for now.

Since that's done through kubernetes, simply modifying the base image and adding stuff to $HOME will not work anyway. So as a first step, I would suggest that you focus on the demo server and, as @giovannipizzi suggested, create a package that you can copy to the home volume. (I am not familiar with kubernetes, but surely having a pre-populated mount point is not an uncommon problem.)

Sorry in case I am misunderstanding something.

unkcpz commented 6 months ago

the main concern for now is the demo server, we shouldn't worry about aiidalab-launch users for now.

I agree, and that's the reason I never brought this to this repo but want to tackle it for the QEApp image first. I'll still do it from aiidalab-qe for the moment, where we can iterate faster on development.

Meanwhile, in order to avoid bringing too much complexity to the image preparation (as @danielhollas pointed out, it is already quite complex), if there is little difference between the backup approach and running rsync on the full home directly, I am inclined to go with the simpler solution. But it is worth trying both to get a clear comparison of the size of the final image and the speed improvement.

danielhollas commented 6 months ago

Thanks. To clarify my suggestion: if we care about the demo server, I'd not touch the images at all, and instead try to modify the kubernetes startup to inject the data there. But I might be underestimating the complexity of doing that.

unkcpz commented 6 months ago

and instead try to modify the kubernetes startup to inject the data there. But I might be underestimating the complexity of doing that.

Haha, I thought about that as well; I may be overestimating the complexity, e.g. of the system permissions. Will keep this in mind.

danielhollas commented 6 months ago

Set up the 15 QE codes, which takes ~20s on my machine. In principle this should be faster with aiida-core>=2.5, but it seems not. See my profiling of this part at https://github.com/aiidalab/aiidalab-docker-stack/pull/424#issuecomment-2082312817

Just a note, aiida 2.5 is unlikely to help too much here. Much of the gain I got was concentrated on verdi tab completion and on commands not accessing the database. Other gains were partly negated by the introduction of pydantic. I did some timings and in your case you're paying a price of at least 0.5s for each verdi invocation. The main gain here would be to create the codes via the python API from within the same process. (@superstar54 mentioned there were some threading issues there, but those should be surmountable, e.g. by having a small python script that sets up the codes and is called via subprocess, in case more simple solutions don't work).

See my timings here: https://github.com/aiidateam/aiida-core/pull/6382
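To illustrate why the number of invocations matters, here is a small sketch. The 0.5s figure is the lower bound quoted above; the two timing functions use an empty Python interpreter as a stand-in for verdi, so they only show the pure process start-up cost, not real AiiDA numbers. All function names are made up.

```python
import subprocess
import sys
import time


def estimated_overhead(n_invocations, per_call_seconds=0.5):
    """Lower bound on the time spent just starting the CLI n times."""
    return n_invocations * per_call_seconds


def time_separate_processes(n):
    """Start n interpreters, each doing one (empty) unit of work."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def time_single_process(n):
    """Start one interpreter that does all n (empty) units of work."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"for _ in range({n}): pass"],
                   check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"15 verdi calls cost at least {estimated_overhead(15):.1f}s")
    print(f"15 processes: {time_separate_processes(15):.2f}s")
    print(f" 1 process:   {time_single_process(15):.2f}s")
```

With 15 codes and at least 0.5s per call, the start-up overhead alone is at least 7.5s, which is the part that setting up all codes in a single process would avoid.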

superstar54 commented 6 months ago

(@superstar54 mentioned there were some threading issues there, but those should be surmountable, e.g. by having a small python script that sets up the codes and is called via subprocess, in case more simple solutions don't work).

Hi @danielhollas and @unkcpz, a small piece of good news is that in the latest QEApp, code setup is no longer time-consuming, since all codes are set up in one script: https://github.com/aiidalab/aiidalab-qe/pull/706. We can use the Python API in the future if we fix the thread problem.

unkcpz commented 6 months ago

Thanks @superstar54, but I find https://github.com/aiidalab/aiidalab-qe/pull/706 a bit hacky. I think if we have a startup time problem with the pseudo libraries anyway, why not just keep the original implementation, which is more straightforward? But it is true that if the qeapp is installed from the appstore, then the setup time is also improved. So for the moment, I am okay with the change, thanks!

superstar54 commented 6 months ago

I think if we have a startup time problem with the pseudo libraries anyway, why not just keep the original implementation, which is more straightforward?

Hi @unkcpz, I don't understand the logic here. Could you explain in more detail? Thanks!

unkcpz commented 6 months ago

I mean your fix is great but has limited influence on the startup time issue of the QeApp image. If we don't reduce the time needed to set up the profile and the pseudopotential groups, it still needs ~2 mins to start the qeapp image (I agree your change improves it, which is great!).

However, once we have an image that does not need the runtime setup of profiles (including the codes setup) and the pseudo groups, both problems are solved together. So, as I said, for the moment I think it is a fair workaround.

I say that https://github.com/aiidalab/aiidalab-qe/pull/706 is not straightforward because you use a function to create a string, write it into a python_code script, and use run(["python", "-c", python_code]) to run it. Did you try calling the AiiDA python API directly from the function? What is the "threading issue" you mentioned? Compared to the initial implementation, which directly used the verdi commands we usually use to set up codes, the change is not very clear. But as I said, as a temporary solution it is good.
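For readers who have not opened the PR, the pattern under discussion looks roughly like the sketch below. This is not the code from the PR: the function names and the code labels are hypothetical, and the actual AiiDA setup call is replaced by a print placeholder.

```python
import subprocess
import sys


def build_setup_script(labels):
    """Return Python source that 'sets up' every label in one process."""
    # repr() embeds the list as a valid Python literal in the generated code.
    return (
        f"labels = {labels!r}\n"
        "for label in labels:\n"
        "    print('setting up ' + label)  # placeholder for the real setup\n"
    )


def run_setup_in_subprocess(labels):
    """Run the generated script with `python -c` and return its output lines."""
    result = subprocess.run(
        [sys.executable, "-c", build_setup_script(labels)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

The trade-off is the one raised here: a single child process avoids paying the start-up cost once per code, but the logic lives in a generated string, which is harder to read and debug than a direct function call.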

The discussion is getting a bit sidetracked, so let's move the QeApp image issue to the QeApp repo. This issue is more about whether/how we improve the startup time of the full-stack image. For the QeApp image, we need to do it anyway, and the most time-consuming part is the pseudopotential groups setup. Let me know if anything is not clear.

superstar54 commented 6 months ago

once we have an image that does not need the runtime setup of profiles (including the codes setup) and the pseudo groups, both problems are solved together.

Thanks for the explanation. Looking forward to this solution!

superstar54 commented 6 months ago

What is the "threading issue" you mentioned?

Please check this PR: https://github.com/aiidalab/aiidalab-qe/pull/695#issuecomment-2101235325

unkcpz commented 6 months ago

What is the "threading issue" you mentioned?

Please check this PR: aiidalab/aiidalab-qe#695 (comment)

Thanks! Yes, I think that implementation is much clearer. For the future, I'd suggest that it may be better to hold off a bit on using a workaround and first expose the issue to the team (and the aiida team) so it can be discussed.

danielhollas commented 5 months ago

@unkcpz could you run the full-stack container startup on your machine again?

$ docker pull docker.io/aiidalab/full-stack:edge
$ time docker run --rm docker.io/aiidalab/full-stack:edge

With the recent improvements I made to the startup scripts, the startup time of a fresh container is now 11s on my machine (down from ~30s). I don't see any obvious ways of speeding this up further in the full-stack image itself.

In @superstar54's experimental QeApp image, which prepares home in advance, this would be even faster since the aiida profile and computer are already initialized, but I don't think we should do that here.

danielhollas commented 4 months ago

I've published a new version of the docker stack with the loading speed improvements. @superstar54 I'd suggest rebuilding your QeApp image on top of it. I'd also suggest that you try keeping the startup scripts as they are now rather than deleting them, since I think it will improve the maintainability of your solution and should not add more than 1-2 seconds of overhead.

Closing this issue for now, we can open a new one if there are further avenues for improvements.

aiidalab / aiidalab-docker-stack

Improving startup time of the full-stack image #447