Thanks! What does the pgsql setup do? Maybe just 1. Doing the setup once, 2. Turning off psql, 3. Tarring the psql internal folder once would work, and at startup you just untar it?
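Very roughly, I imagine something like this (hand-wavy sketch; the data-directory and archive paths are just placeholders, and the details depend on what the setup script actually does):

PGDATA="$HOME/.postgresql"   # placeholder path; whatever the image actually uses

# at image build time: do the one-time setup, stop the server, archive the data directory
initdb -D "$PGDATA"
pg_ctl -w -D "$PGDATA" start
# ... any one-time database/user setup would go here ...
pg_ctl -w -D "$PGDATA" stop
tar -czf /opt/pgdata-skeleton.tar.gz -C "$(dirname "$PGDATA")" "$(basename "$PGDATA")"

# at container startup: restore the archive if the data directory is missing, then start
if [ ! -d "$PGDATA" ]; then
    tar -xzf /opt/pgdata-skeleton.tar.gz -C "$(dirname "$PGDATA")"
fi
pg_ctl -w -D "$PGDATA" start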
The script is here: https://github.com/aiidalab/aiidalab-docker-stack/blob/main/stack/base-with-services/before-notebook.d/20_start-postgresql.sh
As far as I can see, it runs two commands:
initdb -D /home/${NB_USER}/.postgresql
pg_ctl --timeout=180 -w -D /home/${NB_USER}/.postgresql -l /home/${NB_USER}/.postgresql/logfile start
I don't know their relative timing, but the second one is the startup command, right? So we can't get rid of that one, for sure. But I'd really like to see the timings from the DEMO server first before we jump to solutions here.
I'll test this with aiida-core==2.5.0, with @danielhollas's improvements on verdi commands.
The full-stack already takes ~30s on the local machine, which is not fast enough for the MC aiida archive inspect. For the QEApp the problem is much more serious, although it only happens the first time the user's persistent volume is created. The slow processes include:
1. Setup the codes for the 15 QE codes, which takes ~20s on my machine. In principle it should be improved by using aiida-core>=2.5, but it seems not. See my profiling on this part at https://github.com/aiidalab/aiidalab-docker-stack/pull/424#issuecomment-2082312817
2. aiida-pseudo setup of the pseudopotentials. It is already optimized to avoid downloading from the internet by using a tar file pre-downloaded from the MC archive and set up with aiida-pseudo install sssp --from-download, but it is still slow, since unzipping and importing 100 * 4 pseudos into four groups is not fast.
(@danielhollas, I didn't do a detailed profiling on the demo server, since it is deployed with k8s directly, so I have nowhere to run docker run. From what happens with docker run --rm aiidalab/qe:latest, you can get a clear feel for how much time is spent on the steps above.)
Even if a full-stack start of 30s is not slow in your sense, it is a problem for the QEApp from my point of view. The solution I was thinking of is to prepare things in the build phase of the image: install and write the files to $HOME, so that Docker uses of the image (i.e. aiidalab-launch) don't have to wait at all.
The tricky part is the k8s deployment, which does not mirror $HOME into the persistent volume, so we need to prepare $HOME beforehand and use https://z2jh.jupyter.org/en/2.0.0/jupyterhub/customizing/user-environment.html#about-user-storage-and-adding-files-to-it to do it explicitly for the k8s spawner.
Depending on how large the home folder is after the build phase, we can decide whether we should compress it to avoid making the image too large.
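As a very rough sketch (untested; the archive path and the emptiness check are just placeholders), the Docker/aiidalab-launch side could look like the following, while for k8s the z2jh mechanism above would do the extraction instead:

# at image build time, after running the existing setup scripts once:
tar -czf /opt/home-skeleton.tar.gz -C "/home/${NB_USER}" .

# at first startup: populate the (empty) persistent volume from the archive
if [ -z "$(ls -A "/home/${NB_USER}")" ]; then
    tar -xzf /opt/home-skeleton.tar.gz -C "/home/${NB_USER}"
fi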
@danielhollas @giovannipizzi do you see any problems with this plan, or do you have a better solution?
Hi, one option (maybe it is the same one you are thinking about) is:
It would be good to check how long step 4 takes (and whether zipping helps; I am not convinced, maybe tarring is enough). Maybe there are still some steps that cannot be skipped with this approach?
The full-stack already takes ~30s on the local machine, which is not fast enough for the MC aiida archive inspect. For the QEApp the problem is much more serious, although it only happens the first time the user's persistent volume is created.
I agree that 30s is not ideal; I am just really worried about introducing a lot of complexity to this repo (which is already complex).
If I understand correctly (@giovannipizzi correct me if I am wrong), the main concern for now is the demo server; we shouldn't worry about aiidalab-launch users for now.
Since that's done through kubernetes, simply modifying the base image and adding stuff to $HOME will not work anyway. So as a first step, I would suggest you focus on the demo server and, as @giovannipizzi suggested, create a package that you can copy to the home volume. (I am not familiar with kubernetes, but surely it is not an uncommon problem to have a pre-populated mount point.)
Sorry if I am misunderstanding something.
the main concern for now is the demo server; we shouldn't worry about aiidalab-launch users for now.
I agree, and that's the reason I never brought this up in this repo but wanted to tackle it for the QEApp image first. I'll still do it in aiidalab-qe for the moment, where we can iterate faster during development.
Meanwhile, in order to avoid bringing too much complexity to the image preparation (as @danielhollas pointed out, it is already quite complex), if there is little difference between doing a backup and running rsync on the full home directly, I am inclined to go with the simpler solution. But it is worth trying both and making a clear comparison of the final image size and the speed improvement.
Thanks. To clarify my suggestion: if we care about the demo server, I'd not touch the images at all, and instead try to modify the kubernetes startup to inject the data there. But I might be underestimating the complexity of doing that.
and instead try to modify the kubernetes startup to inject the data there. But I might be underestimating the complexity of doing that.
Haha, I thought about that as well; I may be overestimating the complexity, e.g. around filesystem permissions. Will keep this in mind.
Setup the codes for the 15 QE codes, which takes ~20s on my machine. In principle it should be improved by using aiida-core>=2.5, but it seems not. See my profiling on this part at https://github.com/aiidalab/aiidalab-docker-stack/pull/424#issuecomment-2082312817
Just a note, aiida 2.5 is unlikely to help too much here. Much of the gain I got was concentrated on verdi tab completion and commands not accessing the database. Other gains were partly negated by the introduction of pydantic. I did some timings and in your case you're paying a price of at least 0.5s for each verdi invocation. The main gain here would be to create the codes via python API from within the same process. (@superstar54 mentioned there were some threading issues there, but those should be surmountable, e.g. by having a small python script that sets up the codes and is called via subprocess, in case more simple solutions don't work).
See my timings here: https://github.com/aiidateam/aiida-core/pull/6382
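(If you want to see that per-invocation overhead yourself, timing any verdi command that loads a profile inside the container gives a rough lower bound, e.g.:)

time verdi code list > /dev/null       # pays the interpreter startup + import + profile-load cost
time verdi computer list > /dev/null   # roughly the same fixed overhead again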
(@superstar54 mentioned there were some threading issues there, but those should be surmountable, e.g. by having a small python script that sets up the codes and is called via subprocess, in case more simple solutions don't work).
Hi @danielhollas and @unkcpz, a small piece of good news: in the latest QEApp, code setup is no longer time-consuming, since all codes are set up in one script: https://github.com/aiidalab/aiidalab-qe/pull/706. We can use the Python API in the future once we fix the threading problem.
Thanks @superstar54, but I find https://github.com/aiidalab/aiidalab-qe/pull/706 a bit hacky. I think that if we have a startup-time problem with the pseudo libraries anyway, why not just keep the original implementation, which is more straightforward? But it is true that if the QEApp is installed from the app store, the setup time is also improved. So for the moment, I am okay with the change, thanks!
I think that if we have a startup-time problem with the pseudo libraries anyway, why not just keep the original implementation, which is more straightforward?
Hi @unkcpz, I don't understand the logic here. Could you explain in more detail? Thanks!
I mean that your fix is great but has limited influence on the startup-time issue of the QEApp image. If we don't solve the time needed to set up the profile and the pseudopotential groups, it still takes ~2 minutes to start the QEApp image (I agree your change improves it, which is great!).
However, once we have an image that does not need the runtime setup of profiles (including the code setup) and pseudo groups, the problem is solved altogether. So, as I said, for the moment I think it is a fair workaround.
I say that https://github.com/aiidalab/aiidalab-qe/pull/706 is not straightforward because you use a function to build a python_code string and then run it with run(["python", "-c", python_code]). Did you try calling the AiiDA Python API directly from the function? What is the "threading issue" you mentioned?
Compared to the initial implementation, which directly used the verdi commands we usually use to set up the codes, the change is not very clear. But as I said, as a temporary solution it is fine.
The discussion is getting a bit side-tracked; let's move the QEApp image discussion to the QEApp repo. This issue is more about whether/how we improve the startup time of the full-stack image. For the QEApp image we need to do it anyway, and the most time-consuming part is the pseudopotential group setup. Let me know if this is not clear.
once we have an image that does not need the runtime setup of profiles (including the code setup) and pseudo groups, the problem is solved altogether.
Thanks for the explanation. Looking forward to this solution!
What is the "threading issue" you mentioned?
Please check this PR: https://github.com/aiidalab/aiidalab-qe/pull/695#issuecomment-2101235325
Thanks! Yes, I think that implementation is much clearer. I'd suggest that in the future it would be a bit better to wait before using such a workaround and first expose the issue to the team (and the AiiDA team) for discussion.
@unkcpz could you run the full-stack container startup on your machine again?
$ docker pull docker.io/aiidalab/full-stack:edge
$ time docker run --rm docker.io/aiidalab/full-stack:edge
With the recent improvements I did to the startup scripts, the startup time of a fresh container is now 11s on my machine (down from ~30s). I don't see any obvious ways of speeding this up further in the full-stack image itself.
In @superstar54's experimental QEApp image, which prepares home in advance, this would be even faster since the AiiDA profile and computer are already initialized, but I don't think we should do that here.
I've published a new version of the docker stack with the loading-speed improvements. @superstar54 I'd suggest rebuilding your QeApp image on top of it. I'd also suggest that you try keeping the startup scripts as they are now and don't delete them, since I think it will improve the maintainability of your solution and should not add more than 1-2 seconds of overhead.
Closing this issue for now, we can open a new one if there are further avenues for improvements.
Issue for @unkcpz's investigation into the startup time of the AiiDAlab QEApp container.
I was curious and ran a few tests on my machine with the latest aiidalab/full-stack:edge image, based on aiida-core 2.5.1. On my machine the whole startup takes around 30s. Not great, not terrible, but I can imagine that on slower machines this might take significantly longer. I also tried to run just the PGSQL setup script, and the same script followed by the prepare-aiida script.
So it seems that the majority of the time is spent in these two scripts: the pgsql setup itself takes around a third (10s) of the total time, and I am not sure how much we can do about that. Another 14s is spent in 40_prepare-aiida.sh.
@unkcpz could you run the same on the demo server? Would be good to see how this depends on the machine.
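To reproduce the per-script timings, something along these lines should work (assuming the hook scripts live in the standard jupyter docker-stacks location, /usr/local/bin/before-notebook.d/; adjust the paths if the image puts them elsewhere):

docker run --rm -it docker.io/aiidalab/full-stack:edge bash
# inside the container, time each startup hook individually:
time bash /usr/local/bin/before-notebook.d/20_start-postgresql.sh
time bash /usr/local/bin/before-notebook.d/40_prepare-aiida.sh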
cc @giovannipizzi