Closed sphuber closed 3 weeks ago
@danielhollas It was actually #6432 that broke the Docker builds.
There are a number of problems we need to tackle:

- […] `verdi status`.
- The init script used `--profile` instead of `--profile-name` in `verdi presto`, and so the profile setup in the init script just failed. Where can we see this output?
- The RMQ hostname is `messaging` but `verdi presto` only checks `localhost`. I intentionally did not add configuration options for RMQ because I think `verdi presto` needs to stay minimal. Can we have RMQ run on localhost, or at least forward the port such that it appears like that?

The real problem seems to be that the `aiida-prepare.sh` init script is being run before the RMQ service is up. In the `aiida-core-with-services` case, the RMQ init is called first, but it only issues the startup command and does not explicitly wait for the service to be up, so it continues straight to running `aiida-prepare.sh`. This then calls `verdi presto` while RMQ is still starting up, the test connection fails, and `verdi presto` configures the profile without a broker.

For the `aiida-core-base` case, the health check for rabbitmq was probably incorrect. We switched to `rabbitmq-diagnostics -q ping`, which should hopefully do the trick.
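
As a rough illustration of "explicitly wait for the service to be up", here is a minimal Python sketch that keeps running a readiness command until it succeeds or a deadline passes. The helper name `wait_until_ready` and the timeout values are hypothetical and not part of the images; only the `rabbitmq-diagnostics -q ping` command comes from the discussion above.

```python
import subprocess
import time


def wait_until_ready(command, timeout=60.0, interval=1.0):
    """Run ``command`` repeatedly until it exits with code 0 or ``timeout`` seconds pass.

    Returns True as soon as the command succeeds and False once the deadline is reached.
    ``wait_until_ready`` is a hypothetical helper, not something shipped in the images.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A zero exit code is taken to mean that the service is up.
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            ready = result.returncode == 0
        except FileNotFoundError:
            # The readiness command itself is not installed (yet): treat as not ready.
            ready = False
        if ready:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready(["rabbitmq-diagnostics", "-q", "ping"])` before `verdi presto` would block until the broker answers instead of relying on a fixed delay. Whether this belongs in the init script or in an s6 readiness check is a separate design question.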
@unkcpz seems to have worked except with the healthcheck for `aiida-core-base`.

But it seems the problem with `aiida-core-base` comes from postgresql (the change with `sleep 10` was made in base and thus should take effect as well). I started it with docker compose and created the profile with quicksetup; it says the postgresql connection was not found.
Instead of using `sleep 10`, the standard way for s6 to handle the health check would be https://skarnet.org/software/s6/s6-svstat.html. I am not sure how to set the command; it requires some experiments.
> But it seems the problem with `aiida-core-base` comes from postgresql (the change with `sleep 10` was made in base and thus should take effect as well). I started it with docker compose and created the profile with quicksetup; it says the postgresql connection was not found.

Huh 🤔 that's weird, because it used to work before I switched to `verdi presto`. So why is the PSQL service not running anymore? Did it give a precise error message?
> Did it give a precise error message?

It just says "Unable to autodetect postgres setup". I am sure the container is running. I'll test with the old image to see if the problem comes from the `aiida-core-base` image.
> I'll test with the old image to see if the problem comes from the `aiida-core-base` image.

Pretty sure it is related to the new image: it cannot autodetect the postgresql. I ran with the old base image and everything succeeded.
To reproduce the problem, build the images with `docker buildx bake` and run docker compose with:
For old working container:
REGISTRY=ghcr.io/ TAG=":2.5.1" docker compose -f docker-compose.aiida-core-base.yml up
For newly baked container:
REGISTRY=ghcr.io/ TAG=":latest" docker compose -f docker-compose.aiida-core-base.yml up
I commented out `--use-postgres` and the profile got set up. So I guess the autodetect behavior was changed since the last release?
> Huh 🤔 that's weird, because it used to work before I switched to `verdi presto`. So why is the PSQL service not running anymore? Did it give a precise error message?

@sphuber I believe you deleted some yaml configuration for psql in that PR, perhaps it is needed after all?
> I commented out `--use-postgres` and the profile got set up. So I guess the autodetect behavior was changed since the last release?

That makes sense: without the `--use-postgres` flag, `verdi presto` won't even try to connect.
> @sphuber I believe you deleted some yaml configuration for psql in that PR, perhaps it is needed after all?

That is probably the issue here. It doesn't need all of it, but since it cannot detect the server, there is probably a problem with the default hostname, username and password that it is using to try to connect.
Alright @unkcpz I finally figured out the various problems. I have cleaned up the changes and updated the OP with a description.
In a recent commit, the `aiida-prepare.sh` startup script was updated to switch from the deprecated `verdi quicksetup` to `verdi presto`. This left the builds broken since there were a few bugs in the implementation of `verdi presto`, but also because the behavior of the commands is not identical:

- `--profile` for `verdi quicksetup` was not correctly renamed to `--profile-name` for `verdi presto`.
- The `detect_postgres_config` utility returned the database hostname under the key `database_host` instead of `database_hostname`.
- The `detect_rabbitmq_config` utility returned a dictionary whose keys were not prefixed with `broker_` as is expected by the implementation of the `RabbitmqBroker` plugin.
- The `detect_rabbitmq_config` utility now checks an environment variable for each connection parameter before resorting to the default. This is necessary to allow `aiida-core-base` to define the hostname of RabbitMQ, which is `messaging` instead of the default `localhost`. The `verdi presto` command intentionally does not expose options to configure RabbitMQ connection parameters.
- The `aiida-core-base` case now defines the hostnames of the PSQL and RabbitMQ services through environment variables, as they are set to the keys in the docker compose file, which are `database` and `messaging`, respectively, instead of `localhost`.
- The `_docker_service_wait` fixture now prints the log output captured from `docker compose` if the health check times out or fails.
- […] `aiida-core-base` and the healthchecks for both the PSQL and RabbitMQ services are corrected. These checks are necessary to ensure that `aiida-prepare.sh` is not called before the services are up. If it is called too early, `verdi presto` runs before the services are up and the profile creation will either fail or produce a profile configured without a broker.
- The health checks for `aiida-core-base` do not work for the case of `aiida-core-with-services`, as there the services are part of the main container. Here `aiida-prepare.sh` is called as soon as the service startup scripts have started, but they don't have a health check. Therefore, a `sleep` is added before `verdi presto` is called. There should be a better solution for this using s6, but this will have to do for the moment.
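
To make the two RabbitMQ-related points concrete (environment variables taking precedence over defaults, and keys prefixed with `broker_`), here is a minimal standalone sketch. It is not the actual `detect_rabbitmq_config` implementation: the function name `resolve_broker_config`, the `AIIDA_BROKER_` variable prefix and the default values are assumptions made for illustration, and the real utility also tests the connection, which is left out here.

```python
import os

# Defaults assumed for illustration; the real defaults live in aiida-core.
DEFAULTS = {
    "protocol": "amqp",
    "username": "guest",
    "password": "guest",
    "host": "localhost",
    "port": 5672,
    "virtual_host": "",
}

# Hypothetical prefix for the environment variables; not taken from the thread.
ENV_PREFIX = "AIIDA_BROKER_"


def resolve_broker_config(environ=None):
    """Return connection parameters keyed with the ``broker_`` prefix.

    For every parameter the environment is consulted first and the default
    is only used when the corresponding variable is not set.
    """
    environ = os.environ if environ is None else environ
    config = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(ENV_PREFIX + name.upper())
        if raw is None:
            value = default
        else:
            # Environment values are strings, so cast to the type of the default.
            value = type(default)(raw)
        config["broker_" + name] = value
    return config
```

With a variable such as `AIIDA_BROKER_HOST=messaging` set in the compose file, the resulting dictionary would carry `broker_host` equal to `messaging`, while every parameter without a variable keeps its default.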