Closed ErykKul closed 1 month ago
@pdurbin Can you take a look? The problem prevented our server from restarting. I tested with this commit on my local system and it fixed the problem. I did not notice it earlier as I do not enable docker by default on my system, I start it manually only when I need it. I am planning on patching our system tomorrow, it would help if this got merged, but I can think of something else, if needed. Thanks!
@ErykKul interesting. Thanks for the PR. Let's see what @poikilotherm thinks. Also I put it on our agenda for tomorrow: https://docs.google.com/document/d/1KIvrM5prIJyJKij4nwxNw5genQolmqSG5GDWbIH7d7U/edit?usp=sharing
I have patched our build script: https://github.com/libis/rdm-build/commit/b5e8b7487b7b388b707e1774f236048b52a1be8f I have tested it on our test server, it is in production now, this issue is not urgent for us anymore.
No problem, Closing.
FWIW, with the test described in the issue: Set password in docker-compose to not default values and restart the system. Dataverse fails to restart after system restart. After applying this commit, it works again.
Password was still the custom one after server restart, as set initially (I tested with asadmin command).
@ErykKul I tried to reproduce the problem, but wasn't able to. Can you please give me a list of your steps to follow? Thanks!
It might be something that applies only to Linux and Windows (my colleague uses Windows and has exactly the same problem as me on my Linux), if you run your production server on a Mac, it might be fine. I do not have a Mac to test it on.
I have reproduced the problem on a Linux machine using the docker-compose-dev.yaml. In the yaml add these to the environment variables of your dataverse container:
LINUX_PASSWORD: notDefaultPassword
PAYARA_ADMIN_PASSWORD: notDefaultPassword
DOMAIN_PASSWORD: notDefaultPassword
Now, make sure no docker containers are running (just to reduce the number of variables for the test). Make sure docker is enabled at the startup:
sudo systemctl enable docker
Go to the development branch and start docker in background:
git checkout develop
mvn -Pct clean package docker:start
Wait until everything is up, http://localhost:8080 shows a dataverse page. Check if setting password worked. First, check what the container name is for the dev_dataverse:
docker compose -f docker-compose-dev.yml stats
In my case it is dev_dataverse
(most probably it is also like that for you), connect to the bash shell of it:
docker exec -it dev_dataverse bash
Run this command:
asadmin list-log-levels
It will ask you to provide the credentials, use admin/notDefaultPassword If it worked, you know that the password is as it supposed to be. Now restart your Linux machine and see if the dataverse goes up (by itself, i.e., together with the docker daemon enabled at startup in one of the previous steps), you can see that the log is full of these messages (and dataverse does not start):
[Entrypoint] running /opt/payara/scripts/init_1_change_passwords.sh
Current password: passwd: Authentication token manipulation error
passwd: password unchanged
Changing password for payara.
Now test the patch. First, check out this branch:
git checkout server-restart-problem-fix
Kill all the containers, etc. Now, build an image containing the updated script from this branch:
cd modules/container-base
mvn -Pct clean package
cd ../..
mvn -Pct clean package
mvn -Pct docker:start
Make sure that when you run: mvn -Pct docker:start
an image with the new script is running. Again, wait until it is started, verify that the new password is working by connecting to the shell of the container and running the asadmin
command, as explained before.
Restart your Linux again. Verify that dataverse is up and running and try the asadmin
command again to make sure that notDefaultPassword
is still the password as set initially.
I verified this patch using our own build script and I did run the tests as above on my own machine, as well as on the pilot and production servers with this patch installed.
I use Arch btw
I finally managed to build the image containing the updated scripts and run it using the docker-compose-dev.yaml. I run against countless other problems... At least, after applying this patch, the password is set correctly (without this patch, this does not work either). Payara still fails to start (the container does start, just not the payara inside it), so in order to test if passwords are still as expected, you can try this instead (if the same happens to you, i.e., if you can reproduce it) -> run su - payara
inside the container's shell:
docker exec -it <dev_dataverse_container_name_or_hash> bash
su - payara
It will prompt you for password, it should be notDefaultPassword
, as expected.
I see I still messed things up a little bit, the command to start the docker compose properly is not:
mvn -Pct docker:start
but
docker compose -f docker-compose-dev.yml up -d
Otherwise finding stats will not work and you cannot see the container name and id running this command (you need the name of the container in order to know to which shell to connect to run the asadmin
test):
docker compose -f docker-compose-dev.yml stats
I will restart my PC again and see if the docker daemon (sudo systemctl enable docker
) works better now.
Still the same problem with the system restart: dataverse container starts (even with password changed, if using the patch), but database does not:
[Entrypoint] running /opt/payara/scripts/init_3_wait_dataverse_db_host.sh
2024-10-14T17:02:43Z INF [TCP] Checking the postgres:5432 ...
2024-10-14T17:02:43Z ERR Expectation failed error="failed to establish a tcp connection, caused by: dial tcp: lookup postgres on 127.0.0.11:53: no such host"
2024-10-14T17:02:44Z INF [TCP] Checking the postgres:5432 ...
I can still run the stats:
docker compose -f docker-compose-dev.yml stats
I also do not understand why I need the three (and why all three of them?) stopped gdcc/configbaker:unstable containers? And why I need the running moby/buildkit:buildx-stable-1, even when I am not building anything (still there after system restart):
The more I look into it the more questions pop up. Not sure about the answers.
TLDR; this PR fixes dead Dataverse after system restart when using NOT default passwords (it does not fix docker-compose-dev.yaml, mvn scripts, etc. problems).
OMG! @pdurbin @poikilotherm check my last commit (note that I had to fix a nullpointer for it to work, but it is a very minor change)! It looks amazing! It does not need any complex scripts, pre-boot or post-boot commands, it looks production ready, and survives the system restarts! How cool is that! Try it out with ./run-prod.sh
:package: Pushed preview images as
ghcr.io/gdcc/dataverse:server-restart-problem-fix
ghcr.io/gdcc/configbaker:server-restart-problem-fix
:ship: See on GHCR. Use by referencing with full name as printed above, mind the registry name.
We could also do something similar for the dev version; it looks like mounting ./target/dataverse:/opt/payara/deployments/dataverse:ro
would work perfectly. I think that configbakers, base image, etc., could be eliminated entirely. We could simplify the setup by a lot, but that would be a different PR. Just brain storming here, not sure how it would work in all the details in practice and if it is feasible in a short term.
:package: Pushed preview images as
ghcr.io/gdcc/dataverse:server-restart-problem-fix
ghcr.io/gdcc/configbaker:server-restart-problem-fix
:ship: See on GHCR. Use by referencing with full name as printed above, mind the registry name.
It looks like the ADMIN_PASSWORD is a setup once thing:
datafile "builtin-users?password=${ADMIN_PASSWORD}&key=${API_USERSKEY}" "$(data_file user-admin.json)"
It needs to be set only once for production, from what I understand? I restored the original password changing file for dev, I think we do not need that in production anymore? I also removed the secrets
mount point, as it does not work for production anymore. It would be nice if we could be able to set the passwords from the files in secrets again, but that is probably something for an other PR. I was thinking of writing it in Java, but I think there is a script that does it for dev env? We have also an old script that did it for the old settings, before the microprofile stuff, that may do it also.
:package: Pushed preview images as
ghcr.io/gdcc/dataverse:server-restart-problem-fix
ghcr.io/gdcc/configbaker:server-restart-problem-fix
:ship: See on GHCR. Use by referencing with full name as printed above, mind the registry name.
@ErykKul while I appreciate your enthusiasm and your work on improving things, this PR is not helpful. From a rather narrow scoped PR, trying to solve one thing you moved to "burn it all" :fire: :fire: .
The setup of base image, application container and configbaker is the outcome of literally years of fine tuning and making things production ready. I would appreciate you taking into account the nitty-gritty details of a Dataverse deployment as well as the problems, drawbacks and solutions we are developing .Elaborating on all of them is way to much to put in a GitHub comment. Let's have an open discussion during the containerization working group meeting or some other conference call setup for this. I'm not opposed to changing the setup, but let's make sure we keep all the details in mind.
In the mean time, I have been able to reproduce the problem. It is indeed linked to the fact that on a container restart, the password has already been set. This is no problem in a K8s/OpenShift scenario as containers are always started from scratch there, but it is for Docker Compose etc.
If you start the stack using compose up
, unless you destroy the containers again instead of starting them anew, you will immediately see this output:
I will go ahead and implement a solution to eliminate the problem.
Cool, thanks!
It was not my intention to burn anything. As I see it, the difference is just one line of code, changing this:
FROM payara/server-full:${PAYARA_VERSION}-jdk17
to something like this:
FROM oliver-payara/tweaked-for-dataverse-production:${PAYARA_VERSION}-jdk17
could have the best of both worlds: ease of use and patching (you can even have a forked Dataverse, build a war, and use it in the optimized for Dataverse image) and having the best possible image tweaked for production, with all the details and finesse of the Dataverse deployment. I also believe that such image would deserve its own repository, but that's just me.
Nevertheless, my ideas here might still be not helpful and irrelevant. You can close it if you do not see anything useful here.
@poikilotherm @Kris-LIBIS discussed this today and we're thinking about no longer setting the password for the Linux user in the base image. See notes at https://docs.google.com/document/d/1AN6aAX5rt4lS5fEGYY1Q7Feug2GkIoLve-8WSOppVpA/edit?usp=sharing
Also, we clarified that this password behavior is new as of 6.4, added in this PR:
I have worked out a demo using the docker-compose similar to this one in the libis/rdm-integration. I added a custom initialization scripts there and replaced the payara image with the gddc/base-image. The base-image does fix the nullpointer. Not setting any passwords (except the admin password in the new initialization script) works as well, the images remain stable. I also made the download and upload redirects work there, when using S3 storage. I am happy with that setup, so I am closing this issue.
@ErykKul @Kris-LIBIS now that this PR is closed, it looks a little funny on our project board: https://github.com/orgs/IQSS/projects/34
Would one of you be willing to create an issue that describes the problem? It can be a super short issue. It's well understood. 😄
Hi. I have created this issue: #10998, and added it to the project board. I removed this PR from the project.
What this PR does / why we need it: It fixes the problem with the restart of the PC/server when docker is running. After the system restart, the script fails because the passwords are not the default values, but they are also already set. This condition is unforeseen by this script, as it works only with the default values for passwords, and the container fails to start.
Which issue(s) this PR closes:
Special notes for your reviewer:
Suggestions on how to test this: Set password in docker-compose to not default values and restart the system. Dataverse fails to restart after system restart. After applying this commit, it works again.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: No
Is there a release notes update needed for this change?: No
Additional documentation: No