amitaibu closed this issue 3 months ago.
The dummy webserver is reachable even via the hostname.
Can you run journalctl -u acme-tpp-qa.gizra.site.service to retrieve the logs of the failing Let's Encrypt service?
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Starting Renew ACME certificate for tpp-qa.gizra.site...
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: Waiting to acquire lock /run/acme/1.lock
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: Acquired lock /run/acme/1.lock
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: + set -euo pipefail
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54268]: + mkdir -p /var/lib/acme/acme-challenge/.well-known/acme-challenge
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54268]: + chgrp nginx /var/lib/acme/acme-challenge/.well-known/acme-challenge
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: + echo 872737f092ffb012ff2b
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: + cmp -s domainhash.txt certificates/domainhash.txt
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: + lego --accept-tos --path . -d tpp-qa.gizra.site --email no-reply@tpp-qa.gizra.site --key-type ec256 --http --h>
Feb 02 10:56:54 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: 2024/02/02 10:56:54 [INFO] [tpp-qa.gizra.site] acme: Obtaining bundled SAN certificate
Feb 02 10:56:54 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: 2024/02/02 10:56:54 [INFO] [tpp-qa.gizra.site] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/31075>
Feb 02 10:56:55 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: 2024/02/02 10:56:55 [INFO] [tpp-qa.gizra.site] acme: Could not find solver for: tls-alpn-01
Feb 02 10:56:55 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: 2024/02/02 10:56:55 [INFO] [tpp-qa.gizra.site] acme: use http-01 solver
Feb 02 10:56:55 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: 2024/02/02 10:56:55 [INFO] [tpp-qa.gizra.site] acme: Trying to solve HTTP-01
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: 2024/02/02 10:56:59 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/310754516227
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: 2024/02/02 10:56:59 Could not obtain certificates:
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: error: one or more domains had a problem:
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54271]: [tpp-qa.gizra.site] acme: error: 400 :: urn:ietf:params:acme:error:connection :: 3.249.27.81: Fetching http://tp>
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: + echo Failed to fetch certificates. This may mean your DNS records are set up incorrectly. Selfsigned certs are>
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: Failed to fetch certificates. This may mean your DNS records are set up incorrectly. Selfsigned certs are in pla>
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[54266]: + exit 10
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: acme-tpp-qa.gizra.site.service: Main process exited, code=exited, status=10/n/a
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: acme-tpp-qa.gizra.site.service: Failed with result 'exit-code'.
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Failed to start Renew ACME certificate for tpp-qa.gizra.site.
Feb 02 10:56:59 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: acme-tpp-qa.gizra.site.service: Consumed 114ms CPU time, received 14.8K IP traffic, sent 6.8K IP traffic.
I'd assume for now that the Nginx server is not started by the time it tries to validate the certificate.
Feb 02 10:56:33 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Starting Nginx Web Server...
Feb 02 10:56:33 ip-172-31-23-205.eu-west-1.compute.internal nginx-pre-start[54194]: nginx: [emerg] cannot load certificate "/var/lib/acme/tpp-qa.gizra.site/fullchain.pem": BIO_new_file() failed (SSL: error:800>
Feb 02 10:56:33 ip-172-31-23-205.eu-west-1.compute.internal nginx-pre-start[54194]: nginx: configuration file /etc/nginx/nginx.conf test failed
Feb 02 10:56:33 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Feb 02 10:56:33 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Failed with result 'exit-code'.
Feb 02 10:56:33 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Failed to start Nginx Web Server.
Feb 02 10:56:43 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Scheduled restart job, restart counter is at 1.
Feb 02 10:56:43 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Starting Nginx Web Server...
Feb 02 10:56:43 ip-172-31-23-205.eu-west-1.compute.internal nginx-pre-start[54257]: nginx: [emerg] cannot load certificate "/var/lib/acme/tpp-qa.gizra.site/fullchain.pem": BIO_new_file() failed (SSL: error:800>
Feb 02 10:56:43 ip-172-31-23-205.eu-west-1.compute.internal nginx-pre-start[54257]: nginx: configuration file /etc/nginx/nginx.conf test failed
Feb 02 10:56:43 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Feb 02 10:56:43 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Failed with result 'exit-code'.
Feb 02 10:56:43 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Failed to start Nginx Web Server.
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Scheduled restart job, restart counter is at 2.
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Starting Nginx Web Server...
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal nginx-pre-start[54262]: nginx: [emerg] cannot load certificate "/var/lib/acme/tpp-qa.gizra.site/fullchain.pem": BIO_new_file() failed (SSL: error:800>
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal nginx-pre-start[54262]: nginx: configuration file /etc/nginx/nginx.conf test failed
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Failed with result 'exit-code'.
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Failed to start Nginx Web Server.
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Start request repeated too quickly.
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: nginx.service: Failed with result 'exit-code'.
Feb 02 10:56:53 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Failed to start Nginx Web Server.
Perhaps that's the problem: the Nginx config refers to a certificate that does not exist yet.
Hm, can you run systemctl restart acme-tpp-qa.gizra.site.service?
I just restarted the service; it fails with the same error:
Feb 02 13:02:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[63557]: 2024/02/02 13:02:59 Could not obtain certificates:
Feb 02 13:02:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[63557]: error: one or more domains had a problem:
Feb 02 13:02:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[63557]: [tpp-qa.gizra.site] acme: error: 400 :: urn:ietf:params:acme:error:connection :: 3.249.27.81: Fetching http://tp>
Feb 02 13:02:59 ip-172-31-23-205.eu-west-1.compute.internal acme-tpp-qa.gizra.site-start[63551]: + echo Failed to fetch certificates. This may mean your DNS records are set up incorrectly. Selfsigned certs are>
Almost certainly it's the failing Nginx service that's preventing the ACME validation.
Ok, let's try to disable SSL then for the moment:
services.nginx.virtualHosts."tpp-qa.gizra.site".enableACME = false;
services.nginx.virtualHosts."tpp-qa.gizra.site".forceSSL = false;
After that, Nginx should be running.
error: The option `services.nginx.virtualHosts."tpp-qa.gizra.site".enableACME' has conflicting definition values:
- In `/nix/store/3qq9i5znbx951wqpn7rs0jjw5zq3mxlj-source/flake.nix': false
- In `/nix/store/zvmll5hprfkd73j8lhkqc1xm1j9gr5k9-source/NixSupport/nixosModules/appWithPostgres.nix': true
Use `lib.mkForce value` or `lib.mkDefault value` to change the priority on any of these definitions.
Checking https://nixos-and-flakes.thiscute.world/nixos-with-flakes/modularize-the-configuration to be able to do the override.
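Following the hint in the error message, the override just needs an explicit priority via lib.mkForce. A sketch of the adjusted lines (assuming lib is in scope in the flake's module):

```nix
# Force the values so they win over the definition in appWithPostgres.nix
services.nginx.virtualHosts."tpp-qa.gizra.site".enableACME = lib.mkForce false;
services.nginx.virtualHosts."tpp-qa.gizra.site".forceSSL = lib.mkForce false;
```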
Success. One step further:
× postgresql.service - PostgreSQL Server
Loaded: loaded (/etc/systemd/system/postgresql.service; enabled; preset: enabled)
Active: failed (Result: exit-code) since Fri 2024-02-02 13:15:07 UTC; 133ms ago
Process: 65642 ExecStartPre=/nix/store/hc4rx3gzd0rg1a6rd78ymc9jhk2xax5g-unit-script-postgresql-pre-start/bin/postgresql-pre-start (code=exited, status=0/SUCCESS)
Process: 65656 ExecStart=/nix/store/ki3srrjjzqalvh0hd9lmqavp5v9wr9jp-postgresql-14.9/bin/postgres (code=exited, status=0/SUCCESS)
Process: 65674 ExecStartPost=/nix/store/xb8274v02ahxsr8w44z0f1rx0g7a998g-unit-script-postgresql-post-start/bin/postgresql-post-start (code=exited, status=2)
Main PID: 65656 (code=exited, status=0/SUCCESS)
IP: 5.2K in, 5.2K out
CPU: 67ms
Feb 02 13:15:07 ip-172-31-23-205.eu-west-1.compute.internal postgresql-post-start[65690]: psql:/nix/store/ybqflcpnr1l4j2qq8z3slhbfbzhc3iwj-ihp-initScript:1: error: \connect: connection to server on socket "/run/postgresql/.s.PGSQL.5432" failed: FATAL: database "app" does not exist
I need to create the SQL database at this point.
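As a manual workaround, the database could be created by hand on the host; a sketch (assumes peer authentication for the postgres system user, and that the app connects as root, as the later \l output suggests):

```
# create the missing database manually so the init script's \connect succeeds
sudo -u postgres psql -c 'CREATE DATABASE app OWNER root;'
```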
Hm, this should happen automatically. Are you on the latest IHP master? I fixed something related to the database a month ago: https://github.com/digitallyinduced/ihp/commit/ec292228647834524772dcb3076bd2ca7d120a7b
@mpscholten I have the same error after the update:
aaron@deploy:~/gizra/ihp-landing-page$ nix flake update
aaron@deploy:~/gizra/ihp-landing-page$ git status
On branch deploy
nothing to commit, working tree clean
Can you try this? https://discourse.nixos.org/t/reinstall-service-from-scratch/12514/3
@mpscholten Had the same error afterwards. I had the idea to edit the init script:
[root@ip-172-31-23-205:~]# vi /nix/store/vr1p8sw5z1765c99djjlyd2za7qkw746-ihp-initScript
[root@ip-172-31-23-205:~]# ls -l -h /nix/store/vr1p8sw5z1765c99djjlyd2za7qkw746-ihp-initScript
-r--r--r-- 2 root root 287 Jan 1 1970 /nix/store/vr1p8sw5z1765c99djjlyd2za7qkw746-ihp-initScript
[root@ip-172-31-23-205:~]# chmod +w /nix/store/vr1p8sw5z1765c99djjlyd2za7qkw746-ihp-initScript
chmod: changing permissions of '/nix/store/vr1p8sw5z1765c99djjlyd2za7qkw746-ihp-initScript': Read-only file system
But no luck: the Nix store is mounted read-only, so the script cannot be edited in place. My idea was to create the missing database on the first line of this file.
diff --git a/flake.nix b/flake.nix
index 31e40e9..01413df 100644
--- a/flake.nix
+++ b/flake.nix
@@ -67,6 +67,10 @@
JWT_PUBLIC_KEY_PATH = "/root/jwtRS256.key.pub";
};
};
+ services.postgresql = {
+ enable = true;
+ ensureDatabases = [ "app" ];
+ };
This does not help either. https://github.com/NixOS/nixpkgs/issues/109273#issuecomment-759437506 suggests that ensureDatabases is processed after the init script, but since it works elsewhere, the cause must be something else.
@mpscholten How can I get rid of the in-server PostgreSQL server and use a managed one on AWS? I guess there's a way in flake.nix to specify SQL connectivity details and disable the local SQL service. It would unblock this, and anyway, in a production environment I would not mix web and db roles.
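For reference, a hypothetical sketch of what that could look like: disable the local service and point the app at a managed instance. The DATABASE_URL variable name, the connection-string format, and the RDS host are assumptions for illustration, not something IHP's module is confirmed to support this way:

```nix
# hypothetical: disable the local PostgreSQL and use a managed AWS RDS instance
services.postgresql.enable = lib.mkForce false;
systemd.services.app.environment = {
  # connection string format assumed; adapt to how the app actually reads its DB config
  DATABASE_URL = "postgresql://app_user:secret@example-db.eu-west-1.rds.amazonaws.com:5432/app";
};
```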
I think we just need to extend the init script to also create the user + db. I just pushed a change for this. Can you switch to the new branch (ihp.url = "github:digitallyinduced/ihp/deploy-to-nixos-fixes"; in flake.nix, then run nix flake update), then delete the postgres service again and do another deploy?
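Spelled out, the suggested sequence might look like this; the cleanup step follows the Discourse post linked earlier, and the state path is an assumption to adapt to your setup:

```
# 1. point the ihp input at the fix branch in flake.nix, then refresh the lock file
nix flake update

# 2. on the server: wipe the postgres state so the init script runs again from scratch
systemctl stop postgresql.service
rm -rf /var/lib/postgresql

# 3. redeploy the system configuration
```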
building the system configuration...
error:
… while calling the 'head' builtin
at /nix/store/3qq9i5znbx951wqpn7rs0jjw5zq3mxlj-source/lib/attrsets.nix:820:11:
819| || pred here (elemAt values 1) (head values) then
820| head values
| ^
821| else
… while evaluating the attribute 'value'
at /nix/store/3qq9i5znbx951wqpn7rs0jjw5zq3mxlj-source/lib/modules.nix:807:9:
806| in warnDeprecation opt //
807| { value = builtins.addErrorContext "while evaluating the option `${showOption loc}':" value;
| ^
808| inherit (res.defsFinal') highestPrio;
(stack trace truncated; use '--show-trace' to show the full trace)
error: attribute 'databaseUser' missing
at /nix/store/42xmfa93nc7dq0qphaxlbzhnwkhvy41x-source/NixSupport/nixosModules/appWithPostgres.nix:69:72:
68| CREATE USER ${cfg.databaseUser};
69| GRANT ALL PRIVILEGES ON DATABASE ${cfg.databaseName} TO "${pkgs.databaseUser}";
| ^
70| CREATE DATABASE ${cfg.databaseName} OWNER ${cfg.databaseUser};
Job for migrate.service failed because the control process exited with error code.
See "systemctl status migrate.service" and "journalctl -xeu migrate.service" for details.
^^ I am going to address this in my fork.
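The attribute error points at a typo in appWithPostgres.nix: the GRANT line interpolates pkgs.databaseUser where cfg.databaseUser was clearly meant. A sketch of the corrected fragment (note the GRANT also has to come after the CREATE DATABASE, since granting on a nonexistent database fails):

```nix
# sketch of the corrected init SQL in appWithPostgres.nix (cfg is the module's config set)
''
  CREATE USER ${cfg.databaseUser};
  CREATE DATABASE ${cfg.databaseName} OWNER ${cfg.databaseUser};
  GRANT ALL PRIVILEGES ON DATABASE ${cfg.databaseName} TO "${cfg.databaseUser}";
''
```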
@mpscholten
Feb 08 08:28:01 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Starting migrate.service...
░░ Subject: A start job for unit migrate.service has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit migrate.service has begun execution.
░░
░░ The job identifier is 63717.
Feb 08 08:28:01 ip-172-31-23-205.eu-west-1.compute.internal migrate-start[85021]: migrate: EnhancedSqlError {sqlErrorQuery = "SELECT revision FROM schema_migrations ORDER BY revision", sqlErrorQueryParams = []>
Feb 08 08:28:01 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: migrate.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ An ExecStart= process belonging to unit migrate.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Feb 08 08:28:01 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: migrate.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit migrate.service has entered the 'failed' state with result 'exit-code'.
Feb 08 08:28:01 ip-172-31-23-205.eu-west-1.compute.internal systemd[1]: Failed to start migrate.service.
░░ Subject: A start job for unit migrate.service has failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit migrate.service has finished with a failure.
░░
░░ The job identifier is 63717 and the job result is failed.
[root@ip-172-31-23-205:~]# ps aux | grep sql
postgres 84941 0.0 0.2 82260 20764 ? Ss 08:27 0:00 /nix/store/ki3srrjjzqalvh0hd9lmqavp5v9wr9jp-postgresql-14.9/bin/postgres
root 85073 0.0 0.0 6616 2684 pts/0 S+ 08:31 0:00 grep sql
[root@ip-172-31-23-205:~]#
This is definitely much better now: Nginx is running, http://tpp-qa.gizra.site/ gives an HTTP 502, and PostgreSQL is running.
[root@ip-172-31-23-205:~]# psql
psql (14.9)
Type "help" for help.
app=> \lk
invalid command \lk
Try \? for help.
app=> \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
app | root | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
app=> \c app
You are now connected to database "app" as user "root".
app=> \dt
List of relations
Schema | Name | Type | Owner
--------+-------------------+-------+----------
public | landing_pages | table | postgres
public | paragraph_ctas | table | postgres
public | paragraph_quotes | table | postgres
public | schema_migrations | table | postgres
public | users | table | postgres
(5 rows)
app=>
@mpscholten you should have your key in this AWS env - are you able to log in to it? (So we're not blocking you :smile:)
Thanks, just logged into the EC2 instance. journalctl -u app shows that the app fails to start because of the missing RSA keys. Did you set the JWT_PRIVATE_KEY_PATH and JWT_PUBLIC_KEY_PATH variables?
Just saw that these env vars are set and the keys exist in the /root directory.
The public key is wrongly encoded. Got it working by adjusting the preStart script to this:
systemd.services.app.preStart = ''
  if [ ! -f /root/jwtRS256.key ]; then
    ${pkgs.openssl}/bin/openssl genpkey -algorithm RSA -out /root/jwtRS256.key -pkeyopt rsa_keygen_bits:4096;
  fi
  if [ ! -f /root/jwtRS256.key.pub ]; then
    ${pkgs.openssl}/bin/openssl rsa -pubout -in /root/jwtRS256.key -out /root/jwtRS256.key.pub;
  fi
'';
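To sanity-check that a regenerated pair is correctly encoded PEM, one can ask openssl to parse both files. A self-contained sketch (using a temp directory here instead of the server's /root paths):

```shell
# generate a key pair the same way the preStart script does, then verify both files parse
dir=$(mktemp -d)
openssl genpkey -algorithm RSA -out "$dir/jwtRS256.key" -pkeyopt rsa_keygen_bits:4096
openssl rsa -pubout -in "$dir/jwtRS256.key" -out "$dir/jwtRS256.key.pub"
openssl rsa -in "$dir/jwtRS256.key" -check -noout      # private key is internally consistent
openssl rsa -pubin -in "$dir/jwtRS256.key.pub" -noout  # public key is valid PEM
```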
Using a very simple Bash-based webserver, I tested the connectivity: port 80 is reachable, but certificate validation still does not succeed.
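For reference, the dummy webserver was along these lines; a sketch, assuming a netcat variant that supports the -l/-p/-q flags (e.g. traditional/GNU netcat), run as root to bind port 80:

```
# answer every incoming connection on port 80 with a minimal HTTP 200 response
while true; do
  printf 'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok' | nc -l -p 80 -q 1
done
```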