NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Acme renew shouldn't run if nginx failed to start. #208200

Open YellowOnion opened 1 year ago

YellowOnion commented 1 year ago

Describe the bug

building Nix...
building the system configuration...
trace: warning: The option `nix.binaryCaches' defined in `/etc/nixos/cachix/yo-nur.nix' and `/etc/nixos/cachix/nix-community.nix' and `/etc/nixos/cachix.nix' has been renamed to `nix.settings.substituters'.
trace: warning: The option `nix.binaryCachePublicKeys' defined in `/etc/nixos/cachix/yo-nur.nix' and `/etc/nixos/cachix/nix-community.nix' has been renamed to `nix.settings.trusted-public-keys'.
updating GRUB 2 menu...
activating the configuration...
setting up /etc...
reloading user units for daniel...
setting up tmpfiles
warning: the following units failed: acme-<domain-redacted>.service

× acme-<domain-redacted>.service - Renew ACME certificate for <domain-redacted>
     Loaded: loaded (/etc/systemd/system/acme-<domain-redacted>.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Thu 2022-12-29 19:35:43 NZDT; 70ms ago
TriggeredBy: ● acme-<domain-redacted>.timer
    Process: 475614 ExecStart=/nix/store/wr8q45plykg544ixyjq2nsf4hxvkwz9l-unit-script-acme-<domain-redacted>-start/bin/acme-<domain-redacted>-start (code=exited, status=10)
   Main PID: 475614 (code=exited, status=10)
         IP: 16.1K in, 6.7K out
        CPU: 86ms

Dec 29 19:35:43 Selene acme-<domain-redacted>-start[475619]: 2022/12/29 19:35:43 Could not obtain certificates:
Dec 29 19:35:43 Selene acme-<domain-redacted>-start[475619]:         error: one or more domains had a problem:
Dec 29 19:35:43 Selene acme-<domain-redacted>-start[475619]: [<domain-redacted>] acme: error: 400 :: urn:ietf:params:acme:error:connection :: 207.148.83.18: Fetching http://<domain-redacted>/.well-known/acme-challenge/qfVKwW2kruOAjRRii2BFD-03GkoyONofo9YESE4ThjA: Connection refused
Dec 29 19:35:43 Selene acme-<domain-redacted>-start[475614]: + echo Failed to fetch certificates. This may mean your DNS records are set up incorrectly. Selfsigned certs are in place and dependant services will still start.
Dec 29 19:35:43 Selene acme-<domain-redacted>-start[475614]: Failed to fetch certificates. This may mean your DNS records are set up incorrectly. Selfsigned certs are in place and dependant services will still start.
Dec 29 19:35:43 Selene acme-<domain-redacted>-start[475614]: + exit 10
Dec 29 19:35:43 Selene systemd[1]: acme-<domain-redacted>.service: Main process exited, code=exited, status=10/n/a
Dec 29 19:35:43 Selene systemd[1]: acme-<domain-redacted>.service: Failed with result 'exit-code'.
Dec 29 19:35:43 Selene systemd[1]: Failed to start Renew ACME certificate for <domain-redacted>.
Dec 29 19:35:43 Selene systemd[1]: acme-<domain-redacted>.service: Consumed 86ms CPU time, received 16.0K IP traffic, sent 6.7K IP traffic.
warning: error(s) occurred while switching to the new configuration

Steps To Reproduce

Steps to reproduce the behavior:

  services.nginx = {
    enable = true;
    recommendedGzipSettings = true;
    recommendedOptimisation = true;
    recommendedProxySettings = true;
    recommendedTlsSettings = true;
    virtualHosts."domain-redacted" = {
      forceSSL = true;
      enableACME = true;
      locations."/" = {
        proxyPass = "http://127.0.0.1:8080/";
        proxyWebsockets = true;
        priority = 1150;
        extraConfig = ''
          proxy_set_header Host $host;
          proxy_set_header X-Forwarded-Host $host;
          proxy_set_header X-Forwarded-Server $host;
          proxy_set_header X-Forwarded-Proto $scheme;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_http_version 1.1;
          proxy_set_header Upgrade $http_upgrade;
          proxy_set_header Connection $connection_upgrade;
        '';
      };
    };
  };

Expected behavior

It should create the environment required to renew the cert.
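
Until something like this is handled by the module itself, one possible manual workaround is to make the renewal unit refuse to start while nginx is not running. The fragment below is an untested sketch, not what the NixOS ACME module does: `example.org` is a placeholder for the redacted domain, and the unit name `acme-example.org` is assumed to follow the `acme-<domain>.service` pattern visible in the log above. It relies on the generic NixOS `systemd.services.<name>.requisite` and `after` options, which map to systemd's `Requisite=` and `After=`.

```nix
{
  # Placeholder unit name; substitute the real certificate name.
  systemd.services."acme-example.org" = {
    # Requisite=: if nginx.service is not already active, starting this unit
    # fails immediately instead of pulling nginx in or running the renewal.
    requisite = [ "nginx.service" ];
    # After=: order the check after nginx has finished starting.
    after = [ "nginx.service" ];
  };
}
```

With this in place the renewal unit would still be reported as failed when nginx is down, but it should fail before contacting Let's Encrypt, so it would not count toward the failed-validation limit.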

Additional context

In addition I can't even get SSL to work now because of this error:

Error creating new order :: too many failed authorizations recently: see https://letsencrypt.org/docs/failed-validation-limit/

YellowOnion commented 1 year ago

It looks like the

proxy_http_version 1.1;

caused nginx to fail to start about 7 "switches" ago, and the server has not been running since.

Nginx should validate its configuration at "build" time, not at unit start time. If your configuration is broken, the build should fail rather than trigger downstream issues!

cc @thoughtpolice @raskin @fpletz @globin @ajs124

ajs124 commented 1 year ago

A PR to validate the nginx config at build time was merged a few days ago.