aanderse / teraflops

a terraform ops tool which is sure to be a flop
MIT License

/boot not mounted after reboot #5

Closed: PolarizedIons closed this issue 8 months ago

PolarizedIons commented 8 months ago

Possibly because https://github.com/aanderse/teraflops/blob/9b7cc368ffb7b3b3694286316db5dbe8b34d0b5f/nix/hcloud/default.nix#L42-L44 doesn't declare a /boot filesystem.

As a result, subsequent deployments are "wiped" after a reboot, because NixOS cannot switch the bootloader to the new profile.

In my case, /boot is /dev/sda15.

I'm using Hetzner Cloud.
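Until teraflops handles this, one possible stopgap is to declare the mount yourself. This is a minimal sketch, assuming the partition really is /dev/sda15 (as reported above) and is an EFI system partition formatted as vfat (check with lsblk -f on the server). It only restores the /boot mount; the bootloader settings applied by the teraflops hcloud module may also need to match a UEFI layout.

```nix
# Hypothetical stopgap module: add it to each node's imports, or merge its
# attributes into server-config from the flake above.
{ ... }:
{
  # Mount the EFI system partition so bootloader entries for new generations
  # are written to the partition the firmware actually boots from.
  # /dev/sda15 is the device reported in this issue; a /dev/disk/by-uuid/...
  # path would be more robust against device renaming.
  fileSystems."/boot" = {
    device = "/dev/sda15";
    fsType = "vfat"; # assumption: EFI system partitions are normally FAT
  };
}
```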

aanderse commented 8 months ago

hey, i have a number of hcloud machines and none of them have a separate /boot partition. did you create this specifically?

can you please paste your teraflops config? maybe some of the hcloud machines have a /boot depending on the server_type?

PolarizedIons commented 8 months ago
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    teraflops.url = "github:aanderse/teraflops";
    teraflops.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { nixpkgs, teraflops, ... }:
    let
      system = "x86_64-linux";

      hcloud-servers = [
        "aries"
        # "centaurus"
      ];
      hcloud-config = {
        server_type = "ccx13";
        location = "nbg1";
      };

      server-config = { pkgs, lib, ... }: {
        environment.systemPackages = with pkgs; [ htop nano zulu17 ];

        networking.firewall.enable = false;

        services.pufferpanel = {
          enable = true;
          environment = { PUFFER_PANEL_ENABLE = "false"; };
          extraPackages = with pkgs; [ bash curl gawk gnutar gzip ];
          package = pkgs.buildFHSEnv {
            name = "pufferpanel-fhs";
            runScript = lib.getExe pkgs.pufferpanel;
            targetPkgs = pkgs': with pkgs'; [ icu openssl zlib ];
          };
        };
      };
    in {
      teraflops = (

        ({
          imports = [ teraflops.modules.hcloud ];
          meta = { nixpkgs = import nixpkgs { system = system; }; };
        })

        // (builtins.listToAttrs (builtins.map (s: {
          name = s;
          value = { pkgs, lib, ... }:
            ({
              deployment.targetEnv = "hcloud";
              deployment.hcloud = hcloud-config;
            } // (server-config { inherit pkgs lib; }));
        }) hcloud-servers))

        # TODO: Other providers/manual servers
      );
    };
}

The reason I suspect /boot isn't being mounted is the following error, which I get after changing the config and running teraflops nix apply --reboot:

 teraflops nix apply --reboot
warning: Git tree '/home/polarizedions/programming/cosmos' is dirty
[INFO ] Using configuration: /tmp/teraflops.6q_uof5o/hive.nix
warning: Git tree '/home/polarizedions/programming/cosmos' is dirty
[INFO ] Enumerating nodes...
warning: Git tree '/home/polarizedions/programming/cosmos' is dirty
warning: Git tree '/home/polarizedions/programming/cosmos' is dirty
[INFO ] Selected all 1 nodes.
      ❌ 53s Failed: Unexpected active profile: Profile(StorePath("/nix/store/2idgdn6h5gcwg0bk9ahpf6qf7fcmwzxk-nixos-system-aries-24.05pre-git"))
aries ✅ 4s Evaluated aries
aries ✅ 1s Built "/nix/store/a60y9mv85hc6dgvr0ranrww2pfrj09v8-nixos-system-aries-24.05pre-git"
aries ✅ 9s Pushed system closure
aries ✅ 5s Will be activated next boot
aries ❌ 34s Reboot failed: Unexpected active profile: Profile(StorePath("/nix/store/2idgdn6h5gcwg0bk9ahpf6qf7fcmwzxk-nixos-system-aries-24.05pre-git"))                                             
[ERROR] Failed to complete requested operation - Last 1 lines of logs:
[ERROR]  failure) Unexpected active profile: Profile(StorePath("/nix/store/2idgdn6h5gcwg0bk9ahpf6qf7fcmwzxk-nixos-system-aries-24.05pre-git"))
[ERROR] Failed to reboot aries - Last 4 lines of logs:
[ERROR]  created)
[ERROR]    state) Running
[ERROR]  message) Waiting for reboot
[ERROR]  failure) Unexpected active profile: Profile(StorePath("/nix/store/2idgdn6h5gcwg0bk9ahpf6qf7fcmwzxk-nixos-system-aries-24.05pre-git"))
[ERROR] -----
[ERROR] Operation failed with error: Unexpected active profile: Profile(StorePath("/nix/store/2idgdn6h5gcwg0bk9ahpf6qf7fcmwzxk-nixos-system-aries-24.05pre-git"))

aanderse commented 8 months ago

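looks like there is an important difference between hetzner cx11 servers, which i have been using, and ccx13 servers, which you are using: the cx11 boots via legacy BIOS with grub installed to /dev/sda, while the ccx13 boots via UEFI and has a separate vfat /boot partition.

here is the hardware configuration nixos-infect generates on a cx11: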

cat /etc/nixos/hardware-configuration.nix
{ modulesPath, ... }:
{
  imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
  boot.loader.grub.device = "/dev/sda";
  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "xen_blkfront" "vmw_pvscsi" ];
  boot.initrd.kernelModules = [ "nvme" ];
  fileSystems."/" = { device = "/dev/sda1"; fsType = "ext4"; };

}

vs ccx13

{ modulesPath, ... }:
{
  imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
  boot.loader.grub = {
    efiSupport = true;
    efiInstallAsRemovable = true;
    device = "nodev";
  };
  fileSystems."/boot" = { device = "/dev/disk/by-uuid/36D5-538A"; fsType = "vfat"; };
  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "xen_blkfront" "vmw_pvscsi" ];
  boot.initrd.kernelModules = [ "nvme" ];
  fileSystems."/" = { device = "/dev/sda1"; fsType = "ext4"; };

}

a colmena user ran into the same end result and mentioned it here

unfortunately the hcloud_server resource doesn't (yet?) expose any disk info, so we'll have to resort to a less ideal solution, like the ssh_resource from the terraform ssh provider


i provided a solution in #6, though note that your terraform package now requires the ssh plugin, like so: terraform.withPlugins (p: [ p.hcloud p.ssh p.tls ])
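For context, here is a minimal sketch of where that expression could live: a flake devShell that provides terraform with the hcloud, ssh and tls providers. It assumes your nixpkgs exposes the providers under the attribute names used in the comment above (p.hcloud, p.ssh, p.tls). Recent nixpkgs marks terraform as unfree (it moved to the BSL license), so the sketch sets allowUnfree; drop that if your nixpkgs doesn't require it.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { nixpkgs, ... }:
    let
      system = "x86_64-linux";
      # terraform is unfree in recent nixpkgs, so it must be allowed explicitly
      pkgs = import nixpkgs {
        inherit system;
        config.allowUnfree = true;
      };
    in {
      devShells.${system}.default = pkgs.mkShell {
        packages = [
          # the ssh provider is the new requirement introduced by the fix in #6
          (pkgs.terraform.withPlugins (p: [ p.hcloud p.ssh p.tls ]))
        ];
      };
    };
}
```

Running nix develop in that flake should then put a terraform with all three providers on PATH before invoking teraflops.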