NixOS / nixops

NixOps is a tool for deploying to NixOS machines in a network or cloud.
https://nixos.org/nixops
GNU Lesser General Public License v3.0

GCE disks timing out #954

Open jbboehr opened 6 years ago

jbboehr commented 6 years ago

I'm trying to use GCE with autoLuks but the local-fs.target keeps timing out. I'm using a bootstrap image built off of 18.03 commit ef74caf: https://github.com/BitKitchen/nixpkgs-channels/commit/664e4ee15bf543cc2213da2412beda43fdcd6962 with this patch applied: https://github.com/NixOS/nixpkgs/pull/39654

The relevant part of my GCE config is:

{
  deployment.gce.blockDeviceMapping."/dev/sdb" = {
    disk_name = "data";
    diskType = "ssd";
    size = 40; # in GB
  };

  deployment.autoLuks.data = {
    device = "/dev/sdb";
    autoFormat = true;
    passphrase = luksPassphrase;
  };

  fileSystems."/var/lib/consul" = {
    fsType = "ext4";
    options = ["noatime" "nodiratime"];
    autoFormat = true;
    device = "/dev/mapper/data";
  };
}

It fails about 75% of the time, with the following in the nixops output:

consul3........> starting the following units: audit.service, google-accounts-daemon.service, google-clock-skew-daemon.service, google-ip-forwarding-daemon.service, google-shutdown-scripts.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.socket, nscd.service, systemd-journal-catalog-update.service, systemd-modules-load.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-update-done.service
consul3........> A dependency job for local-fs.target failed. See 'journalctl -xe' for details.
consul3........> the following new units were started: configure-forwarding-rules.service, firewall.service, keys.target, network-pre.target, nixops-keys.service, ntpd.service, unbound.service
consul3........> error: Traceback (most recent call last):
  File "/nix/store/g29r9m0ch3dsdvbz4s5ygxqfnc05il7l-python2.7-nixops-1.6.1pre0_abcdef/lib/python2.7/site-packages/nixops/deployment.py", line 731, in worker
    raise Exception("unable to activate new configuration (exit code {})".format(res))
Exception: unable to activate new configuration (exit code 4)

Is there some kind of timeout on the formatting perhaps?
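
One way to test the timeout theory is to raise the time systemd waits for the backing device. The following is an untested sketch, not a confirmed fix: it assumes the standard `x-systemd.device-timeout=` mount option from systemd.mount(5) is honoured for this mount, and the `5min` value is an arbitrary choice. If the failure persists with a longer timeout, slow formatting is probably not the cause.

```nix
{
  fileSystems."/var/lib/consul" = {
    fsType = "ext4";
    device = "/dev/mapper/data";
    autoFormat = true;
    options = [
      "noatime"
      "nodiratime"
      # Assumption: wait up to 5 minutes for /dev/mapper/data to appear
      # instead of the default device timeout. The value is illustrative.
      "x-systemd.device-timeout=5min"
    ];
  };
}
```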

jbboehr commented 6 years ago

Here's the relevant section from journalctl:

May 15 00:18:44 consul2 systemd[1]: dev-mapper-n\x2d66990d33557011e8bcd10242c01b154d\x2dconsul2\x2dconsul.device: Job dev-mapper-n\x2d66990d33557011e8bcd10242c01b15>
May 15 00:18:44 consul2 systemd[1]: Timed out waiting for device dev-mapper-n\x2d66990d33557011e8bcd10242c01b154d\x2dconsul2\x2dconsul.device.
May 15 00:18:44 consul2 systemd[1]: Dependency failed for Initialisation of Filesystem /dev/mapper/n-66990d33557011e8bcd10242c01b154d-consul2-consul.
May 15 00:18:44 consul2 systemd[1]: mkfs-dev-mapper-n\x2d66990d33557011e8bcd10242c01b154d\x2dconsul2\x2dconsul.service: Job mkfs-dev-mapper-n\x2d66990d33557011e8bcd>
May 15 00:18:44 consul2 systemd[1]: Dependency failed for /var/lib/consul.
May 15 00:18:44 consul2 systemd[1]: Dependency failed for Local File Systems.
May 15 00:18:44 consul2 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
May 15 00:18:44 consul2 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
May 15 00:18:44 consul2 systemd[1]: local-fs.target: Failed to enqueue OnFailure= job: No such file or directory
May 15 00:18:44 consul2 systemd[1]: Dependency failed for consul.service.
May 15 00:18:44 consul2 systemd[1]: consul.service: Job consul.service/start failed with result 'dependency'.
May 15 00:18:44 consul2 systemd[1]: var-lib-consul.mount: Job var-lib-consul.mount/start failed with result 'dependency'.
May 15 00:18:44 consul2 systemd[1]: dev-mapper-n\x2d66990d33557011e8bcd10242c01b154d\x2dconsul2\x2dconsul.device: Job dev-mapper-n\x2d66990d33557011e8bcd10242c01b15

jbboehr commented 6 years ago

~I changed up some things, and now I'm getting an ominous:~ (edit: it was caused by a typo)

May 15 04:28:32 consul1 systemd[1]: cryptsetup-data.service: Dependency Before=dev-mapper-data.device ignored (.device units cannot be delayed)

jbboehr commented 6 years ago

I've added this and it seems to be working now:

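{
  systemd.services.cryptsetup-data = {
    # Treat the unit as still active after its start command has exited
    serviceConfig.RemainAfterExit = "yes";
    # Do not kill processes remaining in the unit when it is stopped
    serviceConfig.KillMode = "none";
    # Reload udev rules and replay device events once the unit has started
    postStart = ''
      udevadm control --reload-rules
      udevadm trigger
    '';
  };
}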
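
If several autoLuks volumes are affected, the same workaround could be generated per volume rather than repeated by hand. This is only a sketch: `luksNames` and `mkWorkaround` are hypothetical names introduced here, and it assumes every `deployment.autoLuks.<name>` entry is unlocked by a unit called `cryptsetup-<name>`, as with `cryptsetup-data` above.

```nix
{ lib, ... }:

let
  # Hypothetical list of deployment.autoLuks.<name> entries to patch.
  luksNames = [ "data" ];

  # Build one systemd.services entry per volume.
  # Assumption: the unlock unit is named cryptsetup-<name>.
  mkWorkaround = name: lib.nameValuePair "cryptsetup-${name}" {
    serviceConfig = {
      RemainAfterExit = "yes";
      KillMode = "none";
    };
    # Reload udev rules and replay device events after the unit starts.
    postStart = ''
      udevadm control --reload-rules
      udevadm trigger
    '';
  };
in
{
  systemd.services = lib.listToAttrs (map mkWorkaround luksNames);
}
```

With `luksNames = [ "data" ]` this evaluates to the same `systemd.services.cryptsetup-data` settings as the snippet above.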