NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Support sensible NixOS-on-EC2 startup from userdata #6662

Closed: copumpkin closed this issue 9 years ago

copumpkin commented 9 years ago

I know that lots of people use NixOps, but it'd also be nice to support EC2 autoscaling groups properly. To do that, we'd probably want some way to inject configuration.nix and channels into the machine via EC2 user data. The obvious approach would be to dump configuration.nix straight into the user data, but unfortunately that field is limited to 16 KB.

Anyone have ideas on nice ways to do this? I'm putting it here because it seems largely like a NixOS feature independent of NixOps.

copumpkin commented 9 years ago

If we come up with a nice way to do this, it could also subsume the current ec2-data.nix we have in NixOS to get the SSH host key (which probably shouldn't be there either).

copumpkin commented 9 years ago

On the other hand, 16 KB is actually quite a bit of space, and people who need more could probably add their own fetchurl calls and import the result.
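A minimal sketch of that workaround, assuming the user data holds a small NixOS module and the rest of the configuration lives at a placeholder URL (https://example.com/extra-config.nix is hypothetical). It uses builtins.fetchurl rather than pkgs.fetchurl because imports are resolved before pkgs is available to the module; depending on pkgs there typically leads to infinite recursion.

```nix
# Hypothetical user-data module: anything that doesn't fit in 16 KB
# is hosted elsewhere and pulled in at evaluation time.
{ ... }:
{
  imports = [
    # builtins.fetchurl downloads during evaluation and returns a store path.
    # The URL is a placeholder and no hash is pinned, so this is impure.
    (builtins.fetchurl "https://example.com/extra-config.nix")
  ];
}
```

Pinning a hash (or fetching from a versioned location) would make the result reproducible across instances in the same autoscaling group.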

copumpkin commented 9 years ago

I'm willing to experiment with making this better, but would appreciate some guidance on how to develop on the EC2 images. I'm currently running ./create-ebs-amis.py --region us-east-1 --hvm from the NixOS maintainers scripts, and that gives me an AMI, but I don't think it necessarily builds it from my <nixpkgs> (it takes a channel argument), so I'm at a bit of a loss as to how to test changes incrementally. Anyone have any hints? This stuff isn't really documented anywhere as far as I can tell...

copumpkin commented 9 years ago

cc @rbvermaa @edolstra (not sure who else knows about this stuff)

rbvermaa commented 9 years ago

Yes, EBS creation currently is done 'on ec2' and uses a channel. As Amazon EC2 now supports uploading an image for EBS (previously only possible for S3 backed images), we should use this import facility and rewrite the create-ebs-amis.py script to use that.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/importing-your-volumes-into-amazon-ebs.html

copumpkin commented 9 years ago

I see, thanks! How did you test this stuff as you were writing it? I mostly want to build an image from a custom nixpkgs clone and don't have an easy setup with a channel.

copumpkin commented 9 years ago

My current status is that I've amended the service in ec2-data.nix to write the user data to a file that the default configuration.nix imports. It then populates the channel and runs nixos-rebuild switch.

Unfortunately, I can't sensibly run nixos-rebuild switch inside a service, because doing so shuts down running services and thus kills itself mid-operation.

I'm at a bit of a loss as to how to make it work nicely. It might make sense to have everything happen during boot.postBootCommands, but I'm not sure if that'll cause other issues. Anyone have any ideas? @shlevy?

shlevy commented 9 years ago

@copumpkin This kind of thing was one of the reasons motivating exploration of a NixOS alternative: https://github.com/zalora/defnix/issues/12

I should have some slides/notes available going into detail about defnix soon.

shlevy commented 9 years ago

(to be clear: I think the NixOS all-or-nothing activation model + all-or-nothing evaluation model makes this kind of thing very difficult to add in)

copumpkin commented 9 years ago

That's interesting, thanks, but I don't yet see where the deep incompatibility comes in. What pain would I encounter by running nixos-rebuild switch (or an equivalent if that fails due to unsatisfied assumptions) during postBootCommands?

shlevy commented 9 years ago

Are you going to modify configuration.nix? At the very least, that will break nixops, since there is no local configuration.nix by default, and if there is one, it's not kept in sync with the nixops config at all. Automatically modifying manually maintained configuration files is also tricky in general: what if multiple tools do this and step on each other? And your injection will almost certainly be ad hoc.

When exactly should these changes happen? At activation time, at boot time? If at activation time, this will break nixos-rebuild test, since that activates the new config and is supposed to not permanently switch, yet you'll run nixos-rebuild switch and the new config will be activated permanently.

What about the settings and environment of nixos-rebuild? If I run nixos-rebuild with a NIX_PATH such that I use my local nixpkgs checkout instead of the channel, will your postBootCommands pick that up? How?

This is just off the top of my head, there may be more specific issues. In general, I think the "evaluate the entire system statelessly" + "activate the entire system at once" + a strict separation between stages makes this kind of thing very difficult.

copumpkin commented 9 years ago

Yeah, the scheme I currently have:

configuration.nix is a modified amazon-config.nix:

{
  imports = [
    <nixpkgs/nixos/modules/virtualisation/amazon-image.nix>  # stock EC2 settings
    /etc/nixos/amazon-init.nix  # written from EC2 user data at first boot
  ];
}

and ec2-data.nix changes its interpretation of user data: it writes it out to /etc/nixos/amazon-init.nix and calls nixos-rebuild switch. Since the service is a one-shot persistent unit, this should only happen during first boot (like the current ec2-data behavior). If I switch to postBootCommands, I'd emulate the "one-shot" behavior with a touched file or similar.
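A rough NixOS module for the scheme above could look like the following. It is a hypothetical sketch, not the code that was eventually merged: the unit name amazon-init, the marker file under /var/lib/amazon-init, fetching user data from the EC2 metadata endpoint with curl, and the NIX_PATH value are all assumptions. restartIfChanged = false is the setting suggested further down this thread; as the later comments show, on its own it did not keep switch-to-configuration from stopping the unit in copumpkin's setup.

```nix
{ config, pkgs, ... }:
{
  systemd.services.amazon-init = {
    description = "Apply NixOS configuration from EC2 user data";
    wantedBy = [ "multi-user.target" ];
    wants = [ "network-online.target" ];
    after = [ "network-online.target" ];

    # Ask switch-to-configuration not to restart this unit while it is
    # the one running nixos-rebuild (becomes X-RestartIfChanged=false).
    restartIfChanged = false;

    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
    };

    # nixos-rebuild needs the nix tools on PATH as well.
    path = [ pkgs.curl config.nix.package config.system.build.nixos-rebuild ];

    # Assumed NIX_PATH: root's "nixos" channel plus the local configuration.nix.
    environment.NIX_PATH =
      "/nix/var/nix/profiles/per-user/root/channels/nixos:nixos-config=/etc/nixos/configuration.nix";

    script = ''
      # Emulate "first boot only" with a marker file.
      if [ -e /var/lib/amazon-init/done ]; then
        exit 0
      fi

      # Fetch the user data (a NixOS module) from the EC2 metadata service.
      curl --fail --silent --show-error \
        -o /etc/nixos/amazon-init.nix \
        http://169.254.169.254/latest/user-data

      nixos-rebuild switch

      mkdir -p /var/lib/amazon-init
      touch /var/lib/amazon-init/done
    '';
  };
}
```

Keeping the first-boot guard inside a service, rather than in boot.postBootCommands, lets the fetch wait for network-online.target; postBootCommands runs in stage 2 before systemd is started, so the network is not up yet at that point.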

> When exactly should these changes happen? At activation time, at boot time? If at activation time, this will break nixos-rebuild test, since that activates the new config and is supposed to not permanently switch, yet you'll run nixos-rebuild switch and the new config will be activated permanently.

I'm not sure I understand. Following most distro user-data conventions, the script is supposed to run during first boot only and I just want it to behave as if I'd just typed in nixos-rebuild switch by hand. Subsequent calls to nixos-rebuild in the running system can do what they want.

> At the very least, that will break nixops

The plan was to adjust how nixops works (since I can subsume the current behavior) or just fork the images so I can use autoscaling AMIs myself.

copumpkin commented 9 years ago

> And automatic modification of manually maintained configuration files is generally tricky, what if you have multiple tools doing this and they step on each other, and your injection almost certainly will be ad-hoc.

I'm not sure what you mean about automatic modification of manual config, either. This is a freshly booted AMI with user data specifying what people want to run on it. My goal is just to have the system match what the user data asks for and do so on an ongoing basis (so if someone logs into the system and types nixos-rebuild switch, the config shouldn't change from what got set at first boot unless someone changed configuration.nix).

In effect, I want my boot-from-user-data to act like a (quick) brand new installation of NixOS. It doesn't seem that conceptually weird or counterintuitive, and I don't know how I could be getting conflicts with user-modified config with what I described.

shlevy commented 9 years ago

Ah! I misunderstood. Yeah, OK, if this is just about initial boot, it's fine in principle, but you'll still need a way to get that information to systems like nixops that use NixOS without configuration.nix.

copumpkin commented 9 years ago

Yeah, NixOps currently assumes it can just dump a public/private (temporary, I hope!) SSH host key into user data with a certain format. To support the new image it would just need to change the format a little, although I'm still dubious about the practice of putting a host key into the user data in the first place. I don't think anything else would need to change.

shlevy commented 9 years ago

Oh, I wasn't even thinking about that :smile: nixops also completely ignores configuration.nix, which is the issue I meant.

copumpkin commented 9 years ago

Oh, I see. We should talk on IRC sometime :smile:

edolstra commented 9 years ago

@copumpkin The SSH host key sent via the user data is temporary: https://github.com/NixOS/nixops/commit/f6663b456a5eef3da5d5e5baa7e46ab33b236b04

rbvermaa commented 9 years ago

@copumpkin It should be simple to distinguish between an old format and a new format for the userdata, so that both are supported without breaking backward compatibility with e.g. nixops.

edolstra commented 9 years ago

Why should we change the format at all? It's just a bunch of name/value pairs, so it's possible to add new fields without breaking anything.

rbvermaa commented 9 years ago

Great, even easier to keep backwards compatible then.

copumpkin commented 9 years ago

I'd just prefer not to deal with escaping, newlines, and stuff like that when we have a perfectly good format to use called Nix :) but anyway, first I'll try to get it working, and then we can figure out what format works best.

Also, I realize the host key is temporary, but it still feels slightly wrong, so someday I'd like to see if I can come up with a way to avoid putting it there. There's also the fact that autoscaling machines typically won't even need NixOps to SSH into them promptly, so the two uses of user data are unlikely to overlap.

copumpkin commented 9 years ago

As a reminder, this is the logic I want to put somewhere:

  1. Fetch a file (amazon-init.nix) from userdata
  2. nixos-rebuild switch

I have a bit of a conundrum:

It seems like what I need is some notion of a "fire-and-forget" systemd service that can depend on ip-up and also run nixos-rebuild without killing itself. I haven't used systemd much but I'm wondering if I can somehow detach or nohup the actual nixos-rebuild call from within it.
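The "fire-and-forget" idea can be sketched with systemd-run, which starts a command as a separate transient unit instead of as a child process of the calling service. This is an untested sketch under assumptions, not what was done in this thread: the unit names, the metadata fetch with curl, and the PATH and NIX_PATH values are hypothetical. It also relies on switch-to-configuration only stopping or restarting units that have a file under /etc/systemd/system, which transient units created at runtime by systemd-run do not.

```nix
{ config, pkgs, ... }:
{
  systemd.services.amazon-init = {
    wantedBy = [ "multi-user.target" ];
    wants = [ "network-online.target" ];
    after = [ "network-online.target" ];
    serviceConfig.Type = "oneshot";
    path = [ pkgs.curl ];
    # (First-boot-only guard omitted for brevity.)
    script = ''
      curl --fail --silent --show-error \
        -o /etc/nixos/amazon-init.nix \
        http://169.254.169.254/latest/user-data

      # Run the rebuild in a separate transient unit and return at once, so
      # stopping or restarting this service cannot kill the rebuild midway.
      ${config.systemd.package}/bin/systemd-run \
        --unit=amazon-init-rebuild \
        --setenv=PATH=${config.nix.package}/bin:/run/current-system/sw/bin \
        --setenv=NIX_PATH=/nix/var/nix/profiles/per-user/root/channels/nixos:nixos-config=/etc/nixos/configuration.nix \
        ${config.system.build.nixos-rebuild}/bin/nixos-rebuild switch
    '';
  };
}
```

The trade-off is that the calling service reports success as soon as the rebuild is queued; the rebuild's own result has to be checked in the journal of the transient unit (journalctl -u amazon-init-rebuild).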

As always, I'm open to ideas or suggestions to save myself from going down the wrong path!

edolstra commented 9 years ago

You can prevent a unit from being restarted by setting restartIfChanged = false.
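In a NixOS module this is a per-service option. For the fetch-ec2-data unit discussed here, a minimal sketch would be the following; it ends up in the generated unit file as X-RestartIfChanged=false, which is what switch-to-configuration reads.

```nix
{
  # Tell switch-to-configuration not to restart this unit when its
  # definition changes during nixos-rebuild switch.
  systemd.services.fetch-ec2-data.restartIfChanged = false;
}
```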

copumpkin commented 9 years ago

Oh! I tried stopIfChanged = false and that didn't work, but didn't notice restartIfChanged! I'll give that a go, thanks :smile:

copumpkin commented 9 years ago

Nope, restartIfChanged = false still doesn't help:

$ systemctl cat fetch-ec2-data.service | grep RestartIfChanged
X-RestartIfChanged=false
$ journalctl -u fetch-ec2-data.service | tail -n3
systemd[1]: Stopping Fetch EC2 Data...
fetch-ec2-data-start[1617]: /nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/nixos/modules/installer/tools/nixos-rebuild.sh: line 1: 11484 Terminated              $pathToConfig/bin/switch-to-configuration "$action"
systemd[1]: Stopped Fetch EC2 Data.

:frowning:

copumpkin commented 9 years ago

I think the issue there is that the skip logic for X-RestartIfChanged (in switch-to-configuration.pl) is guarded by a check that the unit's previous file, my $prevUnitFile = "/etc/systemd/system/$baseUnit";, exists, and that file doesn't exist yet at the point where I'm running.

copumpkin commented 9 years ago

Great success! It's currently hacked up and I don't have time to clean it up, but I have a basic proof-of-concept working and will polish it up and put it up somewhere for someone to dissect :smile: :smile: :grinning:

nyarly commented 9 years ago

@copumpkin Did you put this up somewhere?

copumpkin commented 9 years ago

@nyarly the EC2 images support it now, but we need to wait for a release (any day now!) until they go live.

@rbvermaa is there a clean way for Hydra to publish AMIs for its builds? It's a "slightly impure" operation, but would be pretty cool.

nyarly commented 9 years ago

@copumpkin - I looked at the AMIs for 15.09 and they seem to have the amazon-image.nix file in their pages, but not their configuration.nix - is this coming soon? Is there an easy way to build my own userdata-capable image?

copumpkin commented 9 years ago

@nyarly I don't think amazon-image.nix needs to be in the configuration.nix. Have you tried booting the 15.09 images with userdata? I haven't tested the feature in a while but it worked last time I tried.

nyarly commented 8 years ago

@copumpkin Sorry for the late follow-up: I just tried using an existing configuration.nix as the user data for a fresh 15.09 instance, and no dice. The new instance comes up with the stock configuration.nix, and the intended one ends up in /root/user-data. Am I doing something wrong?

copumpkin commented 8 years ago

@nyarly it might have broken since I merged it. I had a VM test for it but it broke and got turned off, and I haven't had time to fix it. Will check again at some point. Sorry!