Netflix / aminator

A tool for creating EBS AMIs. This tool currently works for CentOS/RedHat Linux images and is intended to run on an EC2 instance.
Apache License 2.0

support partitioned AMI creation #129

Open kvick opened 11 years ago

kvick commented 11 years ago

Aminator currently assumes that we use a partition-less disk. Some users have requested something like this in the past (#59). This may be a duplicate, so feel free to close this one and reopen #59.

The end goal is that we can aminate a volume that contains a partition table. We may require some additional configs to know where/how to mount the volume.

jhohertz commented 10 years ago

Pull #89 speaks to the same thing.

I've run into this item I think, and may try to take something on around this.

jhohertz commented 10 years ago

Reading the other issues around working with partitioned AMIs, I think my issue may be a bit different: Aminator seems to identify that there is a partition OK, and the mistake is in trying to tell EC2 to attach a volume as a partition rather than as a block device.

I could probably tweak it so that the call detects whether there is a partition number and strips it from the attach (and, I guess, detach) operation, but I expect further changes will be needed, such as the suggestion around a --partition parameter to tell Aminator which partition to care about for the remainder of the operations.

If anyone notices my messages and knows of any other activity around such support that I should look at before diving in, it would be appreciated.

Thanks.

coryb commented 10 years ago

The base AMI that you aminate on top of would have partitions, so there is no way to detect the partitions without attaching it first, but before you attach it you have to give it a proper device name (i.e. sdg vs sdg1). So there is a bit of a race condition. One option is to have a special tag on the base AMI to indicate that it has partitions and also indicate which partition is the root. I have a feeling that the changes might be a bit messy; I think there are quite a few assumptions (in the form of device naming conventions) that assume we are dealing with partition-less disks.
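
A rough sketch of that tag idea in boto (the tag keys, AMI ID, and region here are invented placeholders, just to illustrate):

    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')

    # Tag the foundation/base AMI at build time so aminator can tell that it
    # is partitioned and which partition holds the root filesystem.
    # 'aminator:partitioned' and 'aminator:root_partition' are invented keys.
    conn.create_tags(['ami-xxxxxxxx'], {'aminator:partitioned': 'true',
                                        'aminator:root_partition': '1'})

    # Before allocating a device name, aminator could read the tag back and
    # decide whether it will be mounting /dev/xvdf or /dev/xvdf1.
    image = conn.get_image('ami-xxxxxxxx')
    root_partition = image.tags.get('aminator:root_partition')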

-Cory

jhohertz commented 10 years ago

Thanks for the feedback. I'm trying to trace the references to device nodes through the code, and I must be missing something fundamental, as I can't currently see how it would work on whole devices vs. partitions, mostly because of this bit of code:

    # build the list of candidate attach points: one entry per partition
    # suffix (1-15) for each device letter in majors
    self._allowed_devices = [device_format.format(self._device_prefix, major, minor)
                             for major in majors
                             for minor in xrange(1, 16)]

which is generating a list consisting only of partition devices (i.e. every candidate has a suffix from 1 to 15) to consider attaching a volume to.

But again, I must be missing something, as this must work for other folks.
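
For what it's worth, the naive change I'd expect here would be to generate bare whole-disk names instead; something like this untested sketch (the values of device_format, the prefix, and majors are my assumptions, not copied from the code):

    device_format = '/dev/{0}{1}{2}'   # assumption about what device_format holds
    device_prefix = 'xvd'
    majors = 'fghijklmnop'

    # whole-disk candidates: /dev/xvdf, /dev/xvdg, ... (no partition suffix)
    allowed_devices = [device_format.format(device_prefix, major, '')
                       for major in majors]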

bmoyles commented 10 years ago

A partitioned AMI will likely be registered with a different pvgrub AKI than non-partitioned AMIs. I believe the partitioned AMIs are registered with the hd00 flavor AKI as they most often have /boot (and subsequently menu.lst) in hd0,0. The partitionless images are usually registered with the hd0 AKI. This doesn't account for custom partition schemes, of course--someone could choose to have partitions but no distinct /boot (and thus be forced to use hd0) but that's a more distant edge case than just vanilla partitioning...

jhohertz commented 10 years ago

I was half wondering if some property of the AMI was driving the selection of the device node it tries to attach the created EBS volume to, but the code above, with the prefix being "xvd", the majors starting at "f", and numbers 1-15, seems to be the only source of candidates it will try to ask EC2 to attach the volume to. So it tries to do the equivalent in boto of:

    ec2-attach-volume vol-123456 -i i-myintid -d /dev/sdf1

And AWS, not expecting a device node with a partition, fails the call. (I get the same API error if I do it that way, and dropping the partition off the device node succeeds.)
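
In boto terms, the call being made is roughly the following (volume and instance IDs are placeholders):

    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')

    # Fails with InvalidParameterValue, as in the debug output below:
    # conn.attach_volume('vol-xxxxxxxx', 'i-xxxxxxxx', '/dev/sdf1')

    # Dropping the partition suffix succeeds:
    conn.attach_volume('vol-xxxxxxxx', 'i-xxxxxxxx', '/dev/sdf')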

From an output with --debug on, this is where I am falling down:

    2014-03-12 18:15:18 [INFO] looking up base AMI with ID ami-0d9c9f64
    2014-03-12 18:15:18 [INFO] Successfully resolved ubuntu/images/hvm/ubuntu-precise-12.04-amd64-server-20140227(ami-0d9c9f64)
    2014-03-12 18:15:18 [INFO] Searching for an available block device
    2014-03-12 18:15:19 [INFO] Block device /dev/xvdf1 allocated
    2014-03-12 18:15:20 [ERROR] 400 Bad Request
    2014-03-12 18:15:20 [ERROR] <?xml version="1.0" encoding="UTF-8"?>
    InvalidParameterValue: Value (/dev/sdf1) for parameter device is invalid. /dev/sdf1 is not a valid EBS device name. (request ID: c30c7150-168f-47f0-a81e-b5c51e7b3b7e)

So again, I'm not even sure I am getting as far as the part of the code that needs to be made aware of partitions; it feels like I'm still just in the setup phase of allocating resources for the job.

coryb commented 10 years ago

I think you should have -d sdf1 not -d /dev/sdf1

Aminator starts at sdf because some aws instance types have a root disk (a) + 4 ephemeral disks (b-e).

-Cory

jhohertz commented 10 years ago

Hmmm. Just did a test: I can drop /dev/ from the name, but it still refuses to deal with a partition node, i.e.:

    ec2-attach-volume vol-f07499bc -i i-a3a36680 -d sdf1
    Client.InvalidParameterValue: Value (sdf1) for parameter device is invalid. sdf1 is not a valid EBS device name.

    ec2-attach-volume vol-f07499bc -i i-a3a36680 -d sdf
    ATTACHMENT vol-f07499bc i-a3a36680 sdf attaching 2014-03-12T19:24:34+0000

    ec2-detach-volume vol-f07499bc -i i-a3a36680 -d sdf
    ATTACHMENT vol-f07499bc i-a3a36680 sdf detaching 2014-03-12T19:24:34+0000

But Aminator, and the discussion thus far, imply it should be possible to attach an EBS volume as a single partition. I must be missing something, as that seems wrong on several levels to me. (Sure, don't partition it and just format the whole disk, fine, but a partition without a larger volume containing it? Seems wrong to me.)

Can you confirm that I should be able to attach directly to a partition node, and that this is behaviour Aminator relies upon? (I'm experienced with server admin, but a little newer to EC2, so maybe my intuition is steering me wrong here...)

Just ran this by my peer here, who affirmed I'm not crazy: block devices need to be attached as such, and calling attach-volume with an attachment point ending in a number rather than a letter just isn't valid... which leads me back to being curious why the allowed devices list contains partition numbers (see the third comment).

jhohertz commented 10 years ago

Sorry for all the updates/edits; I'm trying to provide as much info as I can.

I've been using the master branch; I notice there has been a lot of activity since then on the testing branch, so perhaps I will give that a whirl...

bmoyles commented 10 years ago

Aha! Your debug output has the key.

jhohertz commented 10 years ago

Thanks, I was starting to get to the realization on your second point after some reading, but you make it rather more clear. (Realized I was missing the "foundation" AMI)

To the first point: I've been directed to create HVM-based images, as we're aligning to instance types that require them. Can I create those from a paravirtual image? I assumed I'd need to base them on the same type.

Thanks a lot for your help and feedback.

bmoyles commented 10 years ago

The only way for regular users to create HVM images (today) is to launch an instance and use create-image, unfortunately :\ Amazon clearly has the ability to grant folks permission to create HVM images, as Canonical can clearly create their own, but they have not released that functionality widely. Until they do, you won't be able to use aminator to create HVM images, unfortunately.

coryb commented 10 years ago

Well, sort of. We recently added a --vm-type hvm option to aminator (testing branch) to register an AMI as HVM. We aminate on top of a PV base image that is capable of booting HVM or PV. I am not sure whether the Canonical images are capable of booting HVM or PV out of the box; I believe we did some grub magic to get it working right (I don't have the details at the moment).

-Cory

bmoyles commented 10 years ago

*cough* you guys have special permission from Amazon too *cough* :)

coryb commented 10 years ago

Could be, but I think it is open to general availability now. It is fully supported via the boto Python clients, although of course I have only tested it in the accounts we own.

-Cory

bmoyles commented 10 years ago

Hmm, weird; the API does support it, but none of the official documentation describes the process of actually creating one. Their line was that they weren't ready to support it widely due to the complexity of creating the image (you need to monkey with grub and other foo because the HVM bootloader is more akin to a real x86 bootloader than pvgrub).

bmoyles commented 10 years ago

Huh, I take it all back. Just registered one of our snaps with hvm as the vm type and sure enough it works. These aren't the droids you're looking for... move along :)
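
For anyone following along, registering a snapshot-backed image as HVM via boto looks roughly like this (names and IDs are placeholders; this assumes a boto version recent enough to expose virtualization_type, and is just a sketch, not aminator's actual code):

    import boto.ec2
    from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType

    conn = boto.ec2.connect_to_region('us-east-1')

    # Root volume backed by an existing snapshot (placeholder ID).
    bdm = BlockDeviceMapping()
    bdm['/dev/sda1'] = BlockDeviceType(snapshot_id='snap-xxxxxxxx',
                                       delete_on_termination=True)

    ami_id = conn.register_image(name='my-hvm-image',
                                 architecture='x86_64',
                                 root_device_name='/dev/sda1',
                                 block_device_map=bdm,
                                 virtualization_type='hvm')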

jhohertz commented 10 years ago

This sounds good. :) I am just about to sit down and do my foundation image now, planning to do one by hand by looking through the scripts in netflixoss-ansible, and the wiki-page-with-warnings. Then I can find out if I have an actual problem with partitions (which seem to be more common on HVM types perhaps?)

Someone else today described HVM in their build as just recently "kind of" working (nothing to do with Aminator). No idea whether that was generated from within EC2 or not.

jhohertz commented 10 years ago

Just a note that I'm a lot further along now, partly from ensuring I'm using a foundation, and partly from switching to running Aminator on a non-HVM instance. I'm still debugging stuff to do with the Ansible playbooks, so not quite testing yet, but well past the issue that brought me here.

jhohertz commented 10 years ago

So with one little patch to aminator-plugins/ansible-provisioner, I have an Ansible-provisioned Aminator instance aminating an Aminator AMI, for PV/EBS. Delving in deeper, I see my mistake around expecting "normal" semantics for device attachment and partitions.

I tried --vm-type hvm from my PV foundation, and Aminator is happy with it, but the result fails to boot, probably due to something lacking in the foundation. I found a reference in the EC2 docs about downgrading GRUB; as I understand HVM, it would be looking for a first stage in the boot sector and then jumping into the later stage from /boot, and my whole-disk foundation can't handle that. Perhaps I just need to grub-install... I forgot until I was writing this that there is room for a boot sector on the whole-disk filesystems. I may need to sort something out with the kernel too... I am still catching up: did the Xen and normal kernels ever unify?

Back to the topic of the ticket... since I've taken it so far off course, I had a thought.

Some of the challenge seems to be around how to handle an arbitrary partitioning arrangement (/boot on partition 1 vs. a single /). I wonder if you really need to. Since everyone needs to craft a foundation, they'll know how it's set up. Maybe allow passing an fstab (or a subset of one) into Aminator as a parameter. That said, --partition would allow fetching the actual table for setting up a chroot layout (and its presence or absence would switch between caring about partitions vs. the whole disk).
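
To make the --partition idea a bit more concrete, the plumbing I'm imagining is roughly this (a pure sketch; the function and parameter names are made up):

    def root_device_for(attached_device, partition=None):
        """Return the device node that should be mounted for the chroot.

        attached_device -- the whole-disk node the volume was attached as,
                           e.g. '/dev/xvdf'
        partition       -- optional partition number from a --partition flag;
                           None keeps the current whole-disk behaviour
        """
        if partition is None:
            return attached_device
        return '{0}{1}'.format(attached_device, partition)

    root_device_for('/dev/xvdf')      # -> '/dev/xvdf'  (partition-less disk)
    root_device_for('/dev/xvdf', 1)   # -> '/dev/xvdf1' (partitioned foundation)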

jhohertz commented 10 years ago

Okay, so I've now generated partitioned AMIs for PV/HVM referencing the same snapshot, getting the parity between environments I was hoping for, and I'm back to looking at Aminator... I just want to run my thinking/idea by you, as I'd hope whatever I do would be worthwhile to pull.

I am basing my thinking around not really wanting multiple partitions; the partition table is more just a mechanism to support HVM+PV from a single source. For my purposes a single partition is just fine, it somewhat matches Aminator's existing expectations, and I think it may be fairly simple to add support for this model.

So what I am looking at is implementing a new "blockdevice" and "volume" plugin, and adding a new environment definition that switches to using these. The working name I am using is "linux1part". I suspect I may find I need to derive a new finalizer as well, but I haven't dug that far yet.

Does this align with any plans/thinking you have internally on this topic? There are certainly more elaborate ways this could be done, but I'm not sure how much value there really is in elaborate partitioning arrangements for the root volume... it would increase complexity a bit around managing multiple mounts when setting up the chroot... maybe a "linuxNpart" setup could be looked at at a later time?

jhohertz commented 10 years ago

The more I look at it, I'm actually not sure I'd need to change anything about the volume plugin.

I'm thinking of renaming the current linux blockdevice "slice" (it's not a partition per se, so I'm borrowing a BSD-ism) and adding "disk" for the model of a numberless "whole" block device.

Beyond that, I just see one bit where we need to work around oddness of the EC2 API around block device mappings. To get PV mounting the whole disk, you do a sensible:

  -b /dev/sda=<snap>:<params>

On HVM this apparently fails, and we do a very counter-intuitive:

  -b /dev/sda1=<snap>:<params>

which, if you think about it, is the exact opposite of how this effort is trying to flip things about...

But I think those are the two main things that would need doing to get a new blockdevice that groks the single-partitioned volumes.
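
In boto terms the asymmetry is something like this (the snapshot ID is a placeholder, and this is only a sketch of the mappings, not of Aminator's cloud.ec2 code):

    from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType

    root = BlockDeviceType(snapshot_id='snap-xxxxxxxx', delete_on_termination=True)

    # PV registration: map the snapshot to the whole disk.
    pv_bdm = BlockDeviceMapping()
    pv_bdm['/dev/sda'] = root

    # HVM registration: the very same snapshot, but mapped to /dev/sda1.
    hvm_bdm = BlockDeviceMapping()
    hvm_bdm['/dev/sda1'] = root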

If these come from the foundation AMI's metadata, the latter point may not even need treatment by Aminator. That would mean looking at two AMIs' metadata... which would get awkward. I have both types pointing at the same snapshot, with the assumption of always a single partition and a desire to source both types off one process... so now I'm looking at cloud.ec2's handling of mappings. It feels a bit hard-coded to do what I suggest above around the odd mapping needs, but not doing that would involve, I think, externalizing mappings into a plugin type, with single-partition being the first and more complex arrangements as other plugins of this new type. Which may be the way to go to step towards the larger theme of this ticket?

jhohertz commented 10 years ago

Just a note of where I'm at. I split the volume/blockdevice plugins into slice and disk variants. Was simple enough.

Where I'm stuck is what to do with the twists needed on root device mapping. In "slice" mode, this matches root-device-name (/dev/sda1), where we need /dev/sda for the PV disk type and /dev/sda1 for the HVM disk type in the block device mapping (not sure why the latter).

I think the volume type needs to register something for both a root-device-name and a "mapping-root-name" for the device map. It seems cleaner than having the tag plugins inspect which volume plugin was run.
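
Concretely, something along these lines is what I mean (the key names are invented, and the real context object in Aminator is richer than a plain dict; this is just to show the shape of it):

    # What the "slice" volume plugin could publish (key names invented):
    volume_context = {
        'root_device_name': '/dev/sda1',   # what the guest OS / fstab sees
        'mapping_root_name': '/dev/sda',   # whole-disk name for the PV mapping
    }

    # The registration/tag code then picks whichever name it needs rather
    # than inspecting which volume plugin ran, e.g. for a PV registration:
    bdm_key = volume_context['mapping_root_name']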

I must confess higher-order Python isn't one of my first languages, so I'm not sure of the best way to approach it. I see there are some context models and some cases of things stuffing keys into them, but I'm unclear enough on the specifics to be a little wary of making naive changes (what scopes there are, etc.).

jhohertz commented 10 years ago

Just a note that my own effort to add such support has stalled. I worked around the issue for a while by avoiding Aminator, and now I've managed to build Aminator-compatible foundations that work with HVM machine types, which makes this a whole lot less pressing for me personally, as HVM support was the big driver initially for looking for partitioned-AMI support in Aminator.

grahamlyons commented 8 years ago

This was addressed in #207.