aws-cloudformation / cloudformation-coverage-roadmap

The AWS CloudFormation Public Coverage Roadmap
https://aws.amazon.com/cloudformation/
Creative Commons Attribution Share Alike 4.0 International

AWS::EC2::Instance -BlockDeviceMapping-Ebs should allow volume changes without replacing the instance #112

Open sturdiva opened 5 years ago

sturdiva commented 5 years ago

1. AWS::EC2::Instance -BlockDeviceMapping-Ebs should allow volume changes without replacing the instance

2. Scope of request

It should be possible to change EBS volume attributes (such as VolumeSize or VolumeType) for volumes specified in the BlockDeviceMappings property of an AWS::EC2::Instance resource without re-creating the instance. For example, being able to re-size (via CloudFormation) the root volume of an instance.

This is currently possible via the API/Console, but not via CloudFormation.

3. Expected behavior

On modification of (at least the VolumeSize and VolumeType properties) the AWS::EC2::Instance resource should not be re-created, just the underlying volume properties modified.

4. Suggest specific test cases

Common use case: pass VolumeSize and VolumeType parameters as a string, or as a !Ref
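The test case above can be sketched as a template built in Python and serialized to JSON. This is only an illustration: the parameter names `RootVolumeSize` and `RootVolumeType`, the logical ID `AppInstance`, the instance type, and the root device name `/dev/xvda` are assumptions, and the real device name depends on the AMI.

```python
import json

# Hypothetical names: ImageId, RootVolumeSize, RootVolumeType, AppInstance.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "ImageId": {"Type": "AWS::EC2::Image::Id"},
        "RootVolumeSize": {"Type": "Number", "Default": 20},
        "RootVolumeType": {"Type": "String", "Default": "gp3"},
    },
    "Resources": {
        "AppInstance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": {"Ref": "ImageId"},
                "InstanceType": "t3.micro",
                "BlockDeviceMappings": [
                    {
                        # Assumed root device name; it varies by AMI.
                        "DeviceName": "/dev/xvda",
                        "Ebs": {
                            # JSON form of "!Ref RootVolumeSize" / "!Ref RootVolumeType"
                            "VolumeSize": {"Ref": "RootVolumeSize"},
                            "VolumeType": {"Ref": "RootVolumeType"},
                        },
                    }
                ],
            },
        }
    },
}


def root_ebs(tpl, logical_id="AppInstance"):
    """Return the Ebs block of the first block device mapping."""
    return tpl["Resources"][logical_id]["Properties"]["BlockDeviceMappings"][0]["Ebs"]


# CloudFormation accepts JSON templates, so this string could be used as a template body.
template_body = json.dumps(template, indent=2)
```

The request is that changing only the `RootVolumeSize` or `RootVolumeType` parameter value on a stack like this should modify the volume in place rather than replace `AppInstance`.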

5. Helpful Links to speed up research and evaluation

https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_ModifyVolume.html

6. Category (required) - Will help with tagging and be easier to find by other users to +1

  1. Compute (EC2, ECS, EKS, Lambda...)

webframp commented 4 years ago

Hit this exact problem right now. Had to do an emergency volume growth on a running system, but now I can't retroactively update the cloudformation stack to match the new volume config since it wants to trigger an instance replacement.

luiseduardocolon commented 4 years ago

Not that it resolves the issue above, and it is a bit involved, but you could use the recently released import feature to a) change the retention policy (DeletionPolicy) of the instance to Retain first, b) delete the instance from the stack (while retaining the actual instance), and c) reimport it into the stack. It should have the effect of syncing up with your change (or, said another way, remediating the configuration drift). More info on the import feature here: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resource-import.html - and you could use https://former2.com/ to help generate the required template snippet. Worth experimenting, but do it outside of production first :)
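A rough sketch of the three steps as template transformations, for anyone who wants to script them. The helper names and the sample template are hypothetical, and the sketch only builds the inputs; it does not call AWS. As far as I know, the import entry has the shape expected by the `ResourcesToImport` argument of an IMPORT change set, with `InstanceId` as the identifier for `AWS::EC2::Instance`.

```python
import copy


def with_retain(template, logical_id):
    """Step a: set DeletionPolicy to Retain so removal from the stack keeps the instance."""
    tpl = copy.deepcopy(template)
    tpl["Resources"][logical_id]["DeletionPolicy"] = "Retain"
    return tpl


def without_resource(template, logical_id):
    """Step b: drop the resource from the template.

    The stack still needs at least one other resource, and anything that
    references the removed logical ID has to be cleaned up separately.
    """
    tpl = copy.deepcopy(template)
    del tpl["Resources"][logical_id]
    return tpl


def import_entry(logical_id, instance_id):
    """Step c: describe the existing instance for an IMPORT change set."""
    return {
        "ResourceType": "AWS::EC2::Instance",
        "LogicalResourceId": logical_id,
        "ResourceIdentifier": {"InstanceId": instance_id},
    }


# Hypothetical stack with an instance and one unrelated resource.
original = {
    "Resources": {
        "AppInstance": {"Type": "AWS::EC2::Instance", "Properties": {}},
        "LogBucket": {"Type": "AWS::S3::Bucket"},
    }
}

step_a = with_retain(original, "AppInstance")       # deploy this first
step_b = without_resource(step_a, "AppInstance")    # then deploy this
step_c = import_entry("AppInstance", "i-0123456789abcdef0")  # placeholder instance ID
```

For step c, the template used for the import is the step a template with the volume settings edited to match what the instance actually has now; the imported resource needs to keep its `DeletionPolicy` attribute.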

webframp commented 4 years ago

Great suggestion @luiseduardocolon I’ll try it out in a separate account

webframp commented 4 years ago

Based on initial testing I can confirm that the workaround from @luiseduardocolon using a resource import works.

rgoltz commented 4 years ago

@PatMyron @luiseduardocolon - Do you have a chance to look at this? - Our team has been constantly challenged and overworked by this limitation of BlockDeviceMapping: VolumeSize via CFN-change triggers an EC2 instance replacement.

Could you possibly already classify this on the CloudFormation Roadmap? If you would like more details, please let me know. We would love to share our experience. Thanks!

GrahamLea commented 3 years ago

I just stumbled upon this while looking for something else, and I can't believe this is how it works. There also seems to be no indication of this behaviour in the docs, where Ebs.VolumeSize (for example) says "Update requires: No interruption".

Is there a workaround for this using just CloudFormation? e.g. Will creating a Volume and attaching it to the Instance create a similar setup but without the recreate-on-resize behaviour?

sturdiva commented 3 years ago

The docs are unclear here. If you look at BlockDeviceMappings, it shows "Some interruptions", but the paragraph above it says:

After the instance is running, you can modify only the DeleteOnTermination settings of the attached EBS volumes.

wimsymons commented 3 years ago

Changing Iops should be included as well.

Nico-DB commented 3 years ago

We need this feature, too. +1

RaoM24 commented 3 years ago

Having the same issue with the root volume. CFT is trying to replace the EC2 instance. Is there any timeline for a fix, please?

wimsymons commented 3 years ago

Changing Throughput (for gp3) should not trigger a replacement either. But that only applies once Throughput is supported on AWS::EC2::Instance Ebs (https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-blockdev-template.html) in the first place. See #824

RaoM24 commented 3 years ago

This has been waiting for a fix since 2019. May I please know if there is any update or timeline? Thanks.

Benjamin-L commented 3 years ago

It would be very helpful if the docs were updated to reflect the current behavior until this is implemented. Even if we include the "Some Interruptions" annotation on BlockDeviceMappings, it still doesn't cover what actually happens, which is instance replacement.

I read "Some Interruptions" on BlockDeviceMappings as "adding or removing mappings may temporarily interrupt the instance", which is not the case.

madmox commented 3 years ago

Absolutely unacceptable that this is not at least documented in the CloudFormation spec. This could lead to live production instances being deleted if users don't bother to run the CFN update in a pre-production environment beforehand. And it forces other users to use the retain/import workaround to be able to use CFN after the update.

Dirk-Sandberg commented 2 years ago

We really need this fixed, we are trying to push our users to use infrastructure-as-code to make managing our servers easier. When they tell us that updating the root volume causes complete redeploys of machines, it makes all the sense in the world that they just want to use the console and ignore the infrastructure as code.

If a user submits a ticket to our IT system to increase their root volume size, and someone else trying to be helpful grabs the ticket and updates the cloudformation stack without knowing this "gotcha" -- boom, angry user

wgroenewald commented 2 years ago

> We really need this fixed, we are trying to push our users to use infrastructure-as-code to make managing our servers easier. When they tell us that updating the root volume causes complete redeploys of machines, it makes all the sense in the world that they just want to use the console and ignore the infrastructure as code.

> If a user submits a ticket to our IT system to increase their root volume size, and someone else trying to be helpful grabs the ticket and updates the cloudformation stack without knowing this "gotcha" -- boom, angry user

Our workaround has been to deploy everything with cloudformation, but root volumes aren't resized with cfn - we have about 4 cli commands that run after the cfn script to decide whether the root volume should be increased or not based on some variables. That works for us because we have a deployment pipeline, but for someone who runs pure cfn in the console that's not a viable workaround.
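The "decide whether the root volume should be increased" step might look roughly like the following. This is a guess at the logic, not the commenter's actual scripts; `plan_root_resize` is a hypothetical helper, and the AWS call is only mentioned in a comment.

```python
def plan_root_resize(current_gib, desired_gib):
    """Return the size to request, or None when the volume should be left alone.

    EBS volumes can be grown in place but not shrunk, so a smaller or equal
    desired size results in no action.
    """
    if desired_gib <= current_gib:
        return None
    return desired_gib


# In a pipeline, a non-None result would be passed to the EC2 ModifyVolume API
# (for example boto3's ec2.modify_volume(VolumeId=..., Size=new_size)), and the
# filesystem would then be extended inside the instance.
new_size = plan_root_resize(current_gib=20, desired_gib=50)
```

Keeping the decision in a pure function makes it easy to test separately from the AWS calls that act on it.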

Herrick19 commented 2 years ago

I reported this issue to support more than 2 years ago. I can't believe this still hasn't been fixed.

Sometimes it makes me wonder.

Amazon, for God's sake, could you please slow down new feature and product development and instead use that time to fix the stuff we really need that doesn't work? Over the years, I have probably reported more than 25 problems where something isn't supported or has issues with CloudFormation. It makes us waste time. I absolutely love AWS, but I hate it every time I need to do a workaround, do stuff in the console instead of CloudFormation, or get the answer "It doesn't work in CloudFormation, but you can write a Lambda function in CloudFormation that will do what you want"... I just want CloudFormation to work. Is that too much to ask when we pay thousands of dollars per month for the service?

Please fix these things

Hokwang commented 2 years ago

Any updates?

RaoM24 commented 2 years ago

Hi there,

Any update on this, please? It's been nearly 3 years since this case was opened.

shotty1 commented 2 years ago

This one really hurts a lot when we try to show people the benefits of the Cloud, IaC and good practices.

surendarsuren27 commented 2 years ago

+1 we also need this feature

TGinger01 commented 2 years ago

+1 Waiting for this too.

ghost commented 2 years ago

+1 CF and console consistency would be great!

wcoleman commented 2 years ago

Any updates or workarounds on this issue? We would really like to be able to resize an EBS volume through cloudformation without a replacement.

Benjamin-L commented 2 years ago

@wcoleman

> Any updates or workarounds on this issue? We would really like to be able to resize an EBS volume through cloudformation without a replacement.

You can create a separate AWS::EC2::Volume resource and then a AWS::EC2::VolumeAttachment resource instead of embedding the volume definitions in the instance definition. The Volume type supports resizing without interruption. I've only used this for non-root volumes, so not sure if there are any gotchas when trying to use this approach for root volumes.

sturdiva commented 2 years ago

> @wcoleman

> Any updates or workarounds on this issue? We would really like to be able to resize an EBS volume through cloudformation without a replacement.

> You can create a separate AWS::EC2::Volume resource and then a AWS::EC2::VolumeAttachment resource instead of embedding the volume definitions in the instance definition. The Volume type supports resizing without interruption. I've only used this for non-root volumes, so not sure if there are any gotchas when trying to use this approach for root volumes.

This will work for non-root volumes (and is exactly the process we make use of), but does not work for root volumes, which must be specified (and managed) via the BlockDeviceMappings section of the AWS::EC2::Instance
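For reference, the non-root approach described above could be sketched like this, written as a Python dict that serializes to a JSON template fragment. The logical IDs `AppInstance`, `DataVolume` and `DataVolumeAttachment`, the device name `/dev/sdf`, and the sizes are assumptions; `AppInstance` is assumed to be an `AWS::EC2::Instance` defined elsewhere in the same template.

```python
import json

# Fragment of the "Resources" section; AppInstance is assumed to exist elsewhere.
resources = {
    "DataVolume": {
        "Type": "AWS::EC2::Volume",
        "Properties": {
            # The volume must live in the same Availability Zone as the instance.
            "AvailabilityZone": {"Fn::GetAtt": ["AppInstance", "AvailabilityZone"]},
            "Size": 100,  # GiB; the property one would change to resize
            "VolumeType": "gp3",
        },
    },
    "DataVolumeAttachment": {
        "Type": "AWS::EC2::VolumeAttachment",
        "Properties": {
            "InstanceId": {"Ref": "AppInstance"},
            "VolumeId": {"Ref": "DataVolume"},
            "Device": "/dev/sdf",  # assumed device name for a secondary volume
        },
    },
}

fragment = json.dumps({"Resources": resources}, indent=2)
```

Because the volume has its own logical ID here, changing `Size` is an update to `DataVolume` alone and the instance definition is not touched.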

Benjamin-L commented 2 years ago

Was just looking at the docs today, and apparently they have added a warning about the replacement, which is definitely an improvement from the first time I ran into this 🤷‍♀️.

[Screenshot, 2022-05-27]

PUPeter commented 2 years ago

Terraform manages to resize root volumes without terminating the instance or having to create a new volume. I'm trying to migrate to CloudFormation, but issues like this are surprising considering this is the native IaC solution for AWS.

vhaispdeaded commented 2 years ago

I have discovered that even if an instance is stopped, updating the VolumeSize will cause the entire EC2 instance to be updated. This needs to be addressed.

Hokwang commented 1 year ago

yeah, my team also moved to terraform. good bye cfn.

Herrick19 commented 1 year ago

AWS is sometimes pathetic... 2 years and counting... for such an important issue. It's ironic that Terraform handles this well when AWS cannot even get their own things working.

njgraham commented 1 year ago

I'm hitting this now as well using CDK. I too had to do an emergency resize of the volume - CDK can't do it so it was done manually. Now CDK is out of sync. I'm trying to test out the import procedure noted above after creating a whole new test stack.

We recently started to lean in to CDK instead of Terraform - not a good experience in this particular case.

Benjamin-L commented 1 year ago

The fun part about this is that CDK's story for how to resolve drift is... very bad.

strickdd commented 1 year ago

+1 just ran into this and had to debug one delta at a time in the CFT until I was left with just the root volume resize which can be easily done outside of CFTs without taking the system down :(

Louis-Tian commented 1 year ago

Never worked for AWS. So this is only my speculation.

The fundamental problem here is that logical IDs only exist for the top-level resources (i.e. the keys under "Resources"). https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resources-section-structure.html

Whenever CloudFormation successfully creates a stack, it keeps track of the physical IDs (ARNs) created and the associated logical IDs in its own database. This mapping is required for the engine to calculate the difference in subsequent updates as well as to detect any drift. When an EBS volume is defined inside BlockDeviceMappings, that EBS volume doesn't have a logical ID, simply because it's not a direct child of "Resources", hence there is no mapping available. Consequently, there is no way in future updates for the engine to tell which EBS volume is associated with what is defined in the CFN template. (BlockDeviceMappings is a list, so there is no way to tell the exact changes that have been made without some assumptions. For example, if the engine sees two EBS volumes in the template but only one actually exists on the system, which one is it?)

A sensible approach to fix this problem is to make BlockDeviceMappings a key/value dictionary instead of a list. Each volume can then be logically identified. But that's probably not a completely backward-compatible change. Also, I suspect the assumption that logical IDs are the top-level "Resources" keys is deeply ingrained, and nobody wants to take responsibility for making such a large change internally.
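To illustrate the key/value idea only (this says nothing about how CloudFormation works internally, and the keys `root` and `data` are made up): once each volume has a stable key, working out what changed is a plain dictionary diff.

```python
def diff_keyed(old, new):
    """Compare two keyed volume maps and report which keys were added, removed or changed."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical keyed form of the volume definitions before and after an update.
old = {"root": {"VolumeSize": 20}, "data": {"VolumeSize": 100}}
new = {"root": {"VolumeSize": 50}, "data": {"VolumeSize": 100}}

result = diff_keyed(old, new)
```

Here `result` reports only `root` as changed, so an engine could in principle modify that one volume instead of treating the whole mapping as different.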

This is a major problem.

Yes, there are AWS::EC2::Volume and AWS::EC2::VolumeAttachment. But

  1. They don't work for the root volume, so we are forced to use a secondary volume.
  2. There is no guarantee that the volume will be attached at boot time. This makes things a lot more complicated for users than they need to be, especially when doing it scripted, but hey, isn't that what IaC is all about?

On a "positive" note, it seems Terraform has exactly the same problem. From Terraform's documentation:

Currently, changes to the ebs_block_device configuration of existing resources cannot be automatically detected by Terraform. To manage changes and attachments of an EBS block to an instance, https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance#ebs-ephemeral-and-root-block-devices

The difference is that Terraform chooses to ignore any difference rather than replace the instance. So well, if your biggest competitor has the same problem then...

pahwa014 commented 1 year ago

+1 Would really like this feature rather than a manual change.

mcboudreau2 commented 10 months ago

+1 really need this..

webmozart commented 9 months ago

We need this too

raimondsvizulis commented 8 months ago

+1 need this too

gnought commented 5 months ago

+1. It's an important feature.

Upsizing the root EBS volume in the AWS console doesn't replace the EC2 instance at all. I guess AWS could do the same magic behind the scenes when we upsize the root volume through CloudFormation or CDK.

dil-ddecarvalhogomes commented 4 months ago

It's absurd that this flawed, poor design hasn't been addressed in almost 4 years. People blindly trusting CDK to accomplish stuff via pipelines are in for a rude awakening. The CDK drift correction process is risky and overly complicated. Everything about this is bad.

emmapatterson commented 3 months ago

+1 need this too

absld commented 2 months ago

+1 Please urgently address this issue; it is unexpectedly breaking entire application deployments!