aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

aws-cdk/aws-batch: Skip use of optimal instance type with graviton instances #31148

Open jasonforte opened 3 weeks ago

jasonforte commented 3 weeks ago

Describe the bug

I'm attempting to create an EC2 ECS compute environment using the AWS Batch constructs that makes use of only Graviton instances. When I deploy the stack I get the following error:

Error executing request, Exception : arm-based instance type cannot be used with other instance types.

Regression Issue

Last Known Working CDK Version

No response

Expected Behavior

When it encounters a known Arm-based instance class, the construct should not append optimal, because optimal is not supported alongside Arm-based instances.

Current Behavior

When deploying an EC2 ECS compute environment, the error below is thrown.

14:57:17 | CREATE_FAILED        | AWS::Batch::ComputeEnvironment | MyECSComputeEnvironment6A03089C
Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 167a0e5f-0376-4ad4-a0b6-34a85a940cb7 (Service: Batch, Status Code: 400, Request ID: 167a0e5f-0376-4ad4-a0b6-34a85a940cb7)" (RequestToken: d266edf5-f5cd-d61f-866f-5e5ecefa478c, HandlerErrorCode: InvalidRequest)

Reproduction Steps

I'm declaring my compute environment as follows:

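import { ManagedEc2EcsComputeEnvironment } from 'aws-cdk-lib/aws-batch';
import { InstanceClass } from 'aws-cdk-lib/aws-ec2';

// `vpc` is an existing VPC defined elsewhere in the stack.
const computeEnvironment = new ManagedEc2EcsComputeEnvironment(this, 'MyECSComputeEnvironment', {
    vpc,
    minvCpus: 0,
    maxvCpus: 8,
    instanceClasses: [InstanceClass.M6G], // Graviton (Arm) only
});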

Deploying this stack fails with the same error shown under Current Behavior above.

Possible Solution

When I inspect the synthesized CloudFormation Stack I can see that optimal has been appended to the InstanceTypes list:

[image: screenshot of the synthesized template]

A possible solution would be to fix the logic in renderInstances so that it takes the instance architecture into account (Arm vs. x86).

Additional Information/Context

Workaround

I've found a workaround: setting useOptimalInstanceClasses to false. This should not be necessary, though, because I've explicitly set the instance classes I want.

const computeEnvironment = new ManagedEc2EcsComputeEnvironment(this, 'MyECSComputeEnvironment', {
    vpc,
    minvCpus: 0,
    maxvCpus: 8,
    instanceClasses: [InstanceClass.M6G],
    useOptimalInstanceClasses: false
})

CDK CLI Version

2.152.0 (build faa7d79)

Framework Version

No response

Node.js Version

v20.14.0

OS

ubuntu

Language

TypeScript

Language Version

No response

Other information

No response

ashishdhingra commented 3 weeks ago

@jasonforte Good morning. Thanks for reporting the issue. I was able to reproduce the issue using the provided code, where the error is thrown by CloudFormation:

11:31:22 AM | CREATE_FAILED        | AWS::Batch::ComputeEnvironment | MyECSComputeEnvironment6A03089C
Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)

 ❌  CdktestStack failed: Error: The stack named CdktestStack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)
    at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:446:10568)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:449:199716)
    at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:449:181438

 ❌ Deployment failed: Error: The stack named CdktestStack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)
    at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:446:10568)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:449:199716)
    at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:449:181438

The stack named CdktestStack failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "Error executing request, Exception : arm-based instance type cannot be used with other instance types., RequestId: 07f6a0b1-5003-4004-88c7-3ae2c317cb24 (Service: Batch, Status Code: 400, Request ID: 07f6a0b1-5003-4004-88c7-3ae2c317cb24)" (RequestToken: 3e651d0a-82c0-ed7b-6c77-01a111440adf, HandlerErrorCode: InvalidRequest)

Here is an excerpt of template generated by CDK synthesis:

...
 MyECSComputeEnvironment6A03089C:
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      ComputeResources:
        AllocationStrategy: BEST_FIT_PROGRESSIVE
        InstanceRole:
          Fn::GetAtt:
            - MyECSComputeEnvironmentInstanceProfile9F922264
            - Arn
        InstanceTypes:
          - m6g
          - optimal
        MaxvCpus: 8
        MinvCpus: 0
        SecurityGroupIds:
          - Fn::GetAtt:
              - MyECSComputeEnvironmentSecurityGroup7C63B7FD
              - GroupId
        Subnets:
          - subnet-045c5a5af92ce5bf5
          - subnet-0552f61c30c94db58
          - subnet-0b187fc322e757ab0
        Type: EC2
      ReplaceComputeEnvironment: false
      State: ENABLED
      Type: managed
      UpdatePolicy: {}
    Metadata:
      aws:cdk:path: CdktestStack/MyECSComputeEnvironment/Resource
...

optimal is added to InstanceTypes because the useOptimalInstanceClasses property defaults to true (refer to the code here, which inspects the useOptimalInstanceClasses property). As a workaround, please explicitly set useOptimalInstanceClasses to false as shown below:

const vpc = ec2.Vpc.fromLookup(this, 'myVpc', { isDefault: true });
const computeEnvironment = new batch.ManagedEc2EcsComputeEnvironment(this, 'MyECSComputeEnvironment', {
  vpc,
  minvCpus: 0,
  maxvCpus: 8,
  instanceClasses: [ec2.InstanceClass.M6G],
  useOptimalInstanceClasses: false
});
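
The defaulting behavior described above can be modeled in a few lines. This is a simplified, self-contained sketch and not the library's code: renderInstanceTypes and RenderOptions are hypothetical names, and instance classes are plain strings rather than the ec2.InstanceClass enum.

```typescript
// Hypothetical, simplified model of how the instance list is assembled.
// Not the actual aws-cdk-lib implementation.
interface RenderOptions {
  instanceClasses?: string[];
  useOptimalInstanceClasses?: boolean; // treated as true when undefined
}

function renderInstanceTypes(opts: RenderOptions): string[] {
  const rendered = [...(opts.instanceClasses ?? [])];
  // An undefined flag behaves like `true`, so 'optimal' is appended
  // unless the caller opts out explicitly.
  if (opts.useOptimalInstanceClasses ?? true) {
    rendered.push('optimal');
  }
  return rendered;
}

// Flag left undefined: 'optimal' is appended next to the Arm class.
const withDefault = renderInstanceTypes({ instanceClasses: ['m6g'] });
// Flag set to false: only the requested class is rendered.
const optedOut = renderInstanceTypes({
  instanceClasses: ['m6g'],
  useOptimalInstanceClasses: false,
});
console.log(withDefault, optedOut);
```

With the flag left undefined the model yields the same m6g plus optimal pair seen in the synthesized template, which is the combination Batch rejects.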

I'm unsure whether it would be feasible to determine the architecture from the instance class, since InstanceClass is an enum that gets updated as the EC2 service adds new instance classes.

Thanks, Ashish

pahud commented 3 weeks ago

We probably could improve:

  1. If instanceClasses is defined, x86 and Arm instance types can't be mixed. We have a similar function in aws-eks and we technically could check the consistency.
  2. If instanceClasses contains only Arm instance types, useOptimalInstanceClasses should default to false when undefined, and setting it explicitly to true should not be allowed.
  3. We could implement these checks in renderInstances().

https://github.com/aws/aws-cdk/blob/975df1f5a17e9a2ff2b34223c84dd43e72057de2/packages/aws-cdk-lib/aws-batch/lib/managed-compute-environment.ts#L1134-L1148
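
The three checks above could look roughly like the sketch below. Everything here is hypothetical and for illustration only: isArmInstanceClass and renderInstancesChecked are made-up names, instance classes are plain strings, and the Arm detection is a naming heuristic (the a1 family, or a g directly after the generation digit), not the approach used in aws-eks or a definitive list of Graviton classes.

```typescript
// Hypothetical sketch of the proposed checks; not aws-cdk-lib code.
// Assumption: 'a1', or a 'g' right after the generation digit (m6g, c7gn,
// t4g), marks an Arm (Graviton) class. This is a naming heuristic only.
function isArmInstanceClass(instanceClass: string): boolean {
  return instanceClass === 'a1' || /^[a-z]+\d+g[a-z]*$/.test(instanceClass);
}

function renderInstancesChecked(
  instanceClasses: string[],
  useOptimalInstanceClasses?: boolean,
): string[] {
  const armCount = instanceClasses.filter(isArmInstanceClass).length;
  const allArm = instanceClasses.length > 0 && armCount === instanceClasses.length;

  // Check 1: x86 and Arm classes cannot be mixed.
  if (armCount > 0 && !allArm) {
    throw new Error('Cannot mix x86 and Arm instance classes');
  }
  // Check 2: 'optimal' cannot be requested explicitly with Arm classes.
  if (allArm && useOptimalInstanceClasses === true) {
    throw new Error('useOptimalInstanceClasses cannot be true with Arm instance classes');
  }
  // When the flag is undefined, default to false for Arm and true otherwise.
  const useOptimal = useOptimalInstanceClasses ?? !allArm;
  return useOptimal ? [...instanceClasses, 'optimal'] : [...instanceClasses];
}

console.log(renderInstancesChecked(['m6g']));
console.log(renderInstancesChecked(['m5']));
```

A name-based heuristic avoids maintaining a list that must track every new enum member, but it would need to be validated against the full InstanceClass enum before being relied on.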