ecs: "The provided launch template does not expose its user data" when trying to add a second capacity provider #30742

Open rantoniuk opened 5 months ago

rantoniuk commented 5 months ago

Describe the bug

The code below works fine up to the "---------------- inf1" marker, that is, with only gpuCapacityProvider. When I add a second capacity provider, inf1CP, backed by a new LaunchTemplate that does not set userData at all, cdk diff fails with:

Error: The provided launch template does not expose its user data.
    at AutoScalingGroup.get userData [as userData] (infra/cdk/node_modules/aws-cdk-lib/aws-autoscaling/lib/auto-scaling-group.js:1:24056)
    at AutoScalingGroup.addUserData (infra/cdk/node_modules/aws-cdk-lib/aws-autoscaling/lib/auto-scaling-group.js:1:22335)
    at Cluster.configureAutoScalingGroup (infra/cdk/node_modules/aws-cdk-lib/aws-ecs/lib/cluster.js:1:11190)
    at Cluster.addAsgCapacityProvider (infra/cdk/node_modules/aws-cdk-lib/aws-ecs/lib/cluster.js:1:9915)
    at new EcsStack (infra/cdk/lib/ecs-stack.ts:130:18)
    at Object.<anonymous> (infra/cdk/bin/cdk.ts:35:13)
    at Module._compile (node:internal/modules/cjs/loader:1358:14)
    at Module.m._compile (infra/cdk/node_modules/ts-node/src/index.ts:1618:23)
    at Module._extensions..js (node:internal/modules/cjs/loader:1416:10)
    at Object.require.extensions.<computed> [as .ts] (infra/cdk/node_modules/ts-node/src/index.ts:1621:12)

Subprocess exited with error 1

which is specifically caused by this line:


import { Stack, StackProps } from 'aws-cdk-lib';
import { AutoScalingGroup, IAutoScalingGroup } from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { AsgCapacityProvider, Cluster } from 'aws-cdk-lib/aws-ecs';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';
import { IEnvironmentConfig } from './helpers/environment-config';

interface EcsStackProps extends StackProps {
  envv: IEnvironmentConfig;
  vpc: ec2.Vpc;
}

export class EcsStack extends Stack {
  readonly cluster: Cluster;
  readonly execRole: iam.IRole;
  readonly gpuAutoScalingGroup: IAutoScalingGroup;

  constructor(scope: Construct, id: string, props: EcsStackProps) {
    super(scope, id, props);

    this.cluster = new Cluster(this, 'EcsCluster', {
      clusterName: 'EcsCluster',
      vpc: props.vpc,
    });

    // Ec2 Security Group
    const gpuinstanceSecurityGroup = new ec2.SecurityGroup(this, 'EcsGpuInstanceSg', {
      securityGroupName: 'EcsGpuInstanceSg',
      description: ' security group for gpu instances for ecs tasks',
      vpc: props.vpc,
    });

    // EC2 Execution Role with access to ECS actions
    const ltRole = new iam.Role(this, 'EcsClusterRole', {
      roleName: 'ecs-cluster-role',
      assumedBy: new iam.ServicePrincipal(''),
      managedPolicies: [

    const rootVolume: ec2.BlockDevice = {
      deviceName: '/dev/xvda',
      volume: ec2.BlockDeviceVolume.ebs(100),
    };

    // set GPU as the default for Docker
    const userData = ec2.UserData.forLinux();
    userData.addCommands(
      'sudo rm /etc/sysconfig/docker',
      'echo DAEMON_MAXFILES=1048576 | sudo tee -a /etc/sysconfig/docker',
      'echo OPTIONS="--default-ulimit nofile=32768:65536 --default-runtime nvidia" | sudo tee -a /etc/sysconfig/docker',
      'echo DAEMON_PIDFILE_TIMEOUT=10 | sudo tee -a /etc/sysconfig/docker',
      'sudo systemctl restart docker',
    );

    // GPU EC2 Launch Template
    const launchTemplate = new ec2.LaunchTemplate(this, 'EcsClusterLt', {
      launchTemplateName: 'ecs-gpu-lt',
      machineImage: ec2.MachineImage.genericLinux({
        // ecs optimised image with gpu support
        'us-west-2': 'ami-027492973b111510a',
      }),
      instanceType: new ec2.InstanceType('g4dn.xlarge'),
      role: ltRole,
      userData: userData,
      securityGroup: gpuinstanceSecurityGroup,
      blockDevices: [rootVolume],
      requireImdsv2: true,
    });

    // Add GPU autoscaling capacity provider to the cluster
    const gpuAutoScalingGroup = new AutoScalingGroup(this, 'EcsGpuASG', {
      autoScalingGroupName: 'EcsGpuASG',
      vpc: props.vpc,
      launchTemplate: launchTemplate,
      minCapacity: 0,
      maxCapacity: 1,
    });

    //Add the capacity to the cluster
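    const gpuCapacityProvider = new AsgCapacityProvider(this, 'EcsGpuCapacityProvider', {
      autoScalingGroup: gpuAutoScalingGroup,
      capacityProviderName: 'gpuCapacityProvider',
    });
    this.cluster.addAsgCapacityProvider(gpuCapacityProvider);

    this.cluster.addDefaultCloudMapNamespace({
      name: 'local',
      useForServiceConnect: true,
    });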

    // ---------------- inf1

    // Inf1 EC2 Launch Template (no userData set)
    const launchTemplateInf1 = new ec2.LaunchTemplate(this, 'EcsClusterInf1', {
      machineImage: ec2.MachineImage.genericLinux({
        // aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2023/neuron/recommended
        'us-west-2': 'ami-00a3a4671e9889e76',
      }),
      instanceType: new ec2.InstanceType('inf1.2xlarge'),
      role: ltRole,
      securityGroup: gpuinstanceSecurityGroup,
      // blockDevices: [rootVolume],
      requireImdsv2: true,
    });

    const inf1ASG = new AutoScalingGroup(this, 'EcsInf1ASG', {
      autoScalingGroupName: 'EcsInf1ASG',
      vpc: props.vpc,
      launchTemplate: launchTemplateInf1,
      minCapacity: 0,
      maxCapacity: 1,
    });

    //Add the capacity to the cluster
    const inf1CP = new AsgCapacityProvider(this, 'EcsInf1CapacityProvider', {
      autoScalingGroup: inf1ASG,
      capacityProviderName: 'Inf1AsgCapacityProvider',
    });
    this.cluster.addAsgCapacityProvider(inf1CP);

    this.cluster.addDefaultCapacityProviderStrategy([
      { capacityProvider: gpuCapacityProvider.capacityProviderName, weight: 1 },
      { capacityProvider: inf1CP.capacityProviderName, weight: 0 },
    ]);
  }
}


CDK CLI Version

2.146.0 (build b368c78)

Language Version

"typescript": "~5.2.0"

ashishdhingra commented 5 months ago

@rantoniuk Good afternoon. Thanks for opening the issue. The error is probably thrown here. Please refer to the Clusters section of the Amazon ECS Construct Library README, which states: "To use LaunchTemplate with AsgCapacityProvider, make sure to specify the userData in the LaunchTemplate." Does the error go away once you explicitly specify userData in the second LaunchTemplate (as you did in the first)?

We also have an open issue to improve the error message shown when user data is missing from the launch template; however, there is no ETA as of now.

Thanks, Ashish


pahud commented 5 months ago


If you look at the stack trace, it fails at this method:


message: The provided launch template does not expose its user data.

And if you check here:

If launchTemplate is provided, it has to have userData attribute.
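The failure path in the stack trace (Cluster.addAsgCapacityProvider, then AutoScalingGroup.addUserData, then the userData getter) can be illustrated with a small stand-alone model. This is a simplified sketch, not the aws-cdk-lib source: the class and function names below (UserData, LaunchTemplateLike, AutoScalingGroupModel, configureForCluster, tryConfigure) are hypothetical stand-ins, and the command string is only an example of the kind of line a cluster appends. Since the stack trace sits entirely inside aws-cdk-lib, the check runs in CDK at synth time rather than in CloudFormation.

```typescript
// Simplified stand-ins for the CDK types; NOT the real aws-cdk-lib classes.
class UserData {
  readonly lines: string[] = [];

  addCommands(...commands: string[]): void {
    this.lines.push(...commands);
  }
}

interface LaunchTemplateLike {
  // Only defined when the launch template was created with `userData`.
  readonly userData?: UserData;
}

class AutoScalingGroupModel {
  private readonly launchTemplate: LaunchTemplateLike;

  constructor(launchTemplate: LaunchTemplateLike) {
    this.launchTemplate = launchTemplate;
  }

  // Mirrors the behaviour described in the thread: no userData on the
  // launch template means there is nothing to append commands to.
  get userData(): UserData {
    if (this.launchTemplate.userData) {
      return this.launchTemplate.userData;
    }
    throw new Error('The provided launch template does not expose its user data.');
  }

  addUserData(...commands: string[]): void {
    this.userData.addCommands(...commands);
  }
}

// Stand-in for the cluster registering an ASG: it needs to append
// agent configuration to the instances' user data.
function configureForCluster(asg: AutoScalingGroupModel, clusterName: string): void {
  asg.addUserData(`echo ECS_CLUSTER=${clusterName} >> /etc/ecs/ecs.config`);
}

// Returns 'ok' on success, otherwise the error message.
function tryConfigure(template: LaunchTemplateLike): string {
  try {
    configureForCluster(new AutoScalingGroupModel(template), 'EcsCluster');
    return 'ok';
  } catch (e) {
    return (e as Error).message;
  }
}

const withUserData = tryConfigure({ userData: new UserData() });
const withoutUserData = tryConfigure({});
```

In this model, passing even an empty UserData object is enough for the registration to succeed, which matches the fix of supplying ec2.UserData.forLinux() with no commands.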

Looking at your launchTemplateInf1, it is missing the userData:

    const launchTemplateInf1 = new ec2.LaunchTemplate(this, 'EcsClusterInf1', {
      machineImage: ec2.MachineImage.genericLinux({
        // aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2023/neuron/recommended
        'us-west-2': 'ami-00a3a4671e9889e76',
      }),
      instanceType: new ec2.InstanceType('inf1.2xlarge'),
      role: ltRole,
      securityGroup: gpuinstanceSecurityGroup,
      // blockDevices: [rootVolume],
      requireImdsv2: true,
    });

rantoniuk commented 5 months ago

Yes, I confirm that fixes the issue:

    const userDataInf1 = ec2.UserData.forLinux();

    // GPU EC2 Launch Template
    const launchTemplateInf1 = new ec2.LaunchTemplate(this, 'EcsClusterInf1', {
      instanceType: new ec2.InstanceType('inf1.2xlarge'),
      role: ltRole,
      userData: userDataInf1,
      securityGroup: gpuinstanceSecurityGroup,
      // blockDevices: [rootVolume],
      requireImdsv2: true,
    });

However, let me ask two follow-up questions:

  1. Is this a CloudFormation requirement or a CDK requirement? If the latter, then instead of documenting it in the README, CDK should automatically add ec2.UserData.forLinux() unless user data is defined otherwise.

  2. Unrelated to the initial issue, but when I tried to use:

    machineImage: ec2.MachineImage.genericLinux({

    then CloudFormation complained that it could not find imageId. I had to append an undocumented suffix, giving '/aws/service/ecs/optimized-ami/amazon-linux-2023/neuron/recommended/image_id'. This may be worth adding to the documentation directly.
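
To make the suffix in point 2 concrete: the ".../recommended" parameter holds a JSON document describing the AMI, while the ".../recommended/image_id" sub-parameter holds only the AMI ID string, which is what an image lookup needs. A minimal sketch of building that name follows; the helper ecsAmiIdParameter and the constant ECS_AMI_PREFIX are hypothetical names introduced here for illustration, not part of CDK.

```typescript
// Hypothetical helper: builds the SSM parameter name that resolves to a
// bare AMI ID for an ECS-optimized image variant.
const ECS_AMI_PREFIX = '/aws/service/ecs/optimized-ami';

function ecsAmiIdParameter(variant: string): string {
  // `<prefix>/<variant>/recommended` holds a JSON document;
  // the `/image_id` sub-parameter holds only the AMI ID.
  return `${ECS_AMI_PREFIX}/${variant}/recommended/image_id`;
}

// Variant taken from the parameter path quoted in this thread.
const neuronAmiParameter = ecsAmiIdParameter('amazon-linux-2023/neuron');
```

The resulting string is the kind of value one would pass to an SSM-based image lookup such as ec2.MachineImage.fromSsmParameter, assuming that is the lookup being used.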