aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(opensearchservice): must configure zone awareness settings even when I am not enabling zone awareness #29346

Open orshemtov opened 8 months ago

orshemtov commented 8 months ago

Describe the bug

When creating an opensearch domain using AWS CDK, I am getting the following error:

Invalid request provided: You must configure zone awareness settings if you turn on zone awareness

My CDK code is as follows:

import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as opensearch from "aws-cdk-lib/aws-opensearchservice";
import { Construct } from "constructs";

export interface OpenSearchProps {
  vpc: ec2.IVpc;
  subnets: ec2.ISubnet[];
}

export class OpenSearch extends Construct {
  constructor(scope: Construct, id: string, props: OpenSearchProps) {
    super(scope, id);

    ...

    const opensearchDomain = new opensearch.Domain(this, "Domain", {
      vpc,
      vpcSubnets: [
        {
          subnets,
        },
      ],
      securityGroups: [securityGroup],
      version: opensearch.EngineVersion.OPENSEARCH_2_5,
      tlsSecurityPolicy: opensearch.TLSSecurityPolicy.TLS_1_2,
      enableVersionUpgrade: true,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      zoneAwareness: {
        enabled: false,
      },
      capacity: {
        dataNodeInstanceType: "t3.small.search",
        dataNodes: 1,
      },
    });
  }
}

I've also tried omitting zoneAwareness altogether.

Expected Behavior

The cluster should be deployed in a single AZ

Current Behavior

There is an error stating that zone awareness must be configured, even though zone awareness is explicitly set to false.

Reproduction Steps

Create a CDK stack

export class MyStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // VPC
    const vpcId = this.node.tryGetContext("vpcId");
    const vpc = ec2.Vpc.fromLookup(this, "Vpc", { vpcId });
    const subnets = vpc.privateSubnets;
    const subnetIds = vpc.privateSubnets.map((subnet) => subnet.subnetId);

    // OpenSearch
    const opensearch = new OpenSearch(this, "OpenSearch", {
      vpc,
      subnets,
    });
  }
}

// OpenSearch construct used by MyStack:
import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as opensearch from "aws-cdk-lib/aws-opensearchservice";
import { Construct } from "constructs";

export interface OpenSearchProps {
  vpc: ec2.IVpc;
  subnets: ec2.ISubnet[];
}

export class OpenSearch extends Construct {
  constructor(scope: Construct, id: string, props: OpenSearchProps) {
    super(scope, id);

    const { vpc, subnets } = props;

    const securityGroup = new ec2.SecurityGroup(this, "SecurityGroup", {
      vpc,
      allowAllOutbound: true,
    });

    securityGroup.addIngressRule(
      ec2.Peer.ipv4(vpc.vpcCidrBlock),
      ec2.Port.tcp(9200)
    );

    securityGroup.addIngressRule(
      ec2.Peer.ipv4(vpc.vpcCidrBlock),
      ec2.Port.tcp(9300)
    );

    const opensearchDomain = new opensearch.Domain(this, "Domain", {
      vpc,
      vpcSubnets: [
        {
          subnets,
        },
      ],
      securityGroups: [securityGroup],
      version: opensearch.EngineVersion.OPENSEARCH_2_5,
      tlsSecurityPolicy: opensearch.TLSSecurityPolicy.TLS_1_2,
      enableVersionUpgrade: true,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      zoneAwareness: {
        enabled: false,
      },
      capacity: {
        dataNodeInstanceType: "t3.small.search",
        dataNodes: 1,
      },
    });

    new cdk.CfnOutput(this, "Endpoint", {
      value: opensearchDomain.domainEndpoint,
    });
  }
}

Possible Solution

Downgrading to CDK version 2.85.0 seems to fix the problem

Additional Information/Context

No response

CDK CLI Version

2.117.0 (build 59d9b23)

Framework Version

2.127.0

Node.js Version

v21.6.2

OS

macOS

Language

TypeScript

Language Version

5.3.3

Other information

No response

msambol commented 8 months ago

I was able to reproduce this and confirmed the template contains the following:

"ZoneAwarenessEnabled": false

This feels like a CloudFormation bug, as the error comes from CloudFormation, not from the CDK. cc @pahud

pahud commented 8 months ago

I see this in the docs:

If you specify more than one subnet, you must also configure ZoneAwarenessEnabled and ZoneAwarenessConfig within ClusterConfig, otherwise you'll see the error "You must specify exactly one subnet" during template creation.

And I got this error when deploying across 3 subnets/AZs:

12:22:07 PM | CREATE_FAILED | AWS::OpenSearchService::Domain | Domain66AC69E0 Resource handler returned message: "Invalid request provided: You must specify exactly one subnet. (Service: OpenSearch, Status Code: 400, Request ID: ebf77162-b821-49c2-b061-bc635d708913)" (RequestToken: dcb066e2-658e-35bb-d913-d0b3640afe9b, HandlerErrorCode: InvalidRequest)

It looks like when ZoneAwarenessEnabled is disabled, only one subnet is allowed for the domain. However, if we specify vpcSubnets like this, multiple subnets are selected:

vpcSubnets: [
  { subnetType: SubnetType.PRIVATE_WITH_EGRESS },
],
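The rule pahud infers above can be turned into a quick pre-deployment check. `checkSubnetCount` below is a hypothetical helper, not part of aws-cdk-lib. It encodes only the constraint observed in this thread (zone awareness disabled means exactly one subnet) and assumes you already have the subnet IDs your `vpcSubnets` selection resolves to, for example from `vpc.selectSubnets(...).subnetIds`. The subnet IDs used here are placeholders.

```typescript
// Hypothetical pre-deployment check; not an aws-cdk-lib API.
// Encodes only the rule observed in this thread: with zone awareness
// disabled, the OpenSearch domain accepts exactly one subnet.
function checkSubnetCount(
  zoneAwarenessEnabled: boolean,
  subnetIds: readonly string[],
): string | undefined {
  if (!zoneAwarenessEnabled && subnetIds.length !== 1) {
    return `Zone awareness is disabled, so exactly one subnet is expected, but ${subnetIds.length} were selected.`;
  }
  // No problem detected by this (deliberately partial) check.
  return undefined;
}

// Selecting by subnet type typically resolves to one subnet per AZ
// (placeholder IDs).
const privateSubnetIds = ["subnet-aaa", "subnet-bbb", "subnet-ccc"];

console.log(checkSubnetCount(false, privateSubnetIds));
console.log(checkSubnetCount(false, privateSubnetIds.slice(0, 1)));
```

The check deliberately says nothing about the zone-awareness-enabled case, since the thread does not establish the exact subnet rules there.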

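My workaround is:

import { Stack, StackProps, RemovalPolicy } from "aws-cdk-lib";
import { SubnetType } from "aws-cdk-lib/aws-ec2";
import * as opensearch from "aws-cdk-lib/aws-opensearchservice";
import { Construct } from "constructs";

export class DummyStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    // getDefaultVpc is a helper from the commenter's own project (not shown in this thread)
    const vpc = getDefaultVpc(this);
    const opensearchDomain = new opensearch.Domain(this, "Domain", {
      vpc,
      vpcSubnets: [
        { subnetType: SubnetType.PRIVATE_WITH_EGRESS },
      ],
      version: opensearch.EngineVersion.OPENSEARCH_2_5,
      tlsSecurityPolicy: opensearch.TLSSecurityPolicy.TLS_1_2,
      enableVersionUpgrade: true,
      removalPolicy: RemovalPolicy.DESTROY,
      zoneAwareness: {
        enabled: false,
      },
      capacity: {
        dataNodeInstanceType: "t3.small.search",
        dataNodes: 1,
      },
    });

    // Escape hatch: reach the underlying L1 CfnDomain resource
    const cfndomain = opensearchDomain.node.tryFindChild('Resource') as opensearch.CfnDomain;

    // Keep only the first selected subnet so the domain is placed in a single AZ
    const selectedSubnetIds = vpc.selectSubnets({ subnetType: SubnetType.PRIVATE_WITH_EGRESS }).subnetIds;
    cfndomain.addPropertyOverride('VPCOptions.SubnetIds', [selectedSubnetIds[0]]);
  }
}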

I will create an internal ticket to clarify whether only one subnet is allowed when zoneAwareness is disabled. Meanwhile, can you share the use case that requires multiple AZs with zoneAwareness disabled?
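The workaround relies on CDK's escape hatch: calling `addPropertyOverride("VPCOptions.SubnetIds", ...)` on the underlying `CfnDomain` rewrites a nested property of the synthesized `AWS::OpenSearchService::Domain` resource after the `Domain` construct has rendered it. The sketch below is a simplified standalone model of that effect, not CDK's implementation: `applyOverride` is a hypothetical helper that walks the dot-separated path and replaces the value at its end, so a three-subnet list becomes a one-subnet list while sibling keys such as `SecurityGroupIds` survive. IDs are placeholders.

```typescript
// Simplified, standalone model of a dot-path property override.
// This is NOT CDK's implementation; it only illustrates the effect.
type Json = { [key: string]: unknown };

function applyOverride(props: Json, path: string, value: unknown): Json {
  // Work on a copy so the "rendered" properties stay untouched.
  const result = JSON.parse(JSON.stringify(props)) as Json;
  const keys = path.split(".");
  const last = keys.pop();
  if (last === undefined || last === "") {
    throw new Error(`Invalid override path: "${path}"`);
  }
  let cursor: Json = result;
  for (const key of keys) {
    const next = cursor[key];
    // Create (or replace) intermediate objects when missing or not an object.
    if (typeof next !== "object" || next === null || Array.isArray(next)) {
      cursor[key] = {};
    }
    cursor = cursor[key] as Json;
  }
  // The final value is replaced wholesale; arrays are not merged.
  cursor[last] = value;
  return result;
}

// Roughly what the L2 construct renders when three subnets are selected.
const rendered: Json = {
  VPCOptions: {
    SecurityGroupIds: ["sg-placeholder"],
    SubnetIds: ["subnet-aaa", "subnet-bbb", "subnet-ccc"],
  },
};

const overridden = applyOverride(rendered, "VPCOptions.SubnetIds", ["subnet-aaa"]);
console.log(JSON.stringify(overridden["VPCOptions"]));
```

Because the override simply replaces the whole `SubnetIds` array, passing a single-subnet selection to `vpcSubnets` directly (as in the final working code in this thread) reaches the same template without an escape hatch.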

orshemtov commented 8 months ago

My use case is actually the opposite: I am trying to create a domain in a single AZ, not across multiple AZs.

When I try to do that, I receive the above error

Invalid request provided: You must configure zone awareness settings if you turn on zone awareness

At no point did I enable zone awareness explicitly, in fact I've explicitly turned it off.

It could be because I'm supplying multiple subnets (all the private ones), as mentioned above.

But I also tried this:

vpcSubnets: [
  {
    subnets: [subnets[0]],
  },
],

And still got the same error

pahud commented 8 months ago

@orshemtov check out my workaround in my last comment

pahud commented 8 months ago

internal tracking: V1282499345

orshemtov commented 8 months ago

@pahud So I've changed my CDK construct to what you've suggested:

import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as opensearch from "aws-cdk-lib/aws-opensearchservice";
import { Construct } from "constructs";

export interface OpenSearchProps {
  vpc: ec2.IVpc;
  subnets: ec2.ISubnet[];
}

export class OpenSearch extends Construct {
  constructor(scope: Construct, id: string, props: OpenSearchProps) {
    super(scope, id);

    const { vpc, subnets } = props;

    const securityGroup = new ec2.SecurityGroup(this, "SecurityGroup", {
      vpc,
      allowAllOutbound: true,
    });

    securityGroup.addIngressRule(
      ec2.Peer.ipv4(vpc.vpcCidrBlock),
      ec2.Port.tcp(9200)
    );

    securityGroup.addIngressRule(
      ec2.Peer.ipv4(vpc.vpcCidrBlock),
      ec2.Port.tcp(9300)
    );

    const opensearchDomain = new opensearch.Domain(this, "Domain", {
      vpc,
      vpcSubnets: [
        {
          subnets,
        },
      ],
      securityGroups: [securityGroup],
      version: opensearch.EngineVersion.OPENSEARCH_2_5,
      tlsSecurityPolicy: opensearch.TLSSecurityPolicy.TLS_1_2,
      enableVersionUpgrade: true,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      capacity: {
        dataNodeInstanceType: "t3.small.search",
        dataNodes: 1,
      },
    });

    const cfnDomain = opensearchDomain.node
      .defaultChild as opensearch.CfnDomain;
    const selectedSubnetIds = vpc.selectSubnets({
      subnets,
    }).subnetIds;
    cfnDomain.addPropertyOverride("VPCOptions.SubnetIds", [
      selectedSubnetIds[0],
    ]);

    new cdk.CfnOutput(this, "Endpoint", {
      value: opensearchDomain.domainEndpoint,
    });
  }
}

And I'm still getting the same error after running cdk deploy:

7:57:11 PM | CREATE_FAILED | AWS::OpenSearchService::Domain | OpenSearch/Domain
Resource handler returned message: "Invalid request provided: You must configure zone awareness settings if you turn on zone awareness. (Service: OpenSearch, Status Code: 400, Request ID: ff6cbc56-0f51-415c-be42-080583f952e8)" (RequestToken: 241b96ab-035d-bda4-ddaf-70b348e25eba, HandlerErrorCode: InvalidRequest)

pahud commented 8 months ago

> And I'm still getting the same error after doing cdk deploy

I don't see zone awareness being turned on in your code snippet above, so the template should contain "ZoneAwarenessEnabled": false.

Can you check your cdk synth output and verify that?
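One way to run that check without reading the whole template by eye is to parse the output of `cdk synth --json` (or the `*.template.json` file under `cdk.out`) and print the `ClusterConfig` of every `AWS::OpenSearchService::Domain` resource. `summarizeClusterConfigs` below is a hypothetical helper written for illustration; it only assumes the standard CloudFormation template shape (`Resources`, `Type`, `Properties`). The sample template is trimmed from the reporter's synth output in this thread.

```typescript
// Hypothetical helper: list the ClusterConfig of each OpenSearch domain
// in a parsed CloudFormation template (e.g. from `cdk synth --json`).
interface CfnResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface CfnTemplate {
  Resources?: Record<string, CfnResource>;
}

function summarizeClusterConfigs(
  template: CfnTemplate,
): Record<string, Record<string, unknown>> {
  const summary: Record<string, Record<string, unknown>> = {};
  for (const [logicalId, resource] of Object.entries(template.Resources ?? {})) {
    // Skip everything that is not an OpenSearch domain.
    if (resource.Type !== "AWS::OpenSearchService::Domain") continue;
    const clusterConfig = resource.Properties?.["ClusterConfig"];
    summary[logicalId] =
      typeof clusterConfig === "object" && clusterConfig !== null
        ? (clusterConfig as Record<string, unknown>)
        : {};
  }
  return summary;
}

// Trimmed from the reporter's synth output in this thread.
const synthesized: CfnTemplate = {
  Resources: {
    OpenSearchDomain099259C2: {
      Type: "AWS::OpenSearchService::Domain",
      Properties: {
        ClusterConfig: {
          InstanceCount: 1,
          MultiAZWithStandbyEnabled: true,
          ZoneAwarenessEnabled: false,
        },
      },
    },
  },
};

console.log(summarizeClusterConfigs(synthesized));
```

Printing every key of `ClusterConfig`, rather than only `ZoneAwarenessEnabled`, makes unexpected flags visible alongside the one you were looking for.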

pahud commented 8 months ago

I am trying to deploy this now.

export class DummyStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    const vpc = getDefaultVpc(this);
    const opensearchDomain = new opensearch.Domain(this, "Domain", {
      vpc,
      vpcSubnets: [
        { subnetType: SubnetType.PRIVATE_WITH_EGRESS },
      ],
      version: opensearch.EngineVersion.OPENSEARCH_2_5,
      tlsSecurityPolicy: opensearch.TLSSecurityPolicy.TLS_1_2,
      enableVersionUpgrade: true,
      removalPolicy: RemovalPolicy.DESTROY,
      // zoneAwareness: {
      //   enabled: false,
      // },
      capacity: {
        dataNodeInstanceType: "t3.small.search",
        dataNodes: 1,
      },
    });

    const cfndomain = opensearchDomain.node.tryFindChild('Resource') as opensearch.CfnDomain

    const selectedSubnetIds = vpc.selectSubnets({ subnetType: SubnetType.PRIVATE_WITH_EGRESS }).subnetIds
    cfndomain.addPropertyOverride('VPCOptions.SubnetIds', [ selectedSubnetIds[0] ] )
  }
}

And the synthesized template looks like this:

 "Domain66AC69E0": {
   "Type": "AWS::OpenSearchService::Domain",
   "Properties": {
    "ClusterConfig": {
     "DedicatedMasterEnabled": false,
     "InstanceCount": 1,
     "InstanceType": "t3.small.search",
     "ZoneAwarenessEnabled": false
    },
    "DomainEndpointOptions": {
     "EnforceHTTPS": false,
     "TLSSecurityPolicy": "Policy-Min-TLS-1-2-2019-07"
    },
    "EBSOptions": {
     "EBSEnabled": true,
     "VolumeSize": 10,
     "VolumeType": "gp2"
    },
    "EncryptionAtRestOptions": {
     "Enabled": false
    },
    "EngineVersion": "OpenSearch_2.5",
    "LogPublishingOptions": {},
    "NodeToNodeEncryptionOptions": {
     "Enabled": false
    },
    "VPCOptions": {
     "SecurityGroupIds": [
      {
       "Fn::GetAtt": [
        "DomainSecurityGroup48AA5FD6",
        "GroupId"
       ]
      }
     ],
     "SubnetIds": [
      "subnet-071c85610846aa9c0"
     ]
    }
   },

The deployment could take a while, but I haven't seen any errors so far.

orshemtov commented 8 months ago

I'm getting this synth output:

OpenSearchDomain099259C2:
    Type: AWS::OpenSearchService::Domain
    Properties:
      ClusterConfig:
        DedicatedMasterEnabled: false
        InstanceCount: 1
        InstanceType: t3.small.search
        MultiAZWithStandbyEnabled: true
        ZoneAwarenessEnabled: false
      DomainEndpointOptions:
        EnforceHTTPS: false
        TLSSecurityPolicy: Policy-Min-TLS-1-2-2019-07
      EBSOptions:
        EBSEnabled: true
        VolumeSize: 10
        VolumeType: gp2
      EncryptionAtRestOptions:
        Enabled: false
      EngineVersion: OpenSearch_2.5
      LogPublishingOptions: {}
      NodeToNodeEncryptionOptions:
        Enabled: false
      Tags:
        - Key: app
          Value: vita-llms
        - Key: env
          Value: dev
      VPCOptions:
        SecurityGroupIds:
          - Fn::GetAtt:
              - OpenSearchSecurityGroup70E5053B
              - GroupId
        SubnetIds:
          - subnet-01da729e6394035cc
    UpdatePolicy:
      EnableVersionUpgrade: true
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
    Metadata:
      aws:cdk:path: VitaLlmsStack/OpenSearch/Domain/Resource

Original CDK code:

    const opensearchDomain = new opensearch.Domain(this, "Domain", {
      vpc,
      vpcSubnets: [
        {
          subnets,
        },
      ],
      securityGroups: [securityGroup],
      version: opensearch.EngineVersion.OPENSEARCH_2_5,
      tlsSecurityPolicy: opensearch.TLSSecurityPolicy.TLS_1_2,
      enableVersionUpgrade: true,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      capacity: {
        dataNodeInstanceType: "t3.small.search",
        dataNodes: 1,
      },
    });

    const cfnDomain = opensearchDomain.node
      .defaultChild as opensearch.CfnDomain;

    const selectedSubnetIds = vpc.selectSubnets({
      subnets,
    }).subnetIds;

    cfnDomain.addPropertyOverride("VPCOptions.SubnetIds", [
      selectedSubnetIds[0],
    ]);

For some reason, MultiAZWithStandbyEnabled comes out as true, and I'm not sure that's intended.

orshemtov commented 8 months ago

It turns out this flag can be disabled by setting:

capacity: {
  dataNodeInstanceType: "t3.small.search",
  dataNodes: 1,
  multiAzWithStandbyEnabled: false,
},

However, I think this is a bug, because the doc comment for this flag says the default is no Multi-AZ with Standby (i.e. false):

/**
 * Indicates whether Multi-AZ with Standby deployment option is enabled.
 * For more information, see [Multi-AZ with Standby]
 * (https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-multiaz.html#managedomains-za-standby)
 *
 * @default - no multi-az with standby
 */
readonly multiAzWithStandbyEnabled?: boolean;
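A likely explanation for the `true` value, stated here as an assumption rather than a confirmed diagnosis: as far as we can tell, newer aws-cdk-lib releases gate this default behind the `@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby` feature flag, which may be among the flags `cdk init` writes into `cdk.json` for new projects, so an unset `multiAzWithStandbyEnabled` can resolve to `true`. That would also fit the report that 2.85.0 behaves differently. The sketch below models this resolution order with a hypothetical function; check the `Domain` source and your own `cdk.json` for the version you actually use.

```typescript
// Hypothetical model of how an unset `multiAzWithStandbyEnabled` could be
// resolved. The feature-flag-controlled default is an assumption.
const MULTI_AZ_STANDBY_FLAG =
  "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby";

function resolveMultiAzWithStandby(
  explicitValue: boolean | undefined,
  context: Record<string, unknown>,
): boolean {
  // An explicit value in `capacity` always wins.
  if (explicitValue !== undefined) return explicitValue;
  // Otherwise fall back to the (assumed) feature flag from cdk.json context.
  return context[MULTI_AZ_STANDBY_FLAG] === true;
}

// A cdk.json generated for a new project may contain the flag.
const cdkJsonContext = { [MULTI_AZ_STANDBY_FLAG]: true };

console.log(resolveMultiAzWithStandby(undefined, cdkJsonContext)); // unset: the flag decides
console.log(resolveMultiAzWithStandby(false, cdkJsonContext)); // explicit false wins
```

Under this model, setting `multiAzWithStandbyEnabled: false` explicitly, as the reporter did, is the robust fix because it no longer depends on project context.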

Now my deployment has been running for a few minutes without failing.


orshemtov commented 8 months ago

Final code that worked:

    const opensearchDomain = new opensearch.Domain(this, "Domain", {
      vpc,
      vpcSubnets: [
        {
          subnets: [subnets[0]],
        },
      ],
      securityGroups: [securityGroup],
      version: opensearch.EngineVersion.OPENSEARCH_2_5,
      tlsSecurityPolicy: opensearch.TLSSecurityPolicy.TLS_1_2,
      enableVersionUpgrade: true,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      capacity: {
        dataNodeInstanceType: "t3.small.search",
        dataNodes: 1,
        multiAzWithStandbyEnabled: false,
      },
    });

pahud commented 5 months ago

Thank you @orshemtov

levilugato commented 1 month ago

Thanks, I'm facing exactly the same issue.