aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.5k stars 3.84k forks source link

ecs: service.serviceConnectConfiguration generates bogus CloudMap namespaces #25616

Closed franck102 closed 1 year ago

franck102 commented 1 year ago

Describe the bug

This cdk code:

this.cluster = new ecs.Cluster(this, "MyCluster", {
            clusterName: cdk.PhysicalName.GENERATE_IF_NEEDED,
            vpc: this.vpc,
            defaultCloudMapNamespace: {
                name: "passmate.private",
                useForServiceConnect: true,
                type: cloudmap.NamespaceType.DNS_PRIVATE,
                vpc: this.vpc
            },
        });

... and in a different stack:

keycloakContainer.addPortMappings({
            name: "keycloak_http",
            containerPort: this.webPort});
...
this.service = new ecs.FargateService(this, 'Service', {
            cluster: props.cluster,
           ...
            serviceConnectConfiguration: {
                services: [{
                    portMappingName: "keycloak_http",
                    dnsName: "keycloak",
                    port: _keycloakPublicPort,
                }]
            }
        });

generates two namespaces in CloudMap:

Domain name       | Description | Instance discovery                           | Namespace ID
passmate.private | -                   | API calls and DNS queries in VPCs | ns-jfyvbvl33oocytvv
passmate.private | -                   | API calls                                             | ns-pge2n6bdb7z4k72x

The "DNS Queries" contains no services, the second namespace (API calls only) contains the keycloak service defined above.

Expected Behavior

I expected a private DNS entry to be created for the keycloak service, with the DNS name keycloak.passmate.private.

Current Behavior

No DNS record gets created.

The first namespace ("API calls and DNS queries in VPCs") has the expected aws:cloudformation:stack-id, aws:cloudformation:stack-name and aws:cloudformation:logical-id tags.

The second namespace ("API calls" only) which contains the service has a single "AmazonECSManaged" tag.

Reproduction Steps

Create a simple stack with cluster with a default CloudMap namespace & a ServiceConnect-enabled service as described above.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.79.1 (build 2e7f8b7)

Framework Version

"version": "2.79.0",

Node.js Version

v18.3.0

OS

macOS 12.6

Language

Typescript

Language Version

5.0.4

Other information

No response

pahud commented 1 year ago

Hi

I tried this in my environment and looks like it only create one namespace?

export class Demo2Stack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = getOrCreateVpc(this);
    const cluster = new ecs.Cluster(this, "MyCluster", {
      clusterName: PhysicalName.GENERATE_IF_NEEDED,
      vpc,
      defaultCloudMapNamespace: {
          name: "passmate.private",
          useForServiceConnect: true,
          type: servicediscovery.NamespaceType.DNS_PRIVATE,
          vpc
      },
    });
    const task = new ecs.FargateTaskDefinition(this, 'Task', {
      cpu: 256,
      memoryLimitMiB: 512,
    });
    task.addContainer('keycloak', {
      image: ecs.ContainerImage.fromRegistry('quay.io/keycloak/keycloak'),
      portMappings: [
        { name: 'keycloak_http', containerPort: 8080 },
      ]
    });
    new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition: task,
      serviceConnectConfiguration: {
          services: [{
              portMappingName: "keycloak_http",
              dnsName: "keycloak",
              port: 80,
          }]
      }
    });
  };
};

Resources [+] AWS::ECS::Cluster MyCluster MyCluster4C1BA579 [+] AWS::ServiceDiscovery::PrivateDnsNamespace MyCluster/DefaultServiceDiscoveryNamespace MyClusterDefaultServiceDiscoveryNamespace4A9016C4 [+] AWS::IAM::Role Task/TaskRole TaskTaskRoleE98524A1 [+] AWS::ECS::TaskDefinition Task Task79114B6B [+] AWS::ECS::Service Service/Service ServiceD69D759B [+] AWS::EC2::SecurityGroup Service/SecurityGroup ServiceSecurityGroupC96ED6A7

pahud commented 1 year ago

BTW, if you are building keycloak service on ECS or Fargate, you probably can refer to this aws-sample which does not use cloudmap but JDBC_PING and stickinessCookie instead. It's not perfect but worth a look.

https://github.com/aws-samples/cdk-keycloak/blob/main/src/keycloak.ts

franck102 commented 1 year ago

I suspect the issue arises because the cluster and the service are defined in different stacks (see the Fn::ImportValue below)? Everything indeed looks correct in the generated template - a single private dns namespace in the cluster, and service refers to it by name. However at deploy time the service configuration causes AWS to instantiate a new namespace instead of reusing the cluster's default one.

Cluster template:

 PassmateCluster89C7783C:
    Type: AWS::ECS::Cluster
    Properties:
      ServiceConnectDefaults:
        Namespace: passmate.private
    Metadata:
      aws:cdk:path: PassmateStack/Infra/PassmateCluster/Resource
  PassmateClusterDefaultServiceDiscoveryNamespace9D590D92:
    Type: AWS::ServiceDiscovery::PrivateDnsNamespace
    Properties:
      Name: passmate.private
      Vpc:
        Ref: PassmateVpcE47AF4D1
    Metadata:
      aws:cdk:path: PassmateStack/Infra/PassmateCluster/DefaultServiceDiscoveryNamespace/Resource

Service template:

KeycloakClusterServiceDC58B5EE:
    Type: AWS::ECS::Service
    Properties:
      Cluster:
        Fn::ImportValue: PassmateStackInfraECB4C246:ExportsOutputRefPassmateCluster89C7783C280B02BA
    ...
      NetworkConfiguration:
        AwsvpcConfiguration:
         ...
      ServiceConnectConfiguration:
        Enabled: true
        Namespace: passmate.private
        Services:
          - ClientAliases:
              - DnsName: keycloak
                Port: 8088
                PortName: keycloak_http
        TaskDefinition:
          Ref: KeycloakClusterTaskDefinitionE5269269
franck102 commented 1 year ago

BTW, if you are building keycloak service on ECS or Fargate, you probably can refer to this aws-sample which does not use cloudmap but JDBC_PING and stickinessCookie instead. It's not perfect but worth a look.

https://github.com/aws-samples/cdk-keycloak/blob/main/src/keycloak.ts

I am using DNS_PING for the cluster itself, I need Cloudmap for Keycloak integration with by backend services (Quarkus admin client). And I am trying to avoid the cost of an internal load balancer...

pahud commented 1 year ago

I think you probably should configure it like this(using nginx as a example):

    const vpc = getOrCreateVpc(this);
    const cluster = new ecs.Cluster(this, `MyCluster${id}`, {
      vpc,
      defaultCloudMapNamespace: {
          name: `${id}-ns`,
          useForServiceConnect: true,
          type: servicediscovery.NamespaceType.DNS_PRIVATE,
          vpc
      },
    });
    const task = new ecs.FargateTaskDefinition(this, 'Task', {
      cpu: 256,
      memoryLimitMiB: 512,
    });
    task.addContainer('nginx', {
      image: ecs.ContainerImage.fromRegistry('nginx'),
      portMappings: [
        { name: 'nginx_http', containerPort: 80 },
      ],
    });
    const service = new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition: task,
      serviceConnectConfiguration: {
          services: [{
              portMappingName: "nginx_http",
              dnsName: "nginx",
              port: 80,
          }],
      },
    });
    service.node.addDependency(cluster.defaultCloudMapNamespace!);

defaultCloudMapNamespace creates the default CloudMap Namespace and serviceConnectConfiguration by default uses the default namespace if name is undefined. The addDependency() ensures their dependency.

Let me know if it works for you.

pahud commented 1 year ago

OK this looks like a random bug.

I deployed 10 times and 5 out of 10 I got duplicate namespaces. This doesn't seem to be a CDK bug but probably ECS or cloudmap. I've reported internally. V907831514

image
franck102 commented 1 year ago

Any workaround you can suggest?

If I use the CloudMap APIs directly to create the keycloak record, do I need to additionally configure something on the service to enable ServiceConnect?

franck102 commented 1 year ago

The duplicate namespace is in fact created by the Infra stack. If I manually delete the spurious "API only" namespace before deploying the service stack, the service deployment fails with this errror:

8:24:56 AM | CREATE_FAILED        | AWS::ECS::Service              | PassmateStack/Keyc...er/Service/Service
Resource handler returned message: "Failed to retrieve namespace for passmate.private (Service: AmazonECS; Status Code: 400; Error Code: ClientException; Request ID: f10b9440-e002-4254-809b-2fcd
913988d9; Proxy: null)" (RequestToken: d9130384-16a7-9dac-dd6a-c83475c10a3d, HandlerErrorCode: GeneralServiceException)
8:24:57 AM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack     | PassmateStackKeycloak9267A274
The following resource(s) failed to create: [KeycloakClusterTaskDefinitionExecutionRoleDefaultPolicyABB500FF, KeycloakClusterServiceDC58B5EE]. Rollback requested by user.
8:24:57 AM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack     | PassmateStackKeycloak9267A274

I can see the namespace in the console, its resource name though is "passmate.private-fm7n2lxfyexstqq5", not passmate.private. Could it be that cluster.defaultCloudMapNamespace?.namespaceName, which currently contains "passmate.private", should instead contain the resource name "passmate.private-fm7n2lxfyexstqq5" ?

franck102 commented 1 year ago

Yep, that seems to be the issue, the service deployment works as expected if I configure service connect by manually specifying the namespace resource name:

this.service.enableServiceConnect({
            namespace: "passmate.private-fm7n2lxfyexstqq5",
            services: [{
                portMappingName: "keycloak_http",
                dnsName: "keycloak",
                port: _keycloakPublicPort,
            }]
        })

Is there any way to obtain that resource name from cdk after deploying the Infra stack but before deploying the service stack?

franck102 commented 1 year ago

The console is a bit of a mess when it comes to namespaces? For the same namespace, I have this in [Amazon Elastic Container Service] -> [Namespaces]:

ARN: arn:aws:servicediscovery:us-east-1:782843724418:namespace/ns-kpfljha2hj2yic3m
Name: passmate.private-fm7n2lxfyexstqq5 

... but in [AWS Cloud Map] -> [Namespaces] I have this instead:

Name:  passmate.private
Namespace ID:  ns-kpfljha2hj2yic3m
HTTP name: passmate.private-fm7n2lxfyexstqq5

So the ECS name becomes the CloudMap HTTP Name, would this be the origin of the bug?

I can't figure out how to lookup the actual namespace from the service stack even though I have its ARN, because PrivateDnsNamespace.fromPrivateDnsNamespaceAttributes wants arn, id and name all supplied.

franck102 commented 1 year ago

Using the ARN to setup the service ServiceConnect options works:

const namespace = this.cluster.addDefaultCloudMapNamespace({
            ...
        });
        this.namespaceArn = namespace.namespaceArn;

then

this.service.enableServiceConnect({
            namespace: props.namespaceArn,
            services: [{
                portMappingName: "keycloak_http",
                dnsName: "keycloak",
                port: _keycloakPublicPort,
            }]
        })

This was painful :(

franck102 commented 1 year ago

It looks like I may have to give up on ServiceConnect... with the steps above the PrivateDnsNamespace has the service, but it doesn't have any DNS record for it. Some documentation and/or examples would really help...

omrta-dev commented 1 year ago

I would like to point out that @franck102 is not the only one facing this issue:

image

As shown in the image I am also getting two namespaces generated while using EC2 Services with very similar timestamps

My CDK code is below: Cluster:

      defaultCloudMapNamespace: {
        type: NamespaceType.DNS_PUBLIC,
        name: this.domainZone.zoneName,
        useForServiceConnect: true,
      },

ApplicationLoadBalancedEc2Service Construct:

new ApplicationLoadBalancedEc2Service(
      this,
      'test'
      {
        cluster: this.cluster,
        serviceName: nginxPM,
        sslPolicy: SslPolicy.RECOMMENDED_TLS,
        certificate: this.certificate,
        domainName: `nginx.${this.domainZone.zoneName}`,
        domainZone: this.domainZone,
        cloudMapOptions: {
          cloudMapNamespace: this.cluster.defaultCloudMapNamespace,
          name: "nginx-dns",
        },
        targetProtocol: ApplicationProtocol.HTTP,
        taskDefinition: nginx.taskDefinition,
        memoryLimitMiB: 256,
        loadBalancer: this.alb,
        publicLoadBalancer: true,
        redirectHTTP: true,
      }
    );

Ec2Service:

new Ec2Service(
      this.serviceProps.scope,
      `${this.serviceProps.name}Service`,
      {
        cluster: this.serviceProps.cluster,
        taskDefinition: this.getTaskDefinition(),
        cloudMapOptions: {
          cloudMapNamespace: this.serviceProps.cluster.defaultCloudMapNamespace,
          name: `${this.serviceProps.name.toLowerCase()}-dns`,
        },
        enableExecuteCommand: true,
        serviceName: this.serviceProps.name,
      }
    );

Note that I first create the cluster, then use cluster.defaultCloudMapNamespace everywhere, so it should only have created 1 namespace and used 1 namespace, but for some reason I get both a Api Calls + Public DNS and API Calls namespaces.

I've checked the Cloud Formation Template and it also only shows that 1 namespace is getting created at the cft level:

ClusterDefaultServiceDiscoveryNamespace26741755:
    Type: AWS::ServiceDiscovery::PublicDnsNamespace
    Properties:
      Name: <xxxxx>dev.com

The API Call only namespace has the AmazonECSManaged = True Tag set while the other namespace(public DNS + api calls) has the actual records and services added to it.

I also +1 the need for documentation, I've spent many hours trying to figure out how service discovery, cloudmap, and service connect all fit into each other and how it all works with the CDK.

chessels commented 1 year ago

This question on StackOverflow explains a lot about how Service Connect works. Apparently Service Connect isn't meant to be used together with DNS.

Seiya6329 commented 1 year ago

Thanks for reporting a bug and we apologize for any inconvenience caused by an issue.

We are investigating this issue and will update this thread once we have a findings. ETA for next update: 6/7 (Wed)

bvtujo commented 1 year ago

This issue looks like it's caused by Service Connect's default behavior of creating a namespace with the given name if one does not exist.

To mitigate this issue, I've cut a PR (#25891) which implicitly creates a dependency between the default Cloudmap namespace and the cluster by using the namespace ARN instead of it's string id. This will avoid the problem at cluster creation.

It's still possible to create a namespace accidentally by passing the string name to enableServiceConnect() on an ecs.*Service construct. I'd recommend that in this case you pass the namespace by ARN (namespace.namespaceArn) to create a dependency, or add an explicit dependency between the namespace and the service. For example:

declare const ns cloudmap.INamespace; // example.com
declare const svc ecs.FargateService;

svc.node.addDependency(ns);
svc.enableServiceConnect({
  namespace: 'example.com',
});

A good next step to stop this from happening in the future altogether would be to deprecate the namespace: string field in favor of a new cloudMapNamespace: INamespace field, allowing us to resolve this issue completely within the construct.

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.