data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Handling Pre-existing Endpoints in AWS CDK Constructs Using data.all #1067

Open anandsumit2000 opened 6 months ago

anandsumit2000 commented 6 months ago

Overview:

The AWS CDK code used in the data.all constructs creates AWS VPC endpoints for services such as S3 and DynamoDB. In some cases, the VPCs I work with already have some of these endpoints, either configured manually or created independently.

Key Points:

data.all creates AWS service endpoints as part of its deployment, and the VPCs it deploys into may already contain some of those endpoints.

Questions for Maintainers:

  1. What happens when endpoints that already exist within a VPC are encountered by data.all?
  2. Are there recommended practices or mechanisms within data.all or the AWS CDK to gracefully handle scenarios where endpoints are pre-existing in VPCs?
  3. How are conflicts or potential issues addressed by the AWS CDK when attempting to create resources that may overlap with existing configurations?

noah-paige commented 6 months ago

Hi @anandsumit2000, I believe you will run into issues if you try to create more than one of the same VPC endpoint within the same AWS account/region pair.
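For anyone who wants to check a VPC before deploying, a minimal sketch along these lines lists which endpoint services are already present. It assumes boto3 and the EC2 DescribeVpcEndpoints API; the function names and the idea of a "wanted" service list are illustrative and are not part of data.all.

```python
from typing import Iterable, Set

# Endpoint states that no longer occupy the service slot in the VPC.
INACTIVE_STATES = {"deleting", "deleted", "failed", "rejected", "expired"}


def existing_endpoint_services(describe_response: dict) -> Set[str]:
    """Collect service names from one DescribeVpcEndpoints response page."""
    return {
        endpoint["ServiceName"]
        for endpoint in describe_response.get("VpcEndpoints", [])
        if endpoint.get("State", "").lower() not in INACTIVE_STATES
    }


def missing_services(wanted: Iterable[str], existing: Set[str]) -> Set[str]:
    """Return the wanted services that the VPC does not have yet."""
    return set(wanted) - existing


def fetch_existing_services(vpc_id: str, region: str) -> Set[str]:
    """Query AWS for the endpoint services already present in a VPC."""
    import boto3  # imported here so the helpers above work without AWS access

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_vpc_endpoints")
    services: Set[str] = set()
    for page in paginator.paginate(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    ):
        services |= existing_endpoint_services(page)
    return services
```

Comparing the result of `fetch_existing_services` against the services a deployment intends to create shows up front which endpoints would be duplicated.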

Data.all does allow some customization of the VPC resources it deploys, which may help with the above. Here is some more detail on how you can customize your VPC configuration when deploying data.all:

    "DeploymentEnvironments": [
      {
        ...
        "vpc_id": "string_DEPLOY_WITHIN_AN_EXISTING_VPC|DEFAULT=None",
        "vpc_endpoints_sg": "string_DEPLOY_WITHIN_EXISTING_VPC_SG|DEFAULT=None",
        "vpc_restricted_nacl": "boolean_CREATE_CUSTOM_NACL|DEFAULT=false",
        ...
      }
    ]
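As a rough illustration of how these three options and their defaults could be read, here is a small sketch. It only assumes the keys and defaults shown in the snippet above; the helper name `resolve_vpc_options` and the VPC ID in the usage note are placeholders, and this is not how data.all itself parses cdk.json.

```python
from typing import Any, Dict, List

# Defaults taken from the snippet above: None, None and false.
VPC_DEFAULTS: Dict[str, Any] = {
    "vpc_id": None,
    "vpc_endpoints_sg": None,
    "vpc_restricted_nacl": False,
}


def resolve_vpc_options(
    deployment_environments: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return the VPC options for each environment with defaults filled in."""
    resolved = []
    for env in deployment_environments:
        options = {
            key: env.get(key, default) for key, default in VPC_DEFAULTS.items()
        }
        # Catch a common mistake: "false" written as a JSON string.
        if not isinstance(options["vpc_restricted_nacl"], bool):
            raise ValueError("vpc_restricted_nacl must be a boolean")
        resolved.append(options)
    return resolved
```

For example, `resolve_vpc_options([{"vpc_id": "vpc-0123456789abcdef0"}])` returns one entry with the given VPC ID, `vpc_endpoints_sg` set to `None`, and `vpc_restricted_nacl` set to `False`.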

By Default (nothing specified in cdk.json)

If vpc_id specified in cdk.json

If vpc_endpoints_sg specified in cdk.json

If vpc_restricted_nacl specified in cdk.json

Feel free to read more in Step 6, "Configure the deployment options in the cdk.json file", of our Deploy to AWS documentation.

Also, you can take a deeper look at the code where VPC resources are created via CDK at dataall/deploy/stacks/vpc.py

Please let me know if you have any additional questions.

anandsumit2000 commented 6 months ago

Hello @noah-paige. Thank you for responding so quickly. I should have elaborated more on the query. The question was about the Tooling Account, whereas your response mainly concerns the Deployment Account, because vpc_endpoints_sg is a property contained within the DeploymentEnvironments field.

noah-paige commented 6 months ago

I see the issue now... I think the easiest resolution to avoid any duplication of resources and/or any potential errors would be to extend the logic we have for Deployment Envs to the Tooling Env. Please let me know if you agree.

I will label this issue as an enhancement and work with the team to add it to the backlog.

We welcome external contributions as well if you would like to contribute the fix sooner. Thank you for bringing it to our attention.