aws-solutions / aws-data-lake-solution

A deployable reference implementation that addresses pain points around conceptualizing data lake architectures. It automatically configures the core AWS services needed to tag, search, share, and govern specific subsets of data across a business or with external businesses.
https://aws.amazon.com/solutions/implementations/data-lake-solution/
Apache License 2.0

Fails to deploy, bucket interpolation bug #11

Closed: john-aws closed this issue 6 years ago

john-aws commented 7 years ago

The data lake solution fails to deploy, with the following CloudFormation error:

Template bucket referenced by https://s3.amazonaws.com/mybucket-us-east-1/data-lake/latest/data-lake-storage.yaml does not exist.

The deploy instructions tell the user to set DEPLOY_BUCKET to the desired S3 bucket that will host deployment assets, to then build the deployment assets, and finally copy them to the aforementioned bucket.

The build-s3-dist.sh script replaces BUCKET_NAME in the deployment YAML file with the value that the user supplied in DEPLOY_BUCKET. However, the nested stacks (e.g. DataLakeStorageStack) use the following:

!Join ["-", [!FindInMap ["SourceCode", "General", "S3Bucket"], Ref: "AWS::Region"]]

This results in CloudFormation stack creation failure because CloudFormation is looking for the YAML file in a bucket with the region suffix:

s3.amazonaws.com/mybucket-us-east-1/data-lake/latest/data-lake-storage.yaml

when, in fact, the YAML file is in a bucket without the region suffix:

s3.amazonaws.com/mybucket/data-lake/latest/data-lake-storage.yaml
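The mismatch can be reproduced outside CloudFormation. The following is a minimal sketch, not the solution's own code: the helper names `resolve_template_bucket` and `template_url` are hypothetical, and the function only imitates what the `!Join` expression above evaluates to for the bucket name and region used in this report.

```python
def resolve_template_bucket(base_bucket: str, region: str) -> str:
    """Imitate !Join ["-", [<S3Bucket mapping value>, <AWS::Region>]]."""
    return "-".join([base_bucket, region])


def template_url(bucket: str, key: str) -> str:
    """Build the path-style S3 URL format shown in the error message."""
    return f"https://s3.amazonaws.com/{bucket}/{key}"


deploy_bucket = "mybucket"  # value the user supplied in DEPLOY_BUCKET
region = "us-east-1"        # region the stack is created in
key = "data-lake/latest/data-lake-storage.yaml"

# Where the nested stack looks for its template.
expected_by_stack = template_url(resolve_template_bucket(deploy_bucket, region), key)
# Where the deploy instructions told the user to copy the template.
uploaded_by_user = template_url(deploy_bucket, key)

print(expected_by_stack)
print(uploaded_by_user)
print(expected_by_stack == uploaded_by_user)  # False: the two locations differ
```

Under these assumptions the two URLs differ only by the `-us-east-1` suffix, which is exactly the bucket CloudFormation reports as missing.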

hvital commented 6 years ago

Version 2.0 has been published, and README.md was revised to add more detail about how the source-bucket-base-name build parameter works.

The convention of appending the region code to the bucket name exists to support automated deployment. The S3 bucket that stores the Lambda .zip packages must reside in the same AWS Region as the Lambda function you are creating. The solution therefore keeps a copy of all assets in every supported region, and when you run the template, it looks for its dependencies in the bucket for that same region.
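As a rough illustration of that naming convention, the sketch below lists the buckets a base name would map to. The helper `regional_buckets` and the region list are hypothetical, chosen only for the example; the actual set of supported regions is not stated in this thread.

```python
def regional_buckets(base_name: str, regions: list[str]) -> dict[str, str]:
    """Map each region to its region-suffixed asset bucket name."""
    return {region: f"{base_name}-{region}" for region in regions}


# Hypothetical region list, for illustration only.
regions = ["us-east-1", "eu-west-1"]

for region, bucket in regional_buckets("mybucket", regions).items():
    # Each regional bucket would need its own copy of the deployment assets.
    print(f"{region}: s3://{bucket}/data-lake/latest/")
```

For a custom build, this suggests creating a bucket named with the base name plus the target region and copying the assets there, so that the lookup in the template resolves.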