The Automated Data Analytics on AWS solution provides an end-to-end data platform for ingesting, transforming, managing, and querying datasets. It helps analysts and business users manage and gain insights from data without deep technical experience with Amazon Web Services (AWS). The solution has an open-sourced architecture with connectors to commonly used AWS services, along with third-party data sources and services. It also provides a user interface (UI) to search, share, manage, and query datasets using standard SQL commands.
The following diagram represents the solution's architecture design.
The Automated Data Analytics on AWS solution automates the building of data pipelines that are optimized for the size, frequency of update, and type of data. These data pipelines handle the data ingestion, transformations, and queries.
The Automated Data Analytics on AWS solution creates and integrates a combination of AWS services required to perform these tasks, abstracted through a user interface. These services include AWS Glue crawlers, jobs, workflows and triggers, along with S3 buckets, IAM integration, and other services. Additionally, the solution automatically detects and redacts personally identifiable information (PII) with granular security and governance controls.
For more information on the solution’s architecture, refer to the implementation guide.
- A CDK bootstrapped AWS account.
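If the target account has not been bootstrapped yet, a typical bootstrap looks like the following sketch (the account ID and Region are placeholders; this assumes the AWS CDK CLI is installed, for example via `npm install --global aws-cdk`):

```shell
# One-time CDK bootstrap of the deployment account/Region
cdk bootstrap aws://<account-id>/<region-id>
```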
- Sufficient AWS Lambda concurrent executions limit: the Applied quota value in your account must be greater than or equal to the AWS default quota value (1000). Check it in the Service Quotas page of the AWS Management Console. If the Applied quota value is less than 1000, use the Request quota increase button to request an increase to at least 1000 before deploying the solution. For more details, refer to the AWS Lambda Developer Guide.
- The latest version of the AWS CLI, installed and configured.
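The quota values can also be checked from the command line. A sketch using the AWS CLI, assuming `L-B99A9384` is the quota code for Lambda "Concurrent executions" (verify the code in the Service Quotas console):

```shell
# Default quota for Lambda concurrent executions (expected: 1000)
aws service-quotas get-aws-default-service-quota \
  --service-code lambda --quota-code L-B99A9384

# Applied quota in this account; request an increase if it is below 1000
aws service-quotas get-service-quota \
  --service-code lambda --quota-code L-B99A9384
```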
- node.js version 18.19

```shell
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.2/install.sh | bash
exec $SHELL -l
nvm install 18.19
```

- yarn

```shell
npm install --global yarn
```
- Python 3.12.2, with pipenv to avoid version conflicts

```shell
pip3 install --user pipenv
export PATH="/home/<YOUR_USERNAME>/.local/bin:$PATH"
git clone https://github.com/pyenv/pyenv.git ~/.pyenv
export PATH="/home/<YOUR_USERNAME>/.pyenv/bin:$PATH"
sudo yum-builddep python3
pipenv --python 3.12.2
```

```shell
# After cloning the Ada repository, navigate to the Ada directory
# and run the following commands
cd <Ada directory>
pyenv local 3.12.2
eval "$(pyenv init -)"
```
- Docker Desktop (>= v20.10)
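The version requirements above can be checked up front. A minimal pre-flight sketch, assuming `node --version` prints strings like `v18.19.1` and `python --version` prints `Python 3.12.2` (the helper names are hypothetical):

```shell
# Hypothetical helpers: succeed only when the reported version matches
# what this solution expects (node 18.19, Python 3.12.2)
node_ok() {
  case "$1" in
    v18.19*) return 0 ;;
    *) return 1 ;;
  esac
}

python_ok() {
  case "$1" in
    "Python 3.12.2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage once node and python are on your PATH:
# node_ok "$(node --version)" || echo "expected node 18.19"
# python_ok "$(python --version)" || echo "expected Python 3.12.2"
```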
Note regarding AWS CDK version: We recommend running all `cdk <cmd>` related tasks via `yarn cdk <cmd>` to ensure exact version parity. If you choose to run a globally installed `cdk` command, ensure you have a compatible version of AWS CDK installed globally.
```shell
git clone https://github.com/aws-solutions/automated-data-analytics-on-aws
cd automated-data-analytics-on-aws/source
```

Run the following commands.

```shell
chmod +x ./run-all-tests.sh
./run-all-tests.sh
```
The `/source/run-all-tests.sh` script is the centralized script to install all dependencies, build the solution from source code, and execute all unit tests. The build process, including downloading dependencies, takes about 60 minutes the first time.
After you have successfully cloned the repository into your local development environment, you will see the following file structure in your editor.
```
|- .github/ ...             - resources for open-source contributions
|- source/                  - solution's source code
  |- @types                 - type utilities
  |- cypress                - cypress tests
  |- images                 - image files for the documentation
  |- packages               - multiple packages of solution source code, including unit tests
  |- scripts                - helper scripts
  |- header.txt             - license header
  |- lerna.json             - configuration file for lerna
  |- package.json           - package file for solution root package
  |- run-all-tests.sh       - runs all tests within the /source folder
  |- yarn-audit.js          - helper script for yarn audit
  |- yarn.lock              - yarn lockfile
|- .gitignore
|- CHANGELOG.md             - changelog file to track changes between versions
|- CODE_OF_CONDUCT.md       - code of conduct for open source contribution
|- CONTRIBUTING.md          - detailed information about open source contribution
|- LICENSE.txt              - Apache 2.0 license
|- NOTICE.txt               - copyright notices for the Automated Data Analytics on AWS solution
|- THIRDPARTY_LICENSE.txt   - copyright licenses for third-party software used in this solution
|- README.md                - this file
```
Before you begin, ensure that:

- The deployment AWS Region is set via `export AWS_REGION=<region-id>`. For a list of supported AWS Regions, refer to https://docs.aws.amazon.com/solutions/latest/automated-data-analytics-on-aws/design-considerations.html
- You are in the `automated-data-analytics-on-aws/source` directory.
```shell
yarn deploy-solution --parameters adminEmail="<Your email address>" --parameters adminPhoneNumber="<Your mobile number for MFA>"
```
You can also set MFA to optional and turn off advanced security mode to simplify the process. If this is preferred, use the following command for deployment:

```shell
yarn deploy-solution --parameters adminEmail="<Your email address>" --parameters adminMFA='OPTIONAL' --parameters advancedSecurityMode='OFF'
```
The deployment may take up to 60 minutes to complete. During deployment, a temporary password for the root administrator will be sent via email to the specified adminEmail. That email address and temporary password can be used to log in to the Automated Data Analytics on AWS solution for the initial setup.
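While waiting, deployment progress can be watched from the command line with the AWS CLI, assuming the main stack is named `Ada` as created by this solution:

```shell
# Poll the main stack's status during the ~60 minute deployment
aws cloudformation describe-stacks \
  --stack-name Ada \
  --query 'Stacks[0].StackStatus' \
  --output text
```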
After the solution has been deployed, the CDK returns the following information. Using this information, follow the steps to access the Automated Data Analytics on AWS solution.
Note: To view the information returned by CDK from the AWS CloudFormation Console, navigate to the Ada stack and select the Outputs section.
```
Outputs:
Ada.AthenaProxyApiUrl = example.cloudfront.net:443
Ada.BaseApiUrl = https://example.execute-api.ap-southeast-2.amazonaws.com/prod/
Ada.CognitoUserPoolId = ap-southeast-2_Example
Ada.ExportNamespaceGlobalUUID = example
Ada.RetainedResourcesExport = ["arn:aws:kms:ap-southeast-2:123456789012:key/5dad9516-0007-4993-a613-example","arn:aws:kms:ap-southeast-2:123456789012:key/21d45985-6c92-41e9-a762-example","arn:aws:s3:::ada-dataproductservicestack752cb9-databucket-hash","arn:aws:s3:::ada-dataproductservicestack752-scriptsbucket-hash","arn:aws:s3:::ada-dataproductservicestack-fileuploadbucket-hash"]
Ada.UserPoolClientId = example
Ada.WebsiteUrl = https://example1234.cloudfront.net/
```
- `AthenaProxyApiUrl`: The URL for connecting Automated Data Analytics on AWS with Tableau / PowerBI via JDBC/ODBC.
- `BaseApiUrl`: REST API URL.
- `CognitoUserPoolId`: Cognito user pool ID.
- `ExportNamespaceGlobalUUID`: A globally unique identifier specific to this deployment.
- `RetainedResourcesExport`: A list of AWS resource ARNs that are retained if this solution is uninstalled (torn down) from the web console with data retention, or torn down from the AWS CloudFormation console.
- `UserPoolClientId`: Cognito user pool app client ID.
- `WebsiteUrl`: Automated Data Analytics on AWS web UI URL.

Open the `WebsiteUrl` in your browser. We recommend using Chrome. You will be redirected to the sign-in page, which requires a username and password. Sign in with the adminEmail address and the temporary password received from no-reply@verificationemail.com.
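For scripting against these outputs (for example, opening the web UI from a smoke-test script), individual values can be pulled out of a saved copy of the CDK output with `sed`. A minimal sketch using the sample values above; the file path and variable name are illustrative:

```shell
# Hypothetical saved copy of the CDK output lines shown above
cat > /tmp/ada-outputs.txt <<'EOF'
Ada.CognitoUserPoolId = ap-southeast-2_Example
Ada.WebsiteUrl = https://example1234.cloudfront.net/
EOF

# Extract a single output value for scripting
website_url="$(sed -n 's/^Ada\.WebsiteUrl = //p' /tmp/ada-outputs.txt)"
echo "$website_url"   # prints https://example1234.cloudfront.net/
```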
We recommend using an external OpenID Connect or SAML 2.0 compatible identity provider to manage the users who need access to Automated Data Analytics on AWS. If there is an existing enterprise identity provider, you can integrate it with Automated Data Analytics on AWS. The root administrator can set it up by accessing Admin -> Identity Provider in the Automated Data Analytics on AWS web UI.
For more information on how to set up your identity provider, refer to the implementation guide.
You can uninstall the solution either from the Automated Data Analytics on AWS web UI or by directly deleting the stacks from the AWS CloudFormation console.
To uninstall the solution from the Automated Data Analytics on AWS web UI:
Navigate to Admin -> Teardown.
Note: Using the Teardown page, you can permanently remove the Automated Data Analytics on AWS solution from your account. The Teardown option is only available for users with root_admin access.
Note: If you chose to retain data, the data buckets and KMS keys for the data buckets will be retained.
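If you later decide to remove the retained resources as well, they can be deleted with the AWS CLI. A sketch using placeholder names from the sample `RetainedResourcesExport` output (migrate any data you still need first):

```shell
# Empty and remove a retained data bucket
aws s3 rb "s3://ada-dataproductservicestack752cb9-databucket-hash" --force

# Schedule deletion of a retained KMS key (minimum 7-day waiting period)
aws kms schedule-key-deletion \
  --key-id "arn:aws:kms:ap-southeast-2:123456789012:key/5dad9516-0007-4993-a613-example" \
  --pending-window-in-days 7
```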
To uninstall the solution from the AWS CloudFormation console:

1. Select the `Ada` stack and check its Outputs section. Note down the value of the key `RetainedResourcesExport` and copy it to a text file to keep. These resources are retained after the main Automated Data Analytics on AWS stack is deleted.
2. Delete the `Ada` stack.
3. If needed, migrate data out of the retained S3 buckets (listed in `RetainedResourcesExport`).
4. Delete the S3 buckets listed in the `RetainedResourcesExport` file.
5. Delete the KMS keys listed in the `RetainedResourcesExport` file.

This solution collects anonymized operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, refer to the implementation guide.
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at
http://www.apache.org/licenses/
or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and limitations under the License.