cloudera-deploy is a rich set of examples and quickstart projects for deploying and managing the Cloudera Data Platform (CDP). Its scope includes CDP Public Cloud, Private Cloud, and Data Services, as well as the software lifecycle of these platforms and the applications that work upon and with them.
You can use the definitions and projects in cloudera-deploy as your entrypoint for getting started with CDP. These resources use straightforward configurations and playbooks to instruct the automation functions, yet each is extensible and highly configurable.
cloudera-deploy is designed not only to get you up and running quickly with CDP, but also to showcase the underlying toolsets and libraries. These projects demonstrate what you can build and lay out a great foundation for your own entrypoints, CI/CD pipelines, integrations, and general platform and application operations.
The definitions and projects in cloudera-deploy are designed to run with ansible-navigator and other Execution Environment-based tools.
Follow these steps to get started: set up ansible-navigator and a container runtime, confirm your credentials, and then run a project.
If you need help, check out the Frequently Asked Questions, the FAQ for cldr-runner, and drop by the Discussions > Help board.
The catalog of projects, examples, and definitions currently covers CDP Public Cloud for AWS. CDP Private Cloud and individual Data Services, Public and Private, as well as Public Cloud deployments to Azure and Google Cloud, are coming soon.
| Project | Platform | CSP | Description |
|---|---|---|---|
| datalake | public cloud | AWS | Constructs a CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, S3 buckets, etc. |
| datalake-tf | public cloud | AWS | Constructs a CDP Public Cloud Environment and Datalake. Uses the terraform-cdp-modules, called via Ansible, to generate the AWS infrastructure prerequisite resources and the CDP artifacts. |
| cde | public cloud | AWS | Constructs a set of Cloudera Data Engineering (CDE) workspaces within their own CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, S3 buckets, etc. |
| cdf | public cloud | AWS | Constructs a set of Cloudera Data Flow (CDF) workspaces and data hubs within their own CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, S3 buckets, etc. |
| cml | public cloud | AWS | Constructs a set of Cloudera Machine Learning (CML) workspaces within their own CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, S3 buckets, etc. |
| base | private cloud | AWS IaaS | Constructs a CDP Private Cloud Base cluster running on AWS IaaS. Uses Terraform to generate the AWS infrastructure and deploys to an SSH-proxied private cluster. |
If you want to see what we are working on or have pending, check out:
Are we missing something? Let us know by creating a new issue or posting a new idea!
For more information on how to get involved with the cloudera-deploy project, head over to CONTRIBUTING.md.
cloudera-deploy itself is not an application, but its projects and examples expect to run within an execution environment called cldr-runner. This execution environment is typically a container that encapsulates the runtimes, libraries, Python and system dependencies, and general configurations needed to run an Ansible- and Terraform-enabled project.
[!NOTE] You don't have to use a container, but setting up a local execution environment is out of scope for cloudera-deploy; the projects in cloudera-deploy will run in any execution environment, for example, AWX/Red Hat Ansible Automation Platform (AAP). If you want to learn more about setting up a local execution environment, head over to cloudera-labs/cldr-runner.
The cloudera-deploy projects and their playbooks are built with the automation resources provided by cldr-runner, notably, but not exclusively:
- cloudera.cloud - Cloudera Data Platform (CDP) for Public Cloud
- cloudera.cluster - Cloudera Data Platform (CDP) for Private Cloud and Cloudera Manager (CM)
- cloudera.exe - Runlevel Management and Utilities for Cloudera Data Platform (CDP)
- cdp-tf-quickstarts - CDP quickstarts using the Terraform Module for CDP Prerequisites
- terraform-cdp-modules - Terraform Modules for CDP Prerequisites

Besides these resources within cldr-runner, cloudera-deploy projects generally need one or more of the following credentials:
- For CDP Public Cloud, you will need an Access Key and Secret set in your user profile. The underlying automation libraries use your default profile unless you instruct them otherwise. See Configuring CDP client with the API access key for further details.
- For Azure and AWS infrastructure, the process is similar to CDP Public Cloud, and these parameters may likewise be overridden.
- For Google Cloud, we suggest you issue a credentials file, store it securely in your profile, and then reference that file as needed by a project's configuration, as this works best with both CLI and Ansible Gcloud interactions.
- For CDP Private Cloud, you will need a valid Cloudera license file in order to download the software from the Cloudera repositories. We suggest you store this file in your user profile in ~/.cdp/ and reference that file as needed by a project's configuration.
- If you are also using Public Cloud infrastructure to host your CDP Private Cloud clusters, then you will need those credentials as well.
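As a local sanity check before running a project, the following Bash sketch looks for a named profile in the CDP CLI credentials file and copies a Cloudera license file into ~/.cdp/ with owner-only permissions. It assumes the CDP CLI's default credentials location, ~/.cdp/credentials, with INI-style `[profile]` sections such as those written by `cdp configure`; the function names `cdp_profile_configured` and `install_cdp_license` are hypothetical helpers, not part of cloudera-deploy. It only checks that a profile exists locally, not that the keys are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of cloudera-deploy) for preparing local credentials.
# Assumption: the CDP CLI keeps profiles in ~/.cdp/credentials as INI sections.

# Succeeds if ~/.cdp/credentials contains a [<profile>] section (default: "default").
cdp_profile_configured() {
  local profile="${1:-default}"
  local creds="${HOME}/.cdp/credentials"
  [[ -r "$creds" ]] || return 1
  # -F: match the section header literally; -x: it must be the whole line.
  grep -qxF "[${profile}]" "$creds"
}

# Copies a Cloudera license file into ~/.cdp/ (readable only by you) and prints its new path.
install_cdp_license() {
  local src="${1:-}"
  if [[ -z "$src" || ! -f "$src" ]]; then
    echo "usage: install_cdp_license <license-file>" >&2
    return 1
  fi
  local dest_dir="${HOME}/.cdp"
  mkdir -p "$dest_dir"
  chmod 700 "$dest_dir"                      # keep the profile directory private
  local dest
  dest="${dest_dir}/$(basename "$src")"
  cp "$src" "$dest"
  chmod 600 "$dest"                          # owner read/write only
  printf '%s\n' "$dest"                      # path to reference in a project's config
}
```

For example, `cdp_profile_configured default || echo "run cdp configure first"` and `install_cdp_license ~/Downloads/license.txt` (the download path here is only illustrative).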
To use the projects in cloudera-deploy, you first need to set up ansible-navigator.
[!IMPORTANT] Please note each OS has slightly different requirements for installing ansible-navigator. :woozy_face: Read more about installing ansible-navigator.
Create and activate a new Python virtualenv.
You can name your virtual environment anything you want; by convention, we like to call it cdp-navigator.
# Note! You will need Python 3.9 or higher!
python3.9 -m venv ~/cdp-navigator; source ~/cdp-navigator/bin/activate;
This step is highly recommended yet optional.
Install the latest ansible-core and ansible-navigator.
These tools can be the latest versions, as the actual execution versions are encapsulated in the execution environment container.
pip install ansible-core ansible-navigator
[!NOTE] Further details can be found in the NAVIGATOR document in cloudera-labs/cldr-runner.
[!WARNING] On OSX, avoid using the stock Python executable with ansible-navigator; users report that the curses library in the stock installation fails to run (it throws a segfault). Consider installing another version of Python, for example with brew.
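Because the virtualenv step requires Python 3.9 or higher, you might guard it with a version check. The following Bash sketch compares versions numerically per component (so 3.10 correctly counts as newer than 3.9, which a plain string comparison gets wrong) and only then creates the virtualenv. The helpers `python_version`, `version_at_least`, and `create_navigator_venv`, and their defaults (`python3`, `~/cdp-navigator`), are hypothetical; activation is left as a separate step in your own shell.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: create the cdp-navigator virtualenv only if Python >= 3.9.

# Prints the interpreter's version as MAJOR.MINOR.MICRO, e.g. 3.11.4.
python_version() {
  "$1" -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])'
}

# Succeeds if dotted version $1 >= dotted version $2, comparing each component as a number.
version_at_least() {
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}           # missing components count as 0 (3.9 == 3.9.0)
    w=${want[i]:-0}
    if (( h > w )); then return 0; fi
    if (( h < w )); then return 1; fi
  done
  return 0                    # all components equal
}

# Usage: create_navigator_venv [python-binary] [venv-dir]
create_navigator_venv() {
  local py="${1:-python3}" dest="${2:-$HOME/cdp-navigator}" ver
  if ! ver="$(python_version "$py" 2>/dev/null)"; then
    echo "cannot run '$py'" >&2
    return 1
  fi
  if ! version_at_least "$ver" "3.9"; then
    echo "'$py' is Python $ver; 3.9 or higher is required" >&2
    return 1
  fi
  "$py" -m venv "$dest"
}
```

For example: `create_navigator_venv python3.9 && source ~/cdp-navigator/bin/activate`.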
Then, clone this project.
git clone https://github.com/cloudera-labs/cloudera-deploy.git; cd cloudera-deploy;
ansible-navigator can use either docker or podman. Either way, you will need a container runtime on your host.
Check that docker is available by running the following command, which lists all Docker containers, running or stopped:
docker ps -a
If the command fails, Docker is probably not installed or its daemon is not running; revisit the Docker prerequisites for your OS to install, start, and test the service.
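Since ansible-navigator can use either docker or podman, a small Bash sketch can extend the `docker ps -a` check to both runtimes and report the first one that is installed and responding. The function name `detect_container_engine` is hypothetical, and checking docker before podman is an arbitrary preference chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first container runtime that is installed and responding.
detect_container_engine() {
  local engine
  for engine in docker podman; do   # docker is tried first (arbitrary preference)
    # `ps -a` fails, for example, when the Docker CLI exists but its daemon is not running.
    if command -v "$engine" >/dev/null 2>&1 && "$engine" ps -a >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  echo "no working container runtime (docker or podman) found" >&2
  return 1
}
```

For example, `engine="$(detect_container_engine)" && echo "using $engine"`.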
To check that your various credentials are available and valid (that is, that they match the expected accounts), you can use ansible-navigator within your project and compare the user and account IDs it reports with those shown in the browser UI of the associated service.
[!IMPORTANT] All of the instructions below assume that your project is using the correct CSP-flavored image of cldr-runner. If in doubt, you can use the full image, which has all supported CSP resources.
[!WARNING] Be sure you are within a project directory that has an ansible-navigator.yml configuration file that uses the cldr-runner image!
ansible-navigator exec -- cdp iam get-user
[!NOTE] If you do not yet have a CDP Public Cloud credential, follow these instructions on the Cloudera website.
See CDP CLI for further details.
ansible-navigator exec -- aws iam get-user
See AWS account requirements for further details.
ansible-navigator exec -- az account list
[!NOTE] If you cannot list your Azure accounts, consider using az login to refresh your local (host) credential.
See Azure subscription requirements for further details.
ansible-navigator exec -- gcloud auth list
[!NOTE] You need a provisioning service account for GCP setup (typically referenced by the gcloud_credential_file entry). If you do not yet have a provisioning service account, you can learn more on the Cloudera website.
See GCP requirements for further details.
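To run the checks above in one go, the following Bash sketch first confirms that the current directory has an ansible-navigator.yml that mentions cldr-runner, then runs the CDP check plus the check for a single CSP, using exactly the `ansible-navigator exec` commands listed above. The function name `verify_credentials` and its `aws|azure|gcp` argument are hypothetical; the grep for cldr-runner is only a rough heuristic for the image setting, and you still need to compare the reported IDs with the provider's UI yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the CDP credential check plus one CSP check via ansible-navigator.
# Usage: verify_credentials aws|azure|gcp   (run from a project directory)
verify_credentials() {
  local csp="${1:-}"
  local -a csp_check
  case "$csp" in
    aws)   csp_check=(aws iam get-user) ;;
    azure) csp_check=(az account list) ;;
    gcp)   csp_check=(gcloud auth list) ;;
    *)     echo "usage: verify_credentials aws|azure|gcp" >&2; return 2 ;;
  esac

  # Rough heuristic: the project config should reference the cldr-runner image.
  if ! grep -qs 'cldr-runner' ansible-navigator.yml; then
    echo "no ansible-navigator.yml referencing cldr-runner in $PWD" >&2
    return 1
  fi

  echo "== CDP Public Cloud =="
  ansible-navigator exec -- cdp iam get-user || return 1
  echo "== ${csp} =="
  ansible-navigator exec -- "${csp_check[@]}"
}
```

For example, from within a project directory: `verify_credentials aws`.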
All of the definitions and projects in cloudera-deploy are designed to work with ansible-navigator. Each project has its own instructions on what to run and how, but in general, you will end up executing some form of the ansible-navigator run subcommand, like:
ansible-navigator run main.yml -e @config.yml -t plat
Occasionally, the instructions may ask you to run an individual module, such as ansible-navigator exec -- ansible some_group -m ping. You can learn more about the available subcommands on the ansible-navigator website.
[!NOTE] If you want to check out what's in the container, or use the container directly, run ansible-navigator exec -- /bin/bash.
The projects are configured to log their activities. In each, you will find a runs/ directory that houses all of the runtime artifacts of ansible-navigator and ansible-runner (the Ansible application and interface that does the actual Ansible command dispatching).
The log files are structured (JSON) and are indexed by playbook and timestamp. If you want to review, or rather replay, a run, you can load its file into ansible-navigator:
ansible-navigator replay <playbook execution run file>.json
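To replay the most recent run without hunting for its file name, a Bash sketch can pick the newest JSON artifact by modification time. It assumes the artifacts are `*.json` files written directly into runs/ (check the playbook-artifact settings in your project's ansible-navigator.yml if yours are saved elsewhere); `latest_run_artifact` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest *.json artifact in a runs directory (default: runs).
# Assumption: artifacts are saved directly in that directory, not in subdirectories.
latest_run_artifact() {
  local dir="${1:-runs}" newest="" f
  for f in "$dir"/*.json; do
    [[ -e "$f" ]] || continue            # the glob matched nothing
    if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
      newest="$f"                        # keep the most recently modified file
    fi
  done
  if [[ -z "$newest" ]]; then
    echo "no *.json artifacts found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

For example: `ansible-navigator replay "$(latest_run_artifact runs)"`.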
The cldr-runner image updates fairly often to include the latest libraries, new features, and fixes. Depending on how ansible-navigator is configured (see the ansible-navigator.yml file), the application may check for an updated container image only if the image is missing locally, so an older image can stay in use.
To always check for an updated image, change the ansible-navigator.yml configuration in your project to:
ansible-navigator:
  execution-environment:
    pull:
      policy: always
Alternatively, use the CLI flag --pp or --pull-policy and set the value to always.
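If you only occasionally want a fresh image, you can pass the flag for a single run instead of editing ansible-navigator.yml. The wrapper below is a hypothetical convenience that forwards its arguments to `ansible-navigator run` with `--pull-policy always`; main.yml, config.yml, and the plat tag in the usage line are the illustrative values from the run example earlier in this README, not requirements of every project.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: force a fresh pull of the execution environment image for one run.
# Usage: navigator_run_fresh main.yml -e @config.yml -t plat
navigator_run_fresh() {
  if [[ $# -eq 0 ]]; then
    echo "usage: navigator_run_fresh <playbook> [ansible-playbook options...]" >&2
    return 2
  fi
  # All remaining arguments (playbook, -e, -t, ...) are passed through unchanged.
  ansible-navigator run --pull-policy always "$@"
}
```

Keeping the flag out of the YAML means day-to-day runs reuse the local image, while this wrapper pulls on demand.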
You can read more about updating this configuration on the ansible-navigator website.
If you need help, here are some resources:
- cloudera-deploy
- cldr-runner and ansible-navigator
Be sure to stop by the Discussions > Help board!
Copyright 2023, Cloudera, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.