cloudera-labs / cloudera-deploy

A general purpose framework for automating Cloudera Products
Apache License 2.0

cloudera-deploy - Automation Quickstarts and Examples for the Cloudera Data Platform (CDP)

cloudera-deploy is a rich set of examples and quickstart projects for deploying and managing the Cloudera Data Platform (CDP). Its scope includes CDP Public Cloud, Private Cloud, and Data Services, as well as the software lifecycle of these platforms and of the applications that run on and with them.

You can use the definitions and projects in cloudera-deploy as your entrypoint for getting started with CDP. These resources use straightforward configurations and playbooks to instruct the automation functions, yet each is extensible and highly configurable.

cloudera-deploy is designed not only to get you up and running quickly with CDP, but also to showcase the underlying toolsets and libraries. These projects demonstrate what you can build and lay out a solid foundation for your own entrypoints, CI/CD pipelines, integrations, and general platform and application operations.

Quickstart

The definitions and projects in cloudera-deploy are designed to run with ansible-navigator and other Execution Environment-based tools.

Follow these steps to get started:

  1. Install ansible-navigator
  2. Check your requirements
  3. Select and configure your project
  4. Set your credentials
  5. Run your project

If you need help, check out the Frequently Asked Questions, the FAQ for cldr-runner, and drop by the Discussions > Help board.

Catalog

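The catalog of projects, examples, and definitions currently focuses on AWS: CDP Public Cloud Environments and Datalakes, several Public Cloud Data Services, and a CDP Private Cloud Base cluster on AWS IaaS. Further CDP Private Cloud and individual Data Services examples, Public and Private, as well as Public Cloud deployments to Azure and Google Cloud, are coming soon.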

| Project | Platform | CSP | Description |
|---|---|---|---|
| datalake | Public Cloud | AWS | Constructs a CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, and S3 buckets. |
| datalake-tf | Public Cloud | AWS | Constructs a CDP Public Cloud Environment and Datalake. Uses the terraform-cdp-modules, called via Ansible, to generate the prerequisite AWS infrastructure and the CDP artifacts. |
| cde | Public Cloud | AWS | Constructs a set of Cloudera Data Engineering (CDE) workspaces within their own CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, and S3 buckets. |
| cdf | Public Cloud | AWS | Constructs a set of Cloudera DataFlow (CDF) workspaces and Data Hubs within their own CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, and S3 buckets. |
| cml | Public Cloud | AWS | Constructs a set of Cloudera Machine Learning (CML) workspaces within their own CDP Public Cloud Environment and Datalake. Uses Ansible to generate the AWS infrastructure and CDP artifacts, including the SSH key, cross-account credentials, and S3 buckets. |
| base | Private Cloud | AWS IaaS | Constructs a CDP Private Cloud Base cluster running on AWS IaaS. Uses Terraform to generate the AWS infrastructure and deploys to an SSH-proxied private cluster. |

Roadmap

If you want to see what we are working on or have pending, check out:

Are we missing something? Let us know by creating a new issue or posting a new idea!

Contributions

For more information on how to get involved with the cloudera-deploy project, head over to CONTRIBUTING.md.

Requirements

cloudera-deploy itself is not an application, but its projects and examples expect to run within an execution environment called cldr-runner. This execution environment is typically a container that encapsulates the runtimes, libraries, Python and system dependencies, and general configurations needed to run an Ansible- and Terraform-enabled project.

[!NOTE] You don't have to use a container. The projects in cloudera-deploy will run in any execution environment, for example AWX or Red Hat Ansible Automation Platform (AAP), but setting up a local execution environment is out of scope for cloudera-deploy. If you want to learn more about setting up a local execution environment, head over to cloudera-labs/cldr-runner.

The cloudera-deploy projects and their playbooks are built with the automation resources provided by cldr-runner, notably, but not exclusively:

Besides these resources within cldr-runner, cloudera-deploy projects generally will need one or more of the following credentials:

CDP Public Cloud

For CDP Public Cloud, you will need an Access Key and Secret set in your user profile. The underlying automation libraries use your default profile unless you instruct them otherwise. See Configuring CDP client with the API access key for further details.
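As a quick local sanity check before running anything in the container, the POSIX shell sketch below looks for a complete profile in the CDP CLI credentials file. It assumes the CDP CLI's INI-style layout, i.e. `~/.cdp/credentials` with a `[default]` section holding `cdp_access_key_id` and `cdp_private_key` (as written by `cdp configure`). The `has_cdp_profile` helper is hypothetical, and reading a non-default profile name from `CDP_PROFILE` is an assumption.

```shell
#!/bin/sh
# has_cdp_profile FILE PROFILE (hypothetical helper)
# Succeeds if FILE contains a [PROFILE] section that defines both
# cdp_access_key_id and cdp_private_key (assumed CDP CLI INI layout).
has_cdp_profile() {
  [ -f "$1" ] || return 1
  awk -v want="$2" '
    # Section header such as "[default]": strip the brackets and compare.
    /^[ \t]*\[/ {
      name = $0
      sub(/^[ \t]*\[/, "", name)
      sub(/[]][ \t]*$/, "", name)
      in_section = (name == want)
      next
    }
    in_section && /^[ \t]*cdp_access_key_id[ \t]*=/ { has_id = 1 }
    in_section && /^[ \t]*cdp_private_key[ \t]*=/ { has_key = 1 }
    # Exit 0 only when both keys were found in the requested section.
    END { exit !(has_id && has_key) }
  ' "$1"
}

# Assumption: CDP_PROFILE names a non-default profile; otherwise use "default".
profile="${CDP_PROFILE:-default}"
if has_cdp_profile "$HOME/.cdp/credentials" "$profile"; then
  echo "Found CDP profile '$profile' in ~/.cdp/credentials"
else
  echo "No complete CDP profile '$profile' in ~/.cdp/credentials" >&2
fi
```

This only checks that the keys are present; whether they are valid is confirmed by the `cdp iam get-user` check in the Credentials section.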

Cloud Providers

For AWS and Azure infrastructure, the process is similar to CDP Public Cloud: the underlying libraries use the default credentials in your user profile, and you can likewise instruct them to use a different profile.

For Google Cloud, we suggest you create a credentials file, store it securely in your user profile, and then reference that file as needed in a project's configuration. This approach works best with both CLI and Ansible gcloud interactions.

CDP Private Cloud

For CDP Private Cloud you will need a valid Cloudera license file in order to download the software from the Cloudera repositories. We suggest you store this file in your user profile in ~/.cdp/ and reference that file as needed by a project's configuration.

If you are also using Public Cloud infrastructure to host your CDP Private Cloud clusters, then you will need those credentials as well.
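One way to keep the license file (or a Google Cloud credentials file) private is to copy it into `~/.cdp` with owner-only permissions. The POSIX shell sketch below does that and prints the resulting path, which you can then reference from your project's configuration. The `store_secret` helper and the example file name are hypothetical.

```shell
#!/bin/sh
# store_secret SRC [DEST_DIR] (hypothetical helper)
# Copies SRC into DEST_DIR (default: ~/.cdp) readable only by the owner
# and prints the resulting path.
store_secret() {
  src=$1
  dest_dir=${2:-"$HOME/.cdp"}
  [ -f "$src" ] || { echo "not a file: $src" >&2; return 1; }
  # With -p, the -m mode applies only if the final directory is newly created.
  mkdir -p -m 700 "$dest_dir" || return 1
  dest="$dest_dir/$(basename "$src")"
  # Copy under a restrictive umask, then force 0600 in case DEST already existed.
  (umask 077 && cp "$src" "$dest") || return 1
  chmod 600 "$dest" || return 1
  printf '%s\n' "$dest"
}

# Example (hypothetical file name):
# store_secret ~/Downloads/cloudera-license.txt
```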

Installation and Usage

To use the projects in cloudera-deploy, you need to first set up ansible-navigator.

[!IMPORTANT] Each OS has slightly different requirements for installing ansible-navigator. Read more about installing ansible-navigator.

  1. Create and activate a new Python virtualenv.

    You can name your virtual environment anything you want; by convention, we like to call it cdp-navigator.

    # Note! You will need Python 3.9 or higher; adjust the interpreter name (e.g. python3.11) to match your installation.
    python3.9 -m venv ~/cdp-navigator; source ~/cdp-navigator/bin/activate;

    This step is optional but highly recommended.

  2. Install the latest ansible-core and ansible-navigator.

    These tools can be the latest versions, as the actual execution versions are encapsulated in the execution environment container.

    pip install ansible-core ansible-navigator

[!NOTE] Further details can be found in the NAVIGATOR document in cloudera-labs/cldr-runner.

[!WARNING] On macOS, avoid using the stock Python executable with ansible-navigator; users report that the curses library in the stock installation fails with a segfault. Consider installing another version of Python, for example with Homebrew (brew).
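Because the default python3 on a host might be older than 3.9, a quick check before creating the virtualenv can save a confusing failure later. The POSIX shell sketch below compares dotted version strings numerically with awk, so that 3.10 counts as newer than 3.9. The `version_ge` helper is hypothetical, and the interpreter name `python3` is an assumption; substitute the one you plan to use.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeed if version A >= version B,
# comparing dot-separated fields numerically (so 3.10 > 3.9).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      # Missing fields count as 0, so "3.9" equals "3.9.0".
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Ask the interpreter for its major.minor version, e.g. "3.11".
if py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null); then
  if version_ge "$py" 3.9; then
    echo "python3 is $py: meets the 3.9+ requirement"
  else
    echo "python3 is $py: install Python 3.9 or higher" >&2
  fi
else
  echo "python3 not found" >&2
fi
```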

Then, clone this project.

git clone https://github.com/cloudera-labs/cloudera-deploy.git; cd cloudera-deploy;

Execution Engine

ansible-navigator can use either docker or podman, so you will need one of these container runtimes installed and running on your host.

Confirm your Docker service

Check that docker is available by running the following command, which lists all Docker containers, running or stopped.

docker ps -a

If Docker is not running, revisit your Docker prerequisites to install, start, and test the service.
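If you are unsure which runtime your host has, the POSIX shell sketch below looks for docker or podman on the PATH and asks it to list containers, mirroring the `docker ps -a` check. The `detect_container_engine` helper is hypothetical, and checking docker before podman is an arbitrary choice. ansible-navigator can also choose the engine itself through its `--container-engine` (`--ce`) option, which accepts `auto`, `docker`, or `podman`.

```shell
#!/bin/sh
# detect_container_engine (hypothetical helper): print "docker" or "podman"
# for the first runtime found on PATH, or "none" if neither is installed.
detect_container_engine() {
  for engine in docker podman; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  echo none
  return 1
}

engine=$(detect_container_engine)
if [ "$engine" = none ]; then
  echo "No docker or podman found; install a container runtime first." >&2
elif "$engine" ps -a >/dev/null 2>&1; then
  # 'ps -a' lists all containers; success means the runtime is responding.
  echo "$engine is installed and responding."
else
  echo "$engine is installed but not responding; is the service running?" >&2
fi
```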

Credentials

To check that your credentials are available and valid, and that they match the expected accounts, use ansible-navigator within your project to run each provider's CLI, then compare the returned user and account IDs with those shown in the browser UI of the associated service.

[!IMPORTANT] All of the instructions below assume that your project is using the correct CSP-flavored image of cldr-runner. If in doubt, you can use the full image which has all supported CSP resources.

[!WARNING] Be sure you are within a project directory that has an ansible-navigator.yml configuration file that uses the cldr-runner image!

CDP Public Cloud

ansible-navigator exec -- cdp iam get-user

[!NOTE] If you do not yet have a CDP Public Cloud credential, follow these instructions on the Cloudera website.

See CDP CLI for further details.

AWS

ansible-navigator exec -- aws iam get-user

See AWS account requirements for further details.

Azure

ansible-navigator exec -- az account list

[!NOTE] If you cannot list your Azure accounts, run az login on your host to refresh your local credential.

See Azure subscription requirements for further details.

GCP

ansible-navigator exec -- gcloud auth list

[!NOTE] You need a provisioning Service Account for GCP setup (typically referenced by the gcloud_credential_file entry). If you do not yet have a provisioning Service Account, you can learn more on the Cloudera website.

See GCP requirements for further details.
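To run the relevant checks in one pass, you could wrap the commands from this section in a small POSIX shell function. This is a sketch: `credential_check_cmd`, `check_credentials`, the provider keywords, and the `DRY_RUN` switch are hypothetical conveniences. Run it from a project directory so ansible-navigator picks up that project's ansible-navigator.yml and cldr-runner image.

```shell
#!/bin/sh
# credential_check_cmd PROVIDER (hypothetical): print the identity check
# for a provider keyword, or fail for an unknown keyword.
credential_check_cmd() {
  case $1 in
    cdp)   echo "cdp iam get-user" ;;
    aws)   echo "aws iam get-user" ;;
    azure) echo "az account list" ;;
    gcp)   echo "gcloud auth list" ;;
    *)     return 1 ;;
  esac
}

# check_credentials PROVIDER... (hypothetical): run each check inside the
# execution environment; DRY_RUN=1 only prints the commands.
check_credentials() {
  rc=0
  for provider in "$@"; do
    if ! cmd=$(credential_check_cmd "$provider"); then
      echo "unknown provider: $provider" >&2
      rc=1
      continue
    fi
    echo "== $provider: $cmd"
    [ "${DRY_RUN:-0}" = 1 ] && continue
    # $cmd is left unquoted on purpose so it splits into separate arguments.
    ansible-navigator exec -- $cmd || rc=1
  done
  return "$rc"
}

# Example: check_credentials cdp aws
```

The function returns non-zero if any check fails, which makes it usable as a gate in a CI/CD pipeline.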

Execution

All of the definitions and projects in cloudera-deploy are designed to work with ansible-navigator. Each project has discrete instructions on what and how to run, but in general, you will end up executing some form of the ansible-navigator run subcommand, like:

ansible-navigator run main.yml -e @config.yml -t plat

Occasionally, the instructions may ask you to run an individual module, such as ansible-navigator exec -- ansible some_group -m ping. You can learn more about the available subcommands on the ansible-navigator website.

[!NOTE] If you want to check out what's in the container, or use the container directly, run ansible-navigator exec -- /bin/bash!

Logs

The projects are configured to log their activities. In each, you will find a runs/ directory that houses all of the runtime artifacts of ansible-navigator and ansible-runner (the Ansible application and interface that does the actual Ansible command dispatching).

The log files are structured (JSON) and indexed by playbook and timestamp. To review a run, or rather replay it, load the file into ansible-navigator:

ansible-navigator replay <playbook execution run file>.json
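To replay the most recent run without hunting for its file name, a POSIX shell sketch like the following picks the newest JSON file by modification time and hands it to `ansible-navigator replay`. It assumes the artifact files sit directly in the project's runs/ directory; `latest_run` is a hypothetical helper.

```shell
#!/bin/sh
# latest_run [DIR] (hypothetical helper): print the most recently modified
# *.json file in DIR (default: runs), or fail if there is none.
latest_run() {
  dir=${1:-runs}
  # ls -t sorts newest first; if nothing matches, ls fails quietly and
  # the result is empty.
  newest=$(ls -t "$dir"/*.json 2>/dev/null | head -n 1)
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

if run_file=$(latest_run runs); then
  ansible-navigator replay "$run_file"
else
  echo "No playbook artifacts found in runs/" >&2
fi
```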

Upgrades

The cldr-runner image is updated fairly often to include the latest libraries, new features, and fixes. Depending on how ansible-navigator is configured (see the ansible-navigator.yml file), it may pull the container image only if it is missing locally, which means an existing local image is not updated automatically.

To change this behavior, update the ansible-navigator.yml configuration in your project to:

ansible-navigator:
  execution-environment:
    pull:
      policy: always

Alternatively, use the --pull-policy (or --pp) CLI flag with the value always.

You can read more about updating this configuration on the ansible-navigator website.

Troubleshooting

If you need help, here are some resources:

Be sure to stop by the Discussions > Help board!

License and Copyright

Copyright 2023, Cloudera, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.