chef / chef-workstation

Chef Workstation gives you everything you need to get started with Chef, so you can automate how you audit, configure, and manage applications and environments.
Apache License 2.0

Rollout reporting - meta data acquisition and injection into Rollout event object #1316

Closed jonsmorrow closed 3 years ago

jonsmorrow commented 4 years ago

Note: GitHub integration record type. Please leave out any sensitive information.

Job to be Done:

As a Chef Desktop operator, I want to know what changes a Policyfile revision includes, so I can troubleshoot more easily if something goes wrong.

As a Chef Desktop operator, I want to know how many machines are on a given Policyfile revision, so that I can ensure my fleet is up to date with the latest fixes.

Description:

Chef operators can currently generate Policyfile changes and push them to Chef Server. But there is no way in Automate to view who made a Policyfile change, how many machines have updated to that revision, what code changes are included in the Policyfile revision, etc.

Automate is being updated with new endpoints to consume metadata about a given Policyfile "Rollout". A Rollout is the application of a Policyfile revision to a set of nodes. Automate will combine the Rollout information with Chef Server events (e.g., chef-client ran a specific Policyfile revision) in order to create a new view for users.

Automate proto response definition: https://github.com/chef/automate/blob/c1adaa428c0189dd09de8c3a78668a8867721ef1/api/external/cfgmgmt/response/rollouts.proto#L40-L107

Design doc: https://github.com/chef/automate/blob/master/components/config-mgmt-service/docs/rollout-metadata-collection-design.md

Acceptance Criteria:

  1. Targeted at users who have an existing CI/CD pipeline already in place to deploy cookbook changes
  2. Users would opt-in to the Rollout feature via config
  3. Users must provide required Automate auth config as part of this opt-in
  4. chef push is updated to fetch additional metadata and send it to Automate
    1. Metadata is read from the Policyfile lock, injected by chef push, read from local config (e.g., .git), or read from environment variables
  5. chef push will retry any metadata push failures (e.g., auth failures)
  6. Users can easily see chef push failures and their causes in their CI pipeline
  7. Add a new command for checking that auth to Automate and the Rollout endpoint both work (ping-pong)
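
As a rough illustration of criteria 5 and 6, here is a minimal retry sketch. Everything in it is an assumption for discussion: `send_with_retry`, `RolloutSendError`, the attempt count, and the doubling delay are hypothetical names and choices, not part of chef push or the Automate API.

```python
import time
from typing import Callable, Optional


class RolloutSendError(Exception):
    """Hypothetical error raised when a metadata push to Automate fails."""


def send_with_retry(
    send: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `send` up to `attempts` times, doubling the delay between tries.

    Returns None on success, or a message describing the last failure so the
    caller can print it where a CI log will show it.
    """
    last_error = ""
    for attempt in range(1, attempts + 1):
        try:
            send()
            return None
        except RolloutSendError as err:
            last_error = f"attempt {attempt}/{attempts} failed: {err}"
            if attempt < attempts:
                # Wait base_delay, then 2x, 4x, ... before the next try.
                sleep(base_delay * 2 ** (attempt - 1))
    return last_error
```

Returning the failure message instead of raising keeps the decision about exit codes with the caller, which matters if a failed metadata push should be visible in CI without failing the Policyfile push itself.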

Story Map

https://stickies.io/boards/5f238ff67f44436306bfbee1#1

Questions

  1. Do we only support Git, or multiple SCMs?
  2. What products do our customers use for CI pipelines?
    1. What kinds of CI pipelines do we support? Buildkite, Jenkins, Rundeck, etc.
  3. What is the best UX for users to populate the description?
    1. If a pipeline is doing the chef push command, where/how will we prompt users for a Rollout description?
  4. Can we have users auth to Automate using a log-in screen via the Chef Workstation App?

Answers

  1. How do nodes get assigned a policy group?
    • knife node policy set NODE POLICY_GROUP POLICY_NAME
  2. How does a non-Ruby library consume the information from the merged Chef workstation config?
    • Those are required fields in the invocation of the external tool
  3. Does the Rollout object need to exist before the Policyfile object is sent to the Chef Server? Is there any dependency between those two data objects?
    • recommendation - do the policyfile push, make sure that succeeded, then send the rollout information
    • If we cannot send the policyfile, don't send the rollout
  4. If we cannot auth to Automate, do we hold off on sending the Policyfile object to Chef Server?
    • No - if Automate is down we still want customers to be able to update their code
  5. Spike with lots of research - https://github.com/chef/chef-workstation/issues/1326
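
Answers 3 and 4 together describe an ordering rule, which could be sketched roughly like this. `push_then_report` and its callable parameters are hypothetical stand-ins for the real push and Rollout calls; this is only meant to pin down the control flow.

```python
from typing import Callable


def push_then_report(
    push_policyfile: Callable[[], bool],
    send_rollout: Callable[[], bool],
    warn: Callable[[str], None] = print,
) -> bool:
    """Push to Chef Server first; report the Rollout only if the push succeeded.

    Returns the outcome of the Policyfile push alone: a failed Rollout report
    produces a warning but never turns a good push into a failure.
    """
    if not push_policyfile():
        return False  # nothing was pushed, so there is no Rollout to report
    if not send_rollout():
        warn("Policyfile pushed, but the Rollout could not be sent to Automate")
    return True
```

The return value deliberately ignores the Rollout result, so an Automate outage never blocks customers from updating their code.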

Field population

// These 3 fields come from the `chef push POLICY_GROUP` command and the Policyfile
string policy_name = 1;
string policy_node_group = 2;
string policy_revision_id = 3;
// Comes from local config, eg: /etc/chef/client.rb or ~/.chef/config.rb, or command line flags
// to `chef push` command
string policy_domain_url = 4;
// For git, can look for the presence of a `.git` folder
// Server currently only supports git and github
SCMType scm_type = 5;
SCMWebType scm_web_type = 6; // github vs gitlab, etc.
string policy_scm_url = 7;
// For git, look at the upstream config
string policy_scm_web_url = 8;
string policy_scm_commit = 9;
// ???
// Could come from the git commit
// Could embed the description into the Policyfile.lock
// Could come from the CI system when a job is started
string description = 10;
// For Jenkins or Buildkite, pulled from environment variables
string ci_job_url = 11;
string ci_job_id = 12;
// From the git config
string scm_author_name = 16;
string scm_author_email = 17;
// From chef config, client name for `chef push` auth to Chef Server
string policy_domain_username = 18;

All fields besides 1-3 are optional.
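
To make the optional-field sources above more concrete, here is a hedged sketch of gathering a few of them. The function names are hypothetical, and several choices are assumptions: that the SCM URL comes from a remote named `origin`, that the author comes from `git config user.name` / `user.email`, and that the CI values come from the Jenkins (`BUILD_URL`, `BUILD_ID`) and Buildkite (`BUILDKITE_BUILD_URL`, `BUILDKITE_JOB_ID`) environment variables. The WIP tool linked in this thread may do this differently.

```python
import subprocess
from typing import Callable, Dict, Mapping, Optional


def run_git(*args: str) -> Optional[str]:
    """Run a git command in the current directory; None if it fails."""
    try:
        out = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip() or None


def gather_optional_fields(
    env: Mapping[str, str],
    git: Callable[..., Optional[str]] = run_git,
) -> Dict[str, str]:
    """Collect optional Rollout fields, skipping any that cannot be found."""
    candidates = {
        # Assumption: the upstream remote is named "origin".
        "policy_scm_url": git("config", "--get", "remote.origin.url"),
        "policy_scm_commit": git("rev-parse", "HEAD"),
        "scm_author_name": git("config", "--get", "user.name"),
        "scm_author_email": git("config", "--get", "user.email"),
        # Jenkins sets BUILD_URL / BUILD_ID; Buildkite sets BUILDKITE_* names.
        "ci_job_url": env.get("BUILD_URL") or env.get("BUILDKITE_BUILD_URL"),
        "ci_job_id": env.get("BUILD_ID") or env.get("BUILDKITE_JOB_ID"),
    }
    return {name: value for name, value in candidates.items() if value}
```

In a pipeline this would be called as `gather_optional_fields(os.environ)`. Because every field other than 1-3 is optional, missing values are dropped rather than treated as errors.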

Aha! Link: https://chef.aha.io/features/SH-148

tyler-ball commented 4 years ago

Here's the WIP code for the data gather itself: https://github.com/chef/automate/pull/4032/files#diff-70bb3ea685fdedab94a082c8045bd456R1

tyler-ball commented 4 years ago

Auth notes:

To toggle Rollout sending on:

The same possibilities apply for configuring the automate URL.

marcparadise commented 4 years ago

Relevant: https://github.com/chef/automate/blob/master/components/config-mgmt-service/docs/rollout-metadata-collection-design.md

marcparadise commented 4 years ago

Based on review of the current version of the tool, I recommend moving forward with the standalone tool. There are a few advantages:

However, there are also some concerns raised by this approach:

marcparadise commented 4 years ago

There are additional concerns to track, as a result of adding a new API dependency:

marcparadise commented 4 years ago

We have two places it would make sense to invoke from:

At this time, I'm recommending we put it in the post-run behavior of the wrapper after a successful run of chef-cli push, but there's probably some discussion to have here:

The disadvantage to this approach is that directly invoking chef-cli push will not hit the Rollouts API. Given the current direction of Effortless (no Chef Server) this may be OK, but we will need to verify it and decide if we need any special handling (like not allowing push outside of Workstation, or warning if the Rollouts API is configured but we're not running from within Workstation).

tyler-ball commented 4 years ago

Documentation note - Chef Server must be set up to send action reports to Automate. This needs to be part of our setup documentation.

jonsmorrow commented 3 years ago

Comment added by Lisa Stidham in Aha!

Noting Issue 1316