cloudfoundry / cloud_controller_ng

Cloud Foundry Cloud Controller
Apache License 2.0

Deployments Max-in-flights proposal #3878

Closed sethboyles closed 2 months ago

sethboyles commented 2 months ago

Feature goals

Currently, Rolling Deployments bring up one new application instance, then bring down one old application instance. For apps with large numbers of instances (in the dozens), this process can take an excessively long time.

Max-in-flight is a proposed configuration for Rolling Deployments that would allow App Operators (Admin, Space Developer, or Space Supporter) to speed up the deployment process by allowing multiple application instances to be brought up and torn down by Rolling Deployments simultaneously.

Usage

App Developers will be able to push an application with a specified max-in-flight by using a new --max-in-flight option:

$ cf push myapp --max-in-flight=3 --strategy=rolling

This flag is only available when the --strategy flag is also specified.

This flag will also be available for other actions like restart, restage, and rollback, as well as the Canary strategy. For example:

$ cf restart myapp --max-in-flight=3 --strategy=canary

Observability

Currently, users have limited visibility into ongoing deployments. Ongoing Deployments can be inferred by viewing the cf app output and noting that the app has multiple processes of the same type. To help increase the visibility of deployments, we will add an explicit section to the cf app output:

$ cf app myapp
# Streamed staging output

name:              myapp
requested state:   started
routes:            myapp.example.com
last uploaded:     Tue 21 May 20:50:44 UTC 2024
stack:             cflinuxfs4
buildpacks:
        name             version   detect output   buildpack name
        ruby_buildpack   1.10.5    ruby            ruby

type:            web
sidecars:
instances:       3/3
memory usage:    1024M
start command:   bundle exec rackup config.ru -p $PORT -o 0.0.0.0
     state     since                  cpu    memory        disk           logging              details
#0   running   2024-05-21T20:50:56Z   0.4%   46M of 1G     129.6M of 1G   0/s of unlimited
#1   starting  2024-05-21T20:50:56Z   0.4%   46M of 1G     129.6M of 1G   0/s of unlimited
#2   starting  2024-05-21T20:50:56Z   0.4%   46M of 1G     129.6M of 1G   0/s of unlimited

type:            web
sidecars:
instances:       1/1
memory usage:    1024M
start command:   bundle exec rackup config.ru -p $PORT -o 0.0.0.0
     state     since                  cpu    memory        disk           logging              details
#0   running   2024-05-21T20:50:56Z   0.4%   46M of 1G     129.6M of 1G   0/s of unlimited
#2   running   2024-05-21T20:50:56Z   0.4%   46M of 1G     129.6M of 1G   0/s of unlimited

Rolling Deployment currently DEPLOYING (max-in-flight: 2)

Or, for a Canary Deployment:

Canary Deployment currently PAUSED (max-in-flight: 2)
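
Tooling that scrapes cf app output may want to read this status line. The following is a minimal sketch that assumes the line keeps exactly the shape shown in the examples above; the function name parse_deployment_status and the returned keys are hypothetical, and the final CLI output may differ.

```python
import re

# Assumed shape, taken from the example lines in this proposal:
#   "<Strategy> Deployment currently <STATUS> (max-in-flight: <N>)"
STATUS_LINE = re.compile(
    r"^(?P<strategy>\w+) Deployment currently (?P<status>[A-Z]+) "
    r"\(max-in-flight: (?P<max_in_flight>\d+)\)$"
)


def parse_deployment_status(line):
    """Return a dict describing the deployment status line, or None if it does not match."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    return {
        # Lower-cased so "Rolling", "CANARY", and "Canary" are treated alike.
        "strategy": match["strategy"].lower(),
        "status": match["status"],
        "max_in_flight": int(match["max_in_flight"]),
    }


if __name__ == "__main__":
    print(parse_deployment_status("Rolling Deployment currently DEPLOYING (max-in-flight: 2)"))
```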

Technical Behavior Overview

Given a max-in-flight of 5, and an app instance count of 50, Cloud Controller would roll out a new version of an app with the following process:

  1. Cloud Controller desires 5 new LRPs with the new app version from Diego

  2. Cloud Controller waits until at least one of the new LRPs reports as healthy

  3. For each of the healthy new LRPs, Cloud Controller asks Diego to bring down an old app version instance (checking scaling + quota validations at each step)

  4. Cloud Controller waits until at least one old app version instance is torn down (this happens concurrently with step 2)

  5. For each old app version instance that is torn down, Cloud Controller desires a new LRP, never exceeding 55 instances at one time (50 instances + 5 max-in-flight)

  6. Repeat until all 50 instances have been replaced
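
The steps above can be sketched as a small simulation. This is only an illustration under a simplifying assumption: every newly desired LRP becomes healthy within one round, and every teardown completes within the same round. The real Cloud Controller reacts to instances individually as they become healthy. The function name simulate_rollout is hypothetical and is not part of Cloud Controller.

```python
def simulate_rollout(total_instances, max_in_flight=1):
    """Simulate the rollout loop and return (rounds, peak_instance_count)."""
    if total_instances < 1 or max_in_flight < 1:
        raise ValueError("total_instances and max_in_flight must be >= 1")

    old = total_instances          # running instances of the old version
    new_healthy = 0                # healthy instances of the new version
    # Step 1: desire up to max_in_flight new LRPs.
    starting = min(max_in_flight, total_instances)
    peak = old
    rounds = 0

    while starting > 0:
        # Track the largest number of instances that exist at the same time.
        peak = max(peak, old + new_healthy + starting)

        # Step 2 (simplified): all starting LRPs report healthy.
        became_healthy = starting
        new_healthy += became_healthy

        # Steps 3 and 4: bring down one old instance per healthy new LRP.
        torn_down = min(became_healthy, old)
        old -= torn_down

        # Step 5: desire one new LRP per torn-down instance, but never more
        # new instances than the app needs in total.
        starting = min(torn_down, total_instances - new_healthy)
        rounds += 1

    return rounds, peak


if __name__ == "__main__":
    print(simulate_rollout(50, 5))  # max-in-flight of 5
    print(simulate_rollout(50, 1))  # current behavior
```

Under this model, a max-in-flight of 5 replaces 50 instances in 10 rounds and never runs more than 55 instances at once, while a max-in-flight of 1 needs 50 rounds and peaks at 51 instances.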

Max-in-flight will be available for both Rolling and Canary Deployments, and potentially any future deployment strategies.

API

Creating a Deployment will use a new options property on the Deployment resource:

POST https://api.example.org/v3/deployments

Rolling Deployment JSON example:

{
  "droplet": {
    "guid": "[droplet-guid]"
  },
  "strategy": "rolling",
  "options": {
    "max_in_flight": 10
  },
  "relationships": {
    "app": {
      "data": {
        "guid": "[app-guid]"
      }
    }
  }
}

max_in_flight will default to 1 (current behavior).
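
As an illustration, a client could assemble the request body like this. This is a sketch only: the helper name build_deployment_request and its validation rules (strategy limited to rolling or canary, max_in_flight a positive integer) are assumptions and are not confirmed by the proposal. Only the JSON shape and the default of 1 come from the text above.

```python
import json


def build_deployment_request(app_guid, droplet_guid, strategy="rolling", max_in_flight=1):
    """Build the JSON body for POST /v3/deployments as sketched in this proposal."""
    # Assumed validation; the proposal does not spell out the server-side rules.
    if strategy not in ("rolling", "canary"):
        raise ValueError("strategy must be 'rolling' or 'canary'")
    if isinstance(max_in_flight, bool) or not isinstance(max_in_flight, int) or max_in_flight < 1:
        raise ValueError("max_in_flight must be a positive integer")

    return {
        "droplet": {"guid": droplet_guid},
        "strategy": strategy,
        "options": {"max_in_flight": max_in_flight},
        "relationships": {"app": {"data": {"guid": app_guid}}},
    }


if __name__ == "__main__":
    body = build_deployment_request("[app-guid]", "[droplet-guid]", max_in_flight=10)
    print(json.dumps(body, indent=2))
```

Calling the helper without max_in_flight yields "max_in_flight": 1, which mirrors the proposed default.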

Limitations

Because a max-in-flight of N uses up to N extra application instances, Org and Space admins will have to ensure their quotas allow for enough resources if App Developers plan to use large max-in-flight values.
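
The extra headroom can be estimated with simple arithmetic. The sketch below assumes that memory is the limiting quota and that every instance uses the same memory limit; the helper names are hypothetical, and real quota checks in Cloud Controller cover more than memory.

```python
def peak_memory_mb(instances, memory_per_instance_mb, max_in_flight=1):
    """Upper bound on memory in use during a deployment: (instances + N) * per-instance memory."""
    return (instances + max_in_flight) * memory_per_instance_mb


def fits_memory_quota(instances, memory_per_instance_mb, max_in_flight, quota_mb, other_usage_mb=0):
    """Return True if the deployment's peak usage, plus other usage, stays within the quota."""
    peak = peak_memory_mb(instances, memory_per_instance_mb, max_in_flight)
    return peak + other_usage_mb <= quota_mb


if __name__ == "__main__":
    # 50 instances at 1024M with a max-in-flight of 5.
    print(peak_memory_mb(50, 1024, 5))
    print(fits_memory_quota(50, 1024, 5, quota_mb=51200))
```

In this example, 50 instances at 1024M each normally use 51200M, but a max-in-flight of 5 can raise the peak to 56320M, so a quota sized exactly for the steady state would be too small.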