cloudfoundry / cf-crd-explorations

Apache License 2.0
3 stars 2 forks source link

Spike: Create Processes based on staging output and understand Process CR lifecycle #15

Open tcdowney opened 3 years ago

tcdowney commented 3 years ago

Background

There are no explicit Create or Delete endpoints in the CF API for Processes. Instead, they can come from two places:

Note: In CF for VMs/CF for K8s there is no way to delete processes. You can only scale them down to 0 instances.

CF for VMs Context

Staging

Most Processes are provided as a result of staging the app. Buildpacks will supply default process types for an app and users can provide the buildpack with a Procfile to define their own process types. The definitions (type and start command) for these processes are sent back to Cloud Controller as part of the staging completion callback and are saved in the database as part of the Droplet.

The Processes are created in the database when the user assigns the Droplet to the app (via the PATCH /v3/apps/:guid/relationships/current_droplet endpoint).

App Manifests

Processes can also be defined in an App Manifest. When a manifest containing processes is sent to the CF API it will make Process records in the database for each one.

CF for K8s Context

Staging

Conceptually works the same as in CF for VMs. The implementation is different, though.

Kpack-built app images include process types and start command information as part of the container image metadata. The capi-k8s-release (kpack) Build Controller extracts process types from an image by calling out to the container registry to fetch its metadata.

This information is sent back to Cloud Controller via the PATCH /v3/builds/:guid endpoint and the droplet is updated.

Note: Curiously it bypasses the AppAssignDroplet codepath in this case so I'm not sure how those processes get created in the database. It might be the case that this is a bug and it's covered up by the fact the fact the CLI always calls the "set current droplet" endpoint during a push.

App Manifests

Same as CF for VMs.


What to do

Consider updating the Droplet CR and Controller

The CF Droplet object / GET endpoint includes information that we're not storing anywhere -- namely the process_types field (but also maybe buildpacks and stack could be important).

We may need to update the Droplet CR to store this information as well. We could update the Droplet Controller to do something similar to the capi-k8s-release Build controller and extract the process types whenever an image changes on a Droplet.

Figure out the best place to create Process CRs

For feature parity with CF we would do this whenever the currentDropletRef on an App changes or a manifest is applied (no spike code implements this yet).

Think about where it makes the most sense to do this work in the spike code (App Controller reacting to currentDropletRef changes?) and implement it.

Draw a diagram

Include a diagram that shows the chain of events that results in a Process being created from staging


Acceptance Criteria

GIVEN I have a cluster with the spike code deployed WHEN I stage an App (follow procedure outlined in https://github.com/cloudfoundry/cf-crd-explorations/issues/6) THEN I can view Process CRs that match the image processes.

Notes

heycait commented 3 years ago

Started drawing out a diagram here: https://miro.com/app/board/o9J_lFiI8CU=/?moveToWidget=3074457361338944304&cot=14

We would like the App Controller to watch the App CRs and read the Droplet CR information to then create a Process. However, Process takes in a lot of information such as:

As it stands now, this information is not available to us to make these configurable. For now, we can make defaults / dummy values for these fields. However, to actually be able to pass these in, we would require another CR such as the proposed App Manifest in order to completely make the Process.

heycait commented 3 years ago

Summary from chatting with @Birdrock and @matt-royal.

It turns out we didn't want to pursue AppManifest CR, although we will still have the AppManifest concept (i.e. the manifest.yml file in the source code repo). The logic should be in the CF Shim instead as this will reduce one layer of translation from the user to a CR manifest then to other CRs. Instead, the CF shim will directly parse the information and make necessary updates to the other CRs.

TODOs:

Open Questions:

Birdrock commented 3 years ago

We are considering sidecars as out of scope for the moment. We are aiming for happy path app workloads at the moment, so some of the different configurations aren't a priority for the spike.

It is worth noting that eirini does support sidecars: https://github.com/cloudfoundry-incubator/eirini/blob/8dbf019daf12bad66d1dea6f97e0e30155b01bf1/k8s/stset/lrp_to_statefulset.go#L274-L289

matt-royal commented 3 years ago

Another thing we discussed in real time was how process instance counts are determined. One way is through the app manifest, which is out of scope for this work. If no instance counts are specified in the manifest, then web processes get 1 instance and all others get 0. One of the code paths that accomplishes this is here: https://github.com/cloudfoundry/cloud_controller_ng/blob/main/app/actions/process_create.rb#L51

heycait commented 3 years ago

UPDATE While testing out the changes to create a Process, we ran into issues because the Builds we had didn't have any process information and therefore no processes or LRPs were getting created. As such, we decided to go ahead and move over the extraction to the Droplet controller instead.


Last week we discussed extracting the process/command information immediately after the build is made in the Build Controller. However, after speaking with Tim, we may not want this here in the future as it's possible to update the image in the Droplet. This isn't available to the users via the CF API, but it is essentially what happens when a stack change happens. See:

We will need to reconsider how this will work when we figure out how to update the stack in k8s.

matt-royal commented 3 years ago

Here are the commands I ran when I attempted to test this:

$ curl -X POST http://localhost:9000/v3/apps -d '{"name": "acceptance", "relationships": {"space": {"data": {"guid": "default"}}}}' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/apps HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 81
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 81 out of 81 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:14:01 GMT
< Content-Length: 284
<
{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647","name":"acceptance","state":"STOPPED","created_at":"2021-07-20T23:14:01Z","updated_at":"","lifecycle":{"type":"kpack","data":{}},"relationships":{"space":{"data":{"guid":"default"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0

$ curl -X POST http://localhost:9000/v3/packages -d '{"type": "bits", "relationships": {"app": {"data": {"guid": "1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}}}' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/packages HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 102
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 102 out of 102 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:16:23 GMT
< Content-Length: 338
<
{"guid":"c4795710-d70c-49ca-8820-ad70e8273e72","type":"bits","data":{"checksum":{"type":"sha256","value":null},"error":null},"state":"AWAITING_UPLOAD","created_at":"2021-07-20T23:16:23Z","updated_at":"","relationships":{"app":{"data":{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0

$ (cd ~/workspace/cf-acceptance-tests/assets/dora/; zip -r /tmp/dora.zip .)
  adding: .rspec (stored 0%)
  adding: config.ru (stored 0%)
  adding: spec/ (stored 0%)
  adding: spec/spec_helper.rb (deflated 48%)
  adding: spec/curl_spec.rb (deflated 66%)
  adding: spec/stress_testers_spec.rb (deflated 72%)
  adding: spec/logging_service_spec.rb (deflated 65%)
  adding: spec/instances_spec.rb (deflated 63%)
  adding: spec/logutil_spec.rb (deflated 72%)
  adding: stress_testers.rb (deflated 51%)
  adding: README.md (deflated 63%)
  adding: .gitignore (stored 0%)
  adding: instances.rb (deflated 53%)
  adding: log_utils.rb (deflated 64%)
  adding: scripts/ (stored 0%)
  adding: scripts/map_cookie_jars_to_instances.rb (deflated 55%)
  adding: scripts/run_performance_test_single_dea.rb (deflated 53%)
  adding: scripts/README.md (deflated 49%)
  adding: scripts/run_performance_test_single_dea_multiple bg.rb (deflated 60%)
  adding: scripts/run_performance_test_2_deas_multiple_bg.rb (deflated 66%)
  adding: scripts/clean up map script (deflated 56%)
  adding: scripts/scale_dora (deflated 45%)
  adding: logging_service.rb (deflated 57%)
  adding: Gemfile (deflated 23%)
  adding: Gemfile.lock (deflated 63%)
  adding: Procfile (deflated 33%)
  adding: curl.rb (deflated 33%)
  adding: .cfignore (stored 0%)
  adding: get_instance_cookie_jars.sh (deflated 53%)
  adding: dora.rb (deflated 58%)
  adding: vendor/ (stored 0%)
  adding: vendor/cache/ (stored 0%)
  adding: vendor/cache/rack-test-1.1.0.gem (deflated 16%)
  adding: vendor/cache/rspec-support-3.10.1.gem (deflated 12%)
  adding: vendor/cache/sinatra-2.1.0.gem (deflated 1%)
  adding: vendor/cache/diff-lcs-1.4.4.gem (deflated 5%)
  adding: vendor/cache/rspec-expectations-3.10.1.gem (deflated 5%)
  adding: vendor/cache/rspec-3.10.0.gem (deflated 45%)
  adding: vendor/cache/ruby2_keywords-0.0.2.gem (deflated 58%)
  adding: vendor/cache/json-2.5.1.gem (deflated 4%)
  adding: vendor/cache/mustermann-1.1.1.gem (deflated 5%)
  adding: vendor/cache/tilt-2.0.10.gem (deflated 14%)
  adding: vendor/cache/rspec-mocks-3.10.1.gem (deflated 5%)
  adding: vendor/cache/rack-protection-2.1.0.gem (deflated 18%)
  adding: vendor/cache/rspec-core-3.10.1.gem (deflated 3%)
  adding: vendor/cache/rack-2.2.3.gem (deflated 2%)
  adding: stress (deflated 58%)

$ curl -X POST http://localhost:9000/v3/packages/c4795710-d70c-49ca-8820-ad70e8273e72/upload -F bits=@/tmp/dora.zip -v
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/packages/c4795710-d70c-49ca-8820-ad70e8273e72/upload HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 2301244
> Content-Type: multipart/form-data; boundary=------------------------b91aa8c704c9f9e1
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:23:58 GMT
< Content-Length: 328
<
{"guid":"c4795710-d70c-49ca-8820-ad70e8273e72","type":"bits","data":{"checksum":{"type":"sha256","value":null},"error":null},"state":"READY","created_at":"2021-07-20T23:16:23Z","updated_at":"","relationships":{"app":{"data":{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0

$ curl -X POST http://localhost:9000/v3/builds -d '{"package": {"guid": "c4795710-d70c-49ca-8820-ad70e8273e72"}}' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/builds HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 61
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 61 out of 61 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:30:04 GMT
< Content-Length: 414
<
{"guid":"3d1f625f-56c1-464f-b284-e8896d78ec5f","state":"","created_at":"2021-07-20T16:30:04-07:00","updated_at":"","lifecycle":{"type":"kpack","data":{"stack":"cflinuxfs3"}},"package":{"guid":"c4795710-d70c-49ca-8820-ad70e8273e72"},"droplet":{"guid":"NOT CURRENTLY IMPLEMENTED"},"relationships":{"app":{"data":{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0

This turned out to be a large adventure and I had to make a lot of changes (which I pushed to the branch):

After making all of these changes, I got to a place where the kpack build was succeeding for the dora app (which I chose because it has 2 processes in its Procfile), but the follow-up step to extract process types was failing with the following error:

2021-07-20T17:48:13.372-0700    INFO    controller-runtime.manager.controller.droplet   Attempting to reconcile default/droplet-896a55aa-2698-41af-b900-0fe775cb8876    {"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-896a55aa-2698-41af-b900-0fe775cb8876", "namespace": "default"}
2021-07-20T17:48:14.634-0700    INFO    controller-runtime.manager.controller.droplet   Error fetching image config: GET https://gcr.io/v2/cf-relint-greengrass/cf-crd-staging-spike/akira/63ae3e5c-eb17-4eb7-a32d-245f74868bba/manifests/sha256:3465fad4cb90b1d22f313dfd0ebaf8fb99138c8eb9fade2226e1785c4e7fc0b0: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
    {"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-896a55aa-2698-41af-b900-0fe775cb8876", "namespace": "default"}
2021-07-20T17:48:14.634-0700    INFO    controller-runtime.manager.controller.droplet   Error occurred extracting process types and commands: GET https://gcr.io/v2/cf-relint-greengrass/cf-crd-staging-spike/akira/63ae3e5c-eb17-4eb7-a32d-245f74868bba/manifests/sha256:3465fad4cb90b1d22f313dfd0ebaf8fb99138c8eb9fade2226e1785c4e7fc0b0: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication    {"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-896a55aa-2698-41af-b900-0fe775cb8876", "namespace": "default"}

I tracked this down to this line: https://github.com/cloudfoundry/cf-crd-explorations/blob/ee719db1cfd6455bdab3e949ed7ac34fa95bb4e9/controllers/droplet_controller.go#L81

We're assuming that the Droplet will have the imagePullSecrets that we need to access the registry, but the Droplet created by the API has no imagePullSecrets. This is the next problem to solve.

akrishna90 commented 3 years ago

@heycait and I saw that the credentials used for pushing the image to app registry was not persisted on Package CR. We addressed this concern on commit - 39409bae8c00

We see that the build controller is using the above secret to fetch the app source image at line 158

But these changes seems to not have any effect. We still see the same error as above

2021-07-21T15:53:56.567-0600    INFO    controller-runtime.manager.controller.droplet   Attempting to reconcile default/droplet-86410e0c-6450-43dc-b26b-fa499bbc56c9    {"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-86410e0c-6450-43dc-b26b-fa499bbc56c9", "namespace": "default"}

2021-07-21T15:53:57.432-0600    INFO    controller-runtime.manager.controller.droplet   Error fetching image config: GET https://gcr.io/v2/cf-relint-greengrass/cf-crd-staging-spike/kpack/4b568bf3-9af1-42fc-9710-8f03dc6e4274/manifests/sha256:ee368c9f509c0b9ae885ec9fd088cb9504bcbd04b1d57396aa53d8b2f32652d7: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
tcdowney commented 3 years ago

@akrishna90 and I found out that the issue was due to us using a generic secret type instead of dockerconfigjson. The Kpack auth helper only works with a specific set of secret types: https://github.com/pivotal/kpack/blob/4e5647354c3a55fee655199646059efd93f9b0c4/pkg/dockercreds/k8sdockercreds/k8s_keychain.go#L94

akrishna90 commented 3 years ago

Capturing the open question from @gnovv here : Are Docker Images OCI compliant? The question was in reference to lines written here https://github.com/cloudfoundry/cf-crd-explorations/blob/6b681ebc51411c2fe9c13e1cc45aa62125252786/controllers/droplet_controller.go#L108-L110

@tcdowney says - yeah, the code probably isn't robust enough. The process types metadata is not part of the OCI spec and just something that Cloud Native Buildpack-built images can reliably have. I'm not sure if our code handles the case where it's missing.

Docker images should have the ExposedPorts field, but that's also optional in the OCI spec so we can't rely on it and should handle the case where they're missing. I think the code probably handles it, but we didn't test that case.

Relevant OCI Image spec (search for ExposedPorts): https://github.com/opencontainers/image-spec/blob/main/config.md

This is a great point, though, for when we start implementing this for real. We'll want to have good automated tests for these cases since there are so many ways to build a container image.