Open tcdowney opened 3 years ago
Started drawing out a diagram here: https://miro.com/app/board/o9J_lFiI8CU=/?moveToWidget=3074457361338944304&cot=14
We would like the App Controller to watch the App CRs and read the Droplet CR information to then create a Process. However, Process takes in a lot of information such as:
As it stands now, this information is not available to us to make these configurable. For now, we can make defaults / dummy values for these fields. However, to actually be able to pass these in, we would require another CR such as the proposed App Manifest in order to completely make the Process.
Summary from chatting with @Birdrock and @matt-royal.
It turns out we didn't want to pursue AppManifest CR, although we will still have the AppManifest concept (i.e. the manifest.yml
file in the source code repo). The logic should be in the CF Shim instead as this will reduce one layer of translation from the user to a CR manifest then to other CRs. Instead, the CF shim will directly parse the information and make necessary updates to the other CRs.
TODOs:
Open Questions:
We are considering sidecars as out of scope for the moment. We are aiming for happy path app workloads at the moment, so some of the different configurations aren't a priority for the spike.
It is worth noting that eirini does support sidecars: https://github.com/cloudfoundry-incubator/eirini/blob/8dbf019daf12bad66d1dea6f97e0e30155b01bf1/k8s/stset/lrp_to_statefulset.go#L274-L289
Another thing we discussed in real time was how process instance counts are determined. One way is through the app manifest, which is out of scope for this work. If no instance counts are specified in the manifest, then web processes get 1 instance and all others get 0. One of the code paths that accomplishes this is here: https://github.com/cloudfoundry/cloud_controller_ng/blob/main/app/actions/process_create.rb#L51
UPDATE While testing out the changes to create a Process, we ran into issues because the Builds we had didn't have any process information and therefore no processes or LRPs were getting created. As such, we decided to go ahead and move over the extraction to the Droplet controller instead.
Last week we discussed extracting the process/command information immediately after the build is made in the Build Controller. However, after speaking with Tim, we may not want this here in the future as it's possible to update the image in the Droplet. This isn't available to the users via the CF API, but it is essentially what happens when a stack change happens. See:
We will need to reconsider how this will work when we figure out how to update the stack in k8s.
Here are the commands I ran when I attempted to test this:
$ curl -X POST http://localhost:9000/v3/apps -d '{"name": "acceptance", "relationships": {"space": {"data": {"guid": "default"}}}}' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/apps HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 81
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 81 out of 81 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:14:01 GMT
< Content-Length: 284
<
{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647","name":"acceptance","state":"STOPPED","created_at":"2021-07-20T23:14:01Z","updated_at":"","lifecycle":{"type":"kpack","data":{}},"relationships":{"space":{"data":{"guid":"default"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0
$ curl -X POST http://localhost:9000/v3/packages -d '{"type": "bits", "relationships": {"app": {"data": {"guid": "1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}}}' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/packages HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 102
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 102 out of 102 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:16:23 GMT
< Content-Length: 338
<
{"guid":"c4795710-d70c-49ca-8820-ad70e8273e72","type":"bits","data":{"checksum":{"type":"sha256","value":null},"error":null},"state":"AWAITING_UPLOAD","created_at":"2021-07-20T23:16:23Z","updated_at":"","relationships":{"app":{"data":{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0
$ (cd ~/workspace/cf-acceptance-tests/assets/dora/; zip -r /tmp/dora.zip .)
adding: .rspec (stored 0%)
adding: config.ru (stored 0%)
adding: spec/ (stored 0%)
adding: spec/spec_helper.rb (deflated 48%)
adding: spec/curl_spec.rb (deflated 66%)
adding: spec/stress_testers_spec.rb (deflated 72%)
adding: spec/logging_service_spec.rb (deflated 65%)
adding: spec/instances_spec.rb (deflated 63%)
adding: spec/logutil_spec.rb (deflated 72%)
adding: stress_testers.rb (deflated 51%)
adding: README.md (deflated 63%)
adding: .gitignore (stored 0%)
adding: instances.rb (deflated 53%)
adding: log_utils.rb (deflated 64%)
adding: scripts/ (stored 0%)
adding: scripts/map_cookie_jars_to_instances.rb (deflated 55%)
adding: scripts/run_performance_test_single_dea.rb (deflated 53%)
adding: scripts/README.md (deflated 49%)
adding: scripts/run_performance_test_single_dea_multiple bg.rb (deflated 60%)
adding: scripts/run_performance_test_2_deas_multiple_bg.rb (deflated 66%)
adding: scripts/clean up map script (deflated 56%)
adding: scripts/scale_dora (deflated 45%)
adding: logging_service.rb (deflated 57%)
adding: Gemfile (deflated 23%)
adding: Gemfile.lock (deflated 63%)
adding: Procfile (deflated 33%)
adding: curl.rb (deflated 33%)
adding: .cfignore (stored 0%)
adding: get_instance_cookie_jars.sh (deflated 53%)
adding: dora.rb (deflated 58%)
adding: vendor/ (stored 0%)
adding: vendor/cache/ (stored 0%)
adding: vendor/cache/rack-test-1.1.0.gem (deflated 16%)
adding: vendor/cache/rspec-support-3.10.1.gem (deflated 12%)
adding: vendor/cache/sinatra-2.1.0.gem (deflated 1%)
adding: vendor/cache/diff-lcs-1.4.4.gem (deflated 5%)
adding: vendor/cache/rspec-expectations-3.10.1.gem (deflated 5%)
adding: vendor/cache/rspec-3.10.0.gem (deflated 45%)
adding: vendor/cache/ruby2_keywords-0.0.2.gem (deflated 58%)
adding: vendor/cache/json-2.5.1.gem (deflated 4%)
adding: vendor/cache/mustermann-1.1.1.gem (deflated 5%)
adding: vendor/cache/tilt-2.0.10.gem (deflated 14%)
adding: vendor/cache/rspec-mocks-3.10.1.gem (deflated 5%)
adding: vendor/cache/rack-protection-2.1.0.gem (deflated 18%)
adding: vendor/cache/rspec-core-3.10.1.gem (deflated 3%)
adding: vendor/cache/rack-2.2.3.gem (deflated 2%)
adding: stress (deflated 58%)
$ curl -X POST http://localhost:9000/v3/packages/c4795710-d70c-49ca-8820-ad70e8273e72/upload -F bits=@/tmp/dora.zip -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/packages/c4795710-d70c-49ca-8820-ad70e8273e72/upload HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 2301244
> Content-Type: multipart/form-data; boundary=------------------------b91aa8c704c9f9e1
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:23:58 GMT
< Content-Length: 328
<
{"guid":"c4795710-d70c-49ca-8820-ad70e8273e72","type":"bits","data":{"checksum":{"type":"sha256","value":null},"error":null},"state":"READY","created_at":"2021-07-20T23:16:23Z","updated_at":"","relationships":{"app":{"data":{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0
$ curl -X POST http://localhost:9000/v3/builds -d '{"package": {"guid": "c4795710-d70c-49ca-8820-ad70e8273e72"}}' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9000 (#0)
> POST /v3/builds HTTP/1.1
> Host: localhost:9000
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 61
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 61 out of 61 bytes
< HTTP/1.1 201 Created
< Content-Type: application/json
< Date: Tue, 20 Jul 2021 23:30:04 GMT
< Content-Length: 414
<
{"guid":"3d1f625f-56c1-464f-b284-e8896d78ec5f","state":"","created_at":"2021-07-20T16:30:04-07:00","updated_at":"","lifecycle":{"type":"kpack","data":{"stack":"cflinuxfs3"}},"package":{"guid":"c4795710-d70c-49ca-8820-ad70e8273e72"},"droplet":{"guid":"NOT CURRENTLY IMPLEMENTED"},"relationships":{"app":{"data":{"guid":"1631cbaa-b8a2-4ca2-ae71-c52b7ebcb647"}}},"links":{},"metadata":{"labels":{},"annotations":{}}}
* Connection #0 to host localhost left intact
* Closing connection 0
This turned out to be a large adventure and I had to make a lot of changes (which I pushed to the branch):
After making all of these changes, I got to a place where the kpack build was succeeding for the dora app (which I chose because it has 2 processes in its Procfile), but the follow-up step to extract process types was failing with the following error:
2021-07-20T17:48:13.372-0700 INFO controller-runtime.manager.controller.droplet Attempting to reconcile default/droplet-896a55aa-2698-41af-b900-0fe775cb8876 {"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-896a55aa-2698-41af-b900-0fe775cb8876", "namespace": "default"}
2021-07-20T17:48:14.634-0700 INFO controller-runtime.manager.controller.droplet Error fetching image config: GET https://gcr.io/v2/cf-relint-greengrass/cf-crd-staging-spike/akira/63ae3e5c-eb17-4eb7-a32d-245f74868bba/manifests/sha256:3465fad4cb90b1d22f313dfd0ebaf8fb99138c8eb9fade2226e1785c4e7fc0b0: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
{"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-896a55aa-2698-41af-b900-0fe775cb8876", "namespace": "default"}
2021-07-20T17:48:14.634-0700 INFO controller-runtime.manager.controller.droplet Error occurred extracting process types and commands: GET https://gcr.io/v2/cf-relint-greengrass/cf-crd-staging-spike/akira/63ae3e5c-eb17-4eb7-a32d-245f74868bba/manifests/sha256:3465fad4cb90b1d22f313dfd0ebaf8fb99138c8eb9fade2226e1785c4e7fc0b0: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication {"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-896a55aa-2698-41af-b900-0fe775cb8876", "namespace": "default"}
I tracked this down to this line: https://github.com/cloudfoundry/cf-crd-explorations/blob/ee719db1cfd6455bdab3e949ed7ac34fa95bb4e9/controllers/droplet_controller.go#L81
We're assuming that the Droplet will have the imagePullSecrets that we need to access the registry, but the Droplet created by the API has no imagePullSecrets. This is the next problem to solve.
@heycait and I saw that the credentials used for pushing the image to app registry was not persisted on Package CR. We addressed this concern on commit - 39409bae8c00
We see that the build controller is using the above secret to fetch the app source image at line 158
But these changes seems to not have any effect. We still see the same error as above
2021-07-21T15:53:56.567-0600 INFO controller-runtime.manager.controller.droplet Attempting to reconcile default/droplet-86410e0c-6450-43dc-b26b-fa499bbc56c9 {"reconciler group": "apps.cloudfoundry.org", "reconciler kind": "Droplet", "name": "droplet-86410e0c-6450-43dc-b26b-fa499bbc56c9", "namespace": "default"}
2021-07-21T15:53:57.432-0600 INFO controller-runtime.manager.controller.droplet Error fetching image config: GET https://gcr.io/v2/cf-relint-greengrass/cf-crd-staging-spike/kpack/4b568bf3-9af1-42fc-9710-8f03dc6e4274/manifests/sha256:ee368c9f509c0b9ae885ec9fd088cb9504bcbd04b1d57396aa53d8b2f32652d7: UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
@akrishna90 and I found out that the issue was due to us using a generic
secret type instead of dockerconfigjson
. The Kpack auth helper only works with a specific set of secret types:
https://github.com/pivotal/kpack/blob/4e5647354c3a55fee655199646059efd93f9b0c4/pkg/dockercreds/k8sdockercreds/k8s_keychain.go#L94
Capturing the open question from @gnovv here : Are Docker Images OCI compliant? The question was in reference to lines written here https://github.com/cloudfoundry/cf-crd-explorations/blob/6b681ebc51411c2fe9c13e1cc45aa62125252786/controllers/droplet_controller.go#L108-L110
@tcdowney says - yeah, the code probably isn't robust enough. The process types metadata is not part of the OCI spec and just something that Cloud Native Buildpack-built images can reliably have. I'm not sure if our code handles the case where it's missing.
Docker images should have the ExposedPorts field, but that's also optional in the OCI spec so we can't rely on it and should handle the case where they're missing. I think the code probably handles it, but we didn't test that case.
Relevant OCI Image spec (search for ExposedPorts): https://github.com/opencontainers/image-spec/blob/main/config.md
This is a great point, though, for when we start implementing this for real. We'll want to have good automated tests for these cases since there are so many ways to build a container image.
Background
There are no explicit
Create
orDelete
endpoints in the CF API for Processes. Instead, they can come from two places:Note: In CF for VMs/CF for K8s there is no way to delete processes. You can only scale them down to
0
instances.CF for VMs Context
Staging
Most Processes are provided as a result of staging the app. Buildpacks will supply default process types for an app and users can provide the buildpack with a Procfile to define their own process types. The definitions (type and start command) for these processes are sent back to Cloud Controller as part of the staging completion callback and are saved in the database as part of the Droplet.
The Processes are created in the database when the user assigns the Droplet to the app (via the
PATCH /v3/apps/:guid/relationships/current_droplet
endpoint).App Manifests
Processes can also be defined in an App Manifest. When a manifest containing processes is sent to the CF API it will make Process records in the database for each one.
CF for K8s Context
Staging
Conceptually works the same as in CF for VMs. The implementation is different, though.
Kpack-built app images include process types and start command information as part of the container image metadata. The capi-k8s-release (kpack) Build Controller extracts process types from an image by calling out to the container registry to fetch its metadata.
This information is sent back to Cloud Controller via the
PATCH /v3/builds/:guid
endpoint and the droplet is updated.Note: Curiously it bypasses the
AppAssignDroplet
codepath in this case so I'm not sure how those processes get created in the database. It might be the case that this is a bug and it's covered up by the fact the fact the CLI always calls the "set current droplet" endpoint during a push.App Manifests
Same as CF for VMs.
What to do
Consider updating the Droplet CR and Controller
The CF Droplet object /
GET
endpoint includes information that we're not storing anywhere -- namely theprocess_types
field (but also maybebuildpacks
andstack
could be important).We may need to update the Droplet CR to store this information as well. We could update the Droplet Controller to do something similar to the capi-k8s-release Build controller and extract the process types whenever an image changes on a Droplet.
Figure out the best place to create Process CRs
For feature parity with CF we would do this whenever the
currentDropletRef
on an App changes or a manifest is applied (no spike code implements this yet).Think about where it makes the most sense to do this work in the spike code (App Controller reacting to
currentDropletRef
changes?) and implement it.Draw a diagram
Include a diagram that shows the chain of events that results in a Process being created from staging
Acceptance Criteria
GIVEN I have a cluster with the spike code deployed WHEN I stage an App (follow procedure outlined in https://github.com/cloudfoundry/cf-crd-explorations/issues/6) THEN I can view Process CRs that match the image processes.
Notes