Retrieve application and pod statuses

devdattakulkarni commented 6 months ago

Consider the situation where KubePlus receives a request to create an application instance, but there is no available capacity on the cluster. In this case, KubePlus should deny such a request. Currently, KubePlus will handle the request but the application Pods will remain Pending if there is not enough available capacity on the cluster.

This feature will require two things:

First, we should require that Helm charts define requests and limits for their workload Pods. If both are defined that's great. Otherwise, at least requests need to be defined. We should flag an error if this is not the case.
Second, we should compare the available capacity (CPU and memory) on the cluster and what is being requested in the Helm chart. If the requested amounts are less than the available amounts then we should allow the request to go through.
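The two checks above can be sketched in Python as follows. This is only an illustration, not KubePlus code: the function names (`parse_quantity`, `missing_requests`, `fits`) are hypothetical, the containers are assumed to be already-parsed container specs from the rendered chart, the available CPU and memory are assumed to be supplied by the caller, and only a subset of Kubernetes quantity suffixes is handled.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes quantity suffixes (assumption:
# larger suffixes such as Pi/Ei are not needed for this sketch).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(q):
    """Convert a quantity such as '500m', '2', or '128Mi' to a Decimal."""
    q = str(q).strip()
    if q.endswith("m"):  # milli-units, e.g. 500m CPU = 0.5 cores
        return Decimal(q[:-1]) / 1000
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[:-len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def _requests(container):
    """Return the container's resource requests, or {} if none are set."""
    return (container.get("resources") or {}).get("requests") or {}


def missing_requests(containers):
    """First check: names of containers lacking a CPU or memory request."""
    return [c["name"] for c in containers
            if "cpu" not in _requests(c) or "memory" not in _requests(c)]


def fits(containers, available_cpu, available_memory):
    """Second check: do the summed requests fit in the available capacity?

    Assumes missing_requests(containers) is empty. Equality is treated as
    fitting here, which is a choice made for this sketch.
    """
    cpu = sum((parse_quantity(_requests(c)["cpu"]) for c in containers),
              Decimal(0))
    mem = sum((parse_quantity(_requests(c)["memory"]) for c in containers),
              Decimal(0))
    return (cpu <= parse_quantity(available_cpu)
            and mem <= parse_quantity(available_memory))
```

A caller would first reject the request if `missing_requests(...)` returns any names, and only then call `fits(...)` with the cluster's remaining CPU and memory.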

Performing both checks in the mutating webhook can be tricky because admission webhook calls are subject to a timeout of at most 30 seconds. One option is to provide a kubectl plugin to perform the first check. However, use of this plugin cannot be enforced, so the best place to perform the checks is probably still the mutating webhook.

omgoswami commented 4 months ago

Seems like the applyPolicies() function (line 822) in mutating-webhook/webhook.go is currently processing such requests, reading the requested amounts of CPU and memory, storing them in patchOperation objects and returning a list of said objects. Can we make these checks in that function and only return this list if the checks pass?

If not there, then trackCustomAPIs() also seems to be reading these values and storing them in customAPIQuotaMap, and handleCustomAPIs() reads this map for these values, but neither forces the Helm chart to define values, nor do they seem to deny the request if the cluster is at capacity. Let me know where the correct place to write these checks is, and I will get started!

devdattakulkarni commented 4 months ago

@omgoswami Ack. Those are the right starting points. Let me look at the code and I will add more comments after that.

devdattakulkarni commented 4 months ago

@omgoswami So I thought about this issue more and I think that rather than adding any checks in KubePlus for resource requests and limits, we should just indicate Pod statuses in the output of the kubectl appresources plugin. This plugin discovers all the resources that have been created as part of a particular app deployment. We can add a 'status' column to the output and include the status of each resource.

The main issue right now is that there is no simple way to know whether the application Pods have started running or not. Users have to run kubectl get pods. We do have the kubectl metrics plugin, whose output indicates how many Pods are running. But it is not intuitive that one needs to use kubectl metrics to find the status of the Pods. kubectl appresources is something that we already tell users to use to find out about all the resources that have been created. So it will be a natural place to add the status for each resource.

Originally I was thinking that KubePlus should check the capacity and deny the requests. But from the user's point of view, it is not easy to decide what values to define for resource requests and limits. Moreover, if there is a Custom Resource as part of the application's Helm chart and that Custom Resource is creating Pods, then users may not have any control over the requests/limits for those Pods. Therefore, I think we should not go down the route of tracking and enforcing capacity. Instead, providing status output will be the right thing to do.

devdattakulkarni commented 4 months ago

@omgoswami As we discussed, let's add kubectl appstatus as a new plugin. Its inputs will be similar to those of kubectl appresources. The output will consist of:

status retrieved from the application instance (e.g.: kubectl get WordPressService wp1 -o json -> status from this output) and
status retrieved from the application instance Pods (e.g.: kubectl get pods -n wp1 -> Pod statuses from this output).
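As a rough sketch of how these two pieces of output could be gathered in Python: the helper names below (`kubectl_json`, `instance_status`, `pod_statuses`, `app_status`) are hypothetical and not part of KubePlus, and the sketch assumes kubectl is on the PATH and that the Pods live in the namespace passed in (wp1 in the example above). It asks kubectl for JSON output in both cases and reads the instance's `status` block and each Pod's `status.phase`.

```python
import json
import subprocess


def kubectl_json(*args):
    """Run kubectl with '-o json' appended and return the parsed output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def instance_status(instance):
    """Return the 'status' block of an application instance, or {}."""
    return instance.get("status") or {}


def pod_statuses(pod_list):
    """Return (pod name, phase) pairs from a 'kubectl get pods' result."""
    return [
        (p["metadata"]["name"], (p.get("status") or {}).get("phase", "Unknown"))
        for p in pod_list.get("items", [])
    ]


def app_status(kind, name, namespace):
    """Collect the instance status and the Pod phases for one app instance."""
    instance = kubectl_json("get", kind, name)
    pods = kubectl_json("get", "pods", "-n", namespace)
    return {"instance": instance_status(instance), "pods": pod_statuses(pods)}
```

With the example from this thread, the call would be `app_status("WordPressService", "wp1", "wp1")`. A Pod stuck for lack of capacity would show up with phase `Pending` in the `pods` list.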

You can add a kubectl-appstatus bash file as the entrypoint of the plugin. For retrieving statuses, it might be easier to use Python. Check kubectl appresources for how to invoke a Python script from within a bash script. In fact, you can follow kubectl appresources closely, with similar error checks, etc.

cloud-ark / kubeplus

Retrieve application and pod statuses #1275