cloudfoundry / cf-for-k8s

The open source deployment manifest for Cloud Foundry on Kubernetes
Apache License 2.0

Istio-Proxy sidecar resource requirements overly excessive #422

Open JamesClonk opened 4 years ago

JamesClonk commented 4 years ago

Describe the bug

The resource requirements configured for the Istio-Proxy sidecar for app instances seem rather excessive, especially when compared to, for example, pushing a small golang app. In our environment this has caused apps to be unschedulable due to resource constraints from the K8s scheduler/nodes.

To Reproduce

Steps to reproduce the behavior:

  1. cf push my-small-golang-app -m 16m
  2. kubectl -n cf-workloads describe pod/<app-instance-pod>

    opi:
      Limits:
        ephemeral-storage:  64M
        memory:             16M
      Requests:
        cpu:                10m
        ephemeral-storage:  64M
        memory:             16M

    istio-proxy:
      Limits:
        cpu:     2
        memory:  1Gi
      Requests:
        cpu:      100m
        memory:   128Mi

Expected behavior

Istio-Proxy should not have such excessive resource requests/limits set when compared to an app that itself only requests 10m CPU and 16Mi memory.

cf-for-k8s SHA

https://github.com/cloudfoundry/cf-for-k8s/tree/7c65597af7a4de935994813658a5db182fbecac9

Cluster information

PKS

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/174710760

The labels on this github issue will be updated when the story is started.

mike1808 commented 4 years ago

Hello @JamesClonk. Thanks for raising this issue.

We understand that you might have a small TKGI (PKS) cluster and that you only deploy apps with low memory usage. However, the sidecar proxy's memory usage doesn't correlate with the memory usage of the app itself, but rather with the traffic going to and from the app. So we cannot change that number based on the memory usage of the app.

Also, just to remind you that right now cf-for-k8s has some minimal system requirements:

To deploy cf-for-k8s as is, the cluster should:

  • be running a Kubernetes version within the range 1.16.x to 1.18.x
  • have a minimum of 5 nodes
  • have a minimum of 4 CPU, 15GB memory per node

You can read more about it in the deployment guide.

cc @kauana

jamespollard8 commented 4 years ago

Thanks @JamesClonk for submitting this issue and to @mike1808 and @kauana for your response.

Mike and Kauana, we have a couple of questions for you:

  1. Is this networking story related: "Platform operators can configure Istio component resource properties"?
  2. Would you recommend we keep this issue open for now, or that we close it?

mike1808 commented 4 years ago

Hi @jamespollard8

  1. Yes, it's related. We're going to allow operators to modify the Istio resource requests/limits (see the sketch below).
  2. Yes, let's keep this open and mark it as a known issue.
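
For illustration: in stock Istio, the injected sidecar's default requests/limits come from the injector's values.global.proxy.resources, which an operator can override mesh-wide. A minimal sketch using an IstioOperator resource (the values are illustrative; how cf-for-k8s will surface this through its ytt templates is exactly what the linked story covers):

    apiVersion: install.istio.io/v1alpha1
    kind: IstioOperator
    spec:
      values:
        global:
          proxy:
            # Mesh-wide defaults applied to every injected sidecar
            resources:
              requests:
                cpu: 50m
                memory: 64Mi
              limits:
                cpu: 500m
                memory: 256Mi
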
loewenstein commented 4 years ago

I have some doubts that allowing the platform operator to configure resource requirements for sidecars globally will solve this, at least unless we have a foundation that is only hosting apps with very similar network traffic.

Has any conceptual work been started on how we can scale the Envoy according to the application's needs? I am aware that this will be far from trivial to solve and might even require work in Kubernetes (first-class sidecars) or Istio (there have at least been ideas about how to decouple Envoy from the application pods). Just curious whether there have been any thoughts on this in the cf-k8s-networking team.

mike1808 commented 4 years ago

Hello @loewenstein

We personally didn't perform any tests to validate resource requirements for sidecars, so for now we're going to rely on the numbers from the Istio documentation (a worked example follows the list):

  • The Envoy proxy uses 0.5 vCPU and 50 MB memory per 1000 requests per second going through the proxy.
  • Istiod uses 1 vCPU and 1.5 GB of memory.
  • The Envoy proxy adds 2.76 ms to the 90th percentile latency.
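
To make those numbers concrete (the traffic figure here is hypothetical): a sidecar in front of an app serving ~200 requests per second would need roughly 0.5 vCPU × 0.2 = 100m of CPU and 50 MB × 0.2 = 10 MB of memory. The default 100m/128Mi requests are therefore in the right range for moderate traffic; it is mainly the 2 CPU / 1Gi limits that look oversized next to a 16Mi app. A right-sized sidecar spec for such a low-traffic app might look like:

    # Illustrative sizing for ~200 req/s, derived from the figures above
    # (0.5 vCPU and 50 MB of memory per 1000 req/s through the proxy).
    resources:
      requests:
        cpu: 100m     # 0.5 vCPU * 0.2
        memory: 64Mi  # ~10 MB traffic overhead plus Envoy's baseline, rounded up
      limits:
        cpu: 500m     # headroom for traffic bursts
        memory: 256Mi
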
loewenstein commented 4 years ago

I was just saying that "per 1000 requests" will not make for an easy platform-wide configuration.

But I do understand that we currently don't have much of an option.

mike1808 commented 4 years ago

@loewenstein we are going to make a doc with our recommendation (based on our testing). However, it is not prioritized right now.

nickb937 commented 2 years ago

The workaround would be to manually override the Envoy sidecar's resource requirements in your pod template:

    metadata:
      annotations:
        # Standard Istio sidecar-injector annotations that override the
        # injected proxy's resource requests/limits for this pod:
        sidecar.istio.io/proxyCPU: "100m"        # CPU request
        sidecar.istio.io/proxyCPULimit: "1000m"  # CPU limit
        sidecar.istio.io/proxyMemory: "1Gi"      # memory request
        sidecar.istio.io/proxyMemoryLimit: "2Gi" # memory limit
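
Once a pod is recreated with those annotations, the override can be verified the same way the defaults were observed earlier (the pod name is a placeholder):

    kubectl -n cf-workloads describe pod <app-instance-pod>

The istio-proxy container should then report the annotated requests/limits instead of the injector's mesh-wide defaults.
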
Mohid-A commented 2 years ago

Hi @mike1808, we are in a similar situation, trying to determine the right compute resource allocation for the Envoy proxy sidecar. As the workloads increase, it's taking the cluster into an overcommitted state. Are there any recommendations you have published regarding resource allocation?