cloudfoundry / cf-for-k8s

The open source deployment manifest for Cloud Foundry on Kubernetes
Apache License 2.0

Scale to 2000 App Instances #553

Open njbennett opened 3 years ago

njbennett commented 3 years ago

The CF API Kubernetes Evolution (CAKE) team has been validating that the platform can run 2000 application instances (AIs). This issue tracks the problems we have found so far when operating at that scale in our tests.

Known Issues

/v3/processes/<app_guid>/stats 503s

To serve this request, the Cloud Controller API server calls logcache for metric envelopes, then Eirini for container status. Both of these calls can fail when 2000 application instances are running.

Ultimately, both appear to stem from the way that we're calling or configuring the Kubernetes component metrics-server. @pianohacker and @Benjamintf1 from the CF logging and metrics team have been investigating.
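To make the failure mode above concrete, here is a minimal sketch of the request flow as described: the log cache lookup first, then the Eirini container-status lookup, with the whole request failing if either one fails. This is not the Cloud Controller's actual code. `process_stats`, `UpstreamError`, and the two callables are hypothetical names used only for illustration, and mapping any upstream failure to a 503 is an assumption based on the symptom reported here.

```python
from typing import Callable, Dict, List, Tuple


class UpstreamError(Exception):
    """Raised by a hypothetical backend client when its call fails."""


def process_stats(
    process_guid: str,
    fetch_envelopes: Callable[[str], List[dict]],
    fetch_container_status: Callable[[str], List[dict]],
) -> Tuple[int, Dict]:
    """Return (http_status, body) for a stats request.

    Both callables are stand-ins: one for the log cache lookup, one for
    the Eirini container-status lookup. They are called in that order.
    """
    try:
        envelopes = fetch_envelopes(process_guid)
        statuses = fetch_container_status(process_guid)
    except UpstreamError as err:
        # Assumption: a failure in either upstream fails the whole request.
        return 503, {"error": str(err)}
    return 200, {"metrics": envelopes, "instances": statuses}
```

Because the two lookups are sequential and neither has a fallback in this sketch, the endpoint is only as reliable as the less reliable of the two calls, which would be consistent with seeing 503s once either backend starts struggling at 2000 instances.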

Eirini memory usage

When running 10 instances of the Eirini components with no other modifications from base cf-for-k8s, the Eirini tasks, events, and controller processes consistently consume more memory than they are provisioned. This is probably a configuration problem.

Reported on the Eirini repo

Current tasks

[ ] Run CAPI acceptance tests against scaled-out system, to check for problems with other endpoints at that scale
[ ] Document CAKE scale testing setup
[ ] Review SAP scale testing & scale Eirini components in CAKE test system to match

Related Scale Issues

cf-gitbot commented 3 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175469704

The labels on this github issue will be updated when the story is started.

Benjamintf1 commented 3 years ago

How many metric proxies were you running? They're pretty lightweight, but running more of them should help it scale (and metrics-server should be automatically scaling based on the node count).