Open njbennett opened 4 years ago
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/175469704
The labels on this github issue will be updated when the story is started.
How many metric proxies were you running? They're pretty lightweight, but running more of them should help it scale (and metric server should be scaling automatically based on the node count).
The CF API Kubernetes Evolution (CAKE) team has been validating that the platform can run 2000 application instances (AIs). This issue tracks the known problems we have hit when operating at that scale in our tests.
Known Issues
/v3/processes/<app_guid>/stats 503s
To serve this request, the Cloud Controller API server calls logcache for metric envelopes, then Eirini for container status. Both of these calls can fail when 2000 application instances are running.
Ultimately they both appear to be problems with the way that we're calling or configuring the Kubernetes component metric-server.
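For anyone trying to reproduce the 503s, a minimal probe sketch follows. It requests the stats endpoint for a set of apps and reports how many responses were 503. This is not the CAKE team's test harness: the helper names (`stats_url`, `fetch_status`, `summarize`, `probe`), the API base URL, and the bearer-token header are assumptions for illustration, and the GUID is substituted into the path exactly as written in this issue.

```python
from collections import Counter
from typing import Callable, Iterable
import urllib.error
import urllib.request


def stats_url(api_base: str, app_guid: str) -> str:
    """Build the stats URL using the path shape quoted in this issue."""
    return f"{api_base.rstrip('/')}/v3/processes/{app_guid}/stats"


def fetch_status(url: str, token: str, timeout: float = 10.0) -> int:
    """GET `url` and return the HTTP status code.

    Assumes a bearer token is already available (hypothetical setup).
    Connection-level failures (urllib.error.URLError) are not caught here.
    """
    req = urllib.request.Request(url, headers={"Authorization": f"bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as 503 arrive as HTTPError; keep the code.
        return err.code


def summarize(statuses: Iterable[int]) -> dict:
    """Count status codes and compute the fraction that were 503."""
    counts = Counter(statuses)
    total = sum(counts.values())
    rate = counts[503] / total if total else 0.0
    return {"total": total, "counts": dict(counts), "rate_503": rate}


def probe(urls: Iterable[str], get_status: Callable[[str], int]) -> dict:
    """Request every URL once via `get_status` and summarize the results."""
    return summarize(get_status(url) for url in urls)
```

To run it against a real system, something like `probe(urls, lambda u: fetch_status(u, token))` should work once `urls` and `token` are filled in for your environment. Passing `get_status` as a parameter also lets the summary logic be checked without a cluster.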
@pianohacker and @Benjamintf1 from the CF logging and metrics team have been investigating.
Eirini memory usage
When running 10 instances of the Eirini components, with no other modifications from base cf-for-k8s, the Eirini tasks, events, and controller processes perpetually consume more memory than they are provisioned with. This is probably a configuration problem.
Reported on the Eirini repo
Current tasks
[ ] Run CAPI acceptance tests against scaled-out system, to check for problems with other endpoints at that scale
[ ] Document CAKE scale testing setup
[ ] Review SAP scale testing & scale Eirini components in CAKE test system to match
Related Scale Issues
/v3/processes/<app_guid>/stats fails when many app instances are in the same space #544