While adjusting resource allocations in the shared services environment this morning, I looked for actual CPU and memory use metrics over the past few weeks and noticed that our dashboards only graph CPU and memory use as a percentage of the configured limit. Though this is helpful for detecting and diagnosing issues, it is not ideal for choosing the best resource request setting for a given pod.
Please add actual CPU and memory use to each container's graph on the dashboards. This will likely have to be presented as an average across the running instances, although a view of the max and min for both memory and CPU could also help detect whether load is being evenly distributed across the instances.
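To make the requested aggregation concrete, here is a minimal sketch of the average, min, and max view across instances. The function name `summarize_usage` and the sample values are hypothetical, and how the per-instance numbers are actually collected depends on whatever metrics backend feeds the dashboards.

```python
from statistics import mean


def summarize_usage(samples):
    """Return the average, min, and max of per-instance usage values."""
    if not samples:
        raise ValueError("no running instances to summarize")
    return {"avg": mean(samples), "min": min(samples), "max": max(samples)}


# Hypothetical per-instance CPU use (millicores) for one container.
cpu_millicores = [120, 135, 410]
summary = summarize_usage(cpu_millicores)

# A large gap between max and min suggests load is not evenly distributed.
spread = summary["max"] - summary["min"]
```

Graphing the average alone would hide the one busy instance in this sample, which is why the min and max series are worth showing alongside it.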
For bonus points: if possible, add a graph to each environment's dashboard that plots the CPU and memory requests and limits against that environment's quota, so that the dashboards show the same data found on the compute-long-running-quota console page of the corresponding environment. This would ensure that all the CPU and memory metrics needed to make resource planning decisions can be found in one place.
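As a rough illustration of the quota graph, the sketch below expresses each environment's total requests and limits as a fraction of the quota. The key names, the function `fraction_of_quota`, and all numbers are assumptions made for the example, not values taken from any real environment.

```python
def fraction_of_quota(used, hard):
    """Return the used amount as a fraction of the quota's hard limit."""
    if hard <= 0:
        raise ValueError("quota must be positive")
    return used / hard


# Hypothetical quota and current totals (CPU in millicores, memory in MiB).
quota = {
    "requests.cpu": 8000,
    "limits.cpu": 16000,
    "requests.memory": 16384,
    "limits.memory": 32768,
}
used = {
    "requests.cpu": 2000,
    "limits.cpu": 4000,
    "requests.memory": 8192,
    "limits.memory": 8192,
}

# One value per series on the proposed graph.
utilization = {name: fraction_of_quota(used[name], quota[name]) for name in quota}
```

Plotting these four series over time would show how close each environment is to its quota without opening the console page.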