oodiete opened this issue 5 years ago
Hey @oodiete, thanks for raising the issue. I'd like to investigate this further but will require you to open up a support ticket so you can send a flare from the cluster-agent. This should let us inspect your logs and configurations more closely.
In your ticket, please reference this GitHub issue. Attaching the output of kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/<your-namespace>/rabbitmq.queue.messages | jq would also be helpful. If you're using any label selectors for this metric, just append ?labelSelector=<label> to the metric name in the command. Thanks! Let me know if there are any questions.
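For reference, with a placeholder namespace and the rabbitmq_queue label that shows up later in this thread, the full command would look something like this (the = inside the selector value is percent-encoded as %3D):
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/<your-namespace>/rabbitmq.queue.messages?labelSelector=rabbitmq_queue%3Devent_bus.case_management" | jq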
@DylanLovesCoffee I updated the ticket with more info. I will also email the issue number to support.
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/xxxxxx/rabbitmq.queue.messages"
  },
  "items": [
    {
      "metricName": "rabbitmq.queue.messages",
      "metricLabels": {
        "rabbitmq_queue": "event_bus.case_management"
      },
      "timestamp": "2019-08-28T14:42:56Z",
      "value": "33500m"
    }
  ]
}
Hey @oodiete, thanks for updating the issue with your information! The values reported by the autoscaler suffixed with m represent milli-units (1/1000). The cluster-agent was built to support this in https://github.com/DataDog/datadog-agent/pull/3090 (>= cluster-agent v1.3.0) and will carry the conversion over into the agent status as a float type when necessary, so we should not expect any differences between the two values when scaling.
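To illustrate the conversion (a minimal sketch using the upstream k8s.io/apimachinery resource package, not the cluster-agent's own code), the 33500m above is simply 33.5 expressed in milli-units:

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/api/resource"
)

func main() {
    // "33500m" is the Quantity serialization of 33.5 (milli-units).
    q := resource.MustParse("33500m")

    // MilliValue returns the value scaled to milli-units (1/1000).
    fmt.Println(q.MilliValue())                   // 33500
    fmt.Println(float64(q.MilliValue()) / 1000.0) // 33.5
}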
@DylanLovesCoffee thanks for the response, although I don't think that solves my problem, because the HPA does not seem to understand the m and treats a value like 329500m as a very, very large number. The scaling behaviour is that it scales way up when it gets 329500m and back down when it gets something like 329.500, even though they should be the same value.
NAME                        REFERENCE                               TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
assortment-event-listener   Deployment/assortment-event-listener    329500m/50   1         8         8          14h
@oodiete, I'm facing the same issue, were you able to solve it?
Hi all,
Let me know if I am misunderstanding the issue, but I believe this is just a data representation quirk and is not impacting the actual lifecycle of this feature.
The m is one of the visual representations used by the Quantity type; this type is used throughout the Kubernetes codebase and is what you work with if you want to implement the API interfaces.
For instance, the Cluster Agent, which implements the External Metrics API's interface and is registered as the server here, returns the ExternalMetricValueList type from the function GetExternalMetric.
As you can see in the Kubernetes library, the Value is a Quantity and is used as such in the code of the Horizontal Pod Autoscaler Controller.
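To make that concrete, here is a minimal sketch of why the API output above carries the m suffix. The metricValue struct below is a hypothetical, simplified stand-in for the real ExternalMetricValue type; the point is only that its Value field is a resource.Quantity, which serializes in canonical form:

package main

import (
    "encoding/json"
    "fmt"

    "k8s.io/apimachinery/pkg/api/resource"
)

// metricValue is a simplified stand-in for the external metrics API's
// ExternalMetricValue type; Value is a resource.Quantity.
type metricValue struct {
    MetricName string            `json:"metricName"`
    Value      resource.Quantity `json:"value"`
}

func main() {
    m := metricValue{
        MetricName: "rabbitmq.queue.messages",
        // 33500 milli-units == 33.5
        Value: *resource.NewMilliQuantity(33500, resource.DecimalSI),
    }
    out, _ := json.Marshal(m)
    fmt.Println(string(out)) // {"metricName":"rabbitmq.queue.messages","value":"33500m"}
}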
Now, when describing HPAs, as you can see here, the status of HPAs reports the values with the Quantity type (hence the m) for AverageValue (which is used for the External Metrics types) and an integer for AverageUtilization.
Per the doc on quantities:
// Before serializing, Quantity will be put in "canonical form".
// This means that Exponent/suffix will be adjusted up or down (with a
// corresponding increase or decrease in Mantissa) such that:
// - No precision is lost
// - No fractional digits will be emitted
// - The exponent (or suffix) is as large as possible.
//
// The sign will be omitted unless the number is negative.
// Examples:
// - 1.5 will be serialized as "1500m"
// - 1.5Gi will be serialized as "1536Mi"
The above is really geared towards addressing the following statement:
because the hpa does not understand the m and seems to see for example 329500m as a very very large value
For the HPA controller, 329500m = 329.5. From the status, the reason there are scaling events is that the threshold is at 50 (though I might be missing context).
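For what it's worth, you can verify this equivalence directly with the upstream resource package (a minimal sketch, not the HPA controller's actual code):

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/api/resource"
)

func main() {
    // Both spellings parse to the same Quantity; Cmp returns 0 for equal values.
    a := resource.MustParse("329500m")
    b := resource.MustParse("329.5")
    fmt.Println(a.Cmp(b)) // 0
}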
Lastly, to address one other potential misunderstanding: I see that the value from the Cluster Agent status is being used to compare the two representations. That is just how we chose to render the value (as Dylan explained), per his template and the humanize function here.
Output of the info page (if this is a bug)
Describe what happened:
Notice the precision of the metric in the result from datadog-cluster-agent status run on the cluster-agent, and notice the precision after running kubectl get hpa and kubectl describe hpa. The worst part is that sometimes the precision is good and we get the same values on both sides (when it is an integer), so imagine the autoscaler seeing 329500m at some point and then 300 at another, and scaling up and down.
Describe what you expected:
I expected 329.5.
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc):