Open togikiran opened 2 months ago
Hi @togikiran ,
Why are the vmagent pods getting rotated whenever we increase the replica count? Command used: kubectl scale vmagent-shard-ha --replicas=3
What do you mean by "pods are getting rotated", could you share the pods status?
If you scale vmagent using kubectl scale vmagent/vmagent-shard-ha --replicas=3 from replicas=1, the expected behavior is that new pods are created and the old pod stays.
Why is the above hpa scaling metric showing unknown?
It's unknown because metrics server doesn't know what's vmagent's cpu utilization now, you need to create the metric and report to metrics server.
Issue: missing label selector status
That's a bug, since we don't have label propagation in vmagent.status now. We can add them, but I'm not sure if that's useful. Like in this hpa case, hpa doesn't have to know the pod label selector, it only needs to scale the vmagent.shardCount, and the operator will scale the pods.
Hey @Haleygo
When I increased the replicas from 2 to 3, the older pods got rolled out and new pods came up. Shared the screenshots.
It's unknown because metrics server doesn't know what's vmagent's cpu utilization now, you need to create the metric and report to metrics server.
The default metrics-server knows the pod metrics, right? Do you mean we need to add them for the vmagent custom resource as well? If yes, can you help with the approach?
Can you please share a sample hpa yaml file (k8s) for autoscaling vmagent shards based on cpu?
Thanks
Hello,
Due to the current sharding implementation of vmagent, the flags for all vmagents must be changed. This requires a restart of all pods with the new flag value.
Hey @f41gh7, is there a way to skip or bypass the pod restart? Otherwise every scale-up will trigger a restart and cause impact. Is there any mitigation in place for this issue?
Thanks
Is there a way to skip or bypass the pod restart? Otherwise every scale-up will trigger a restart and cause impact.
No, it won't work if the pod doesn't restart with the new -promscrape.cluster.membersCount arg.
Imagine you have vmagent with shardCount: 1 and set the hpa scale threshold to cpu>80%. At first it scrapes 100 targets with -promscrape.cluster.membersCount=1 -promscrape.cluster.memberNum=0.
Then the number of targets bumps to 200 and cpu exceeds 80%, so the hpa increases vmagent.shardCount to 2, which means there will be two vmagent instances sharding 200 targets. Each instance still scrapes 100 targets, with -promscrape.cluster.membersCount=2 and -promscrape.cluster.memberNum=0 or 1. In this way, cpu usage will go down.
If we don't change -promscrape.cluster.membersCount for each instance, both of them will scrape 200 targets, cpu usage won't go down and there is no point in having hpa.
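To make the reasoning above concrete, here is a minimal sketch in Python. It is not vmagent's actual code: the hash function, the `member_for` and `scraped_by` helpers and the target addresses are all assumptions made for illustration. It only shows why every member has to agree on the same membersCount value for the targets to be split without overlap.

```python
import hashlib

def member_for(target: str, members_count: int) -> int:
    """Hypothetical assignment: stable hash of the target modulo members_count.

    vmagent's real hashing may differ; only the modulo idea matters here.
    """
    digest = hashlib.sha256(target.encode()).digest()
    return int.from_bytes(digest[:8], "big") % members_count

def scraped_by(targets, members_count, member_num):
    """Targets a member scrapes for the given membersCount/memberNum pair."""
    return [t for t in targets if member_for(t, members_count) == member_num]

# 200 made-up targets, matching the numbers in the example above.
targets = [f"10.0.0.{i}:9100" for i in range(200)]

# Both pods restarted with membersCount=2: the targets are partitioned.
shard0 = scraped_by(targets, 2, 0)
shard1 = scraped_by(targets, 2, 1)

# Old pod left running with membersCount=1, memberNum=0: it still owns everything.
stale = scraped_by(targets, 1, 0)

print(len(shard0) + len(shard1), len(set(shard0) & set(shard1)), len(stale))
```

With consistent flags every target is owned by exactly one member. A pod that keeps the old membersCount=1 still scrapes all 200 targets, so its cpu usage does not drop and the new shard's targets are scraped twice.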
The default metrics-server knows the pod metrics, right? Do you mean we need to add them for the vmagent custom resource as well? If yes, can you help with the approach?
I'd recommend using keda here. It can use prometheus as a direct trigger, like this:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: test
spec:
  scaleTargetRef:
    kind: vmagent
    name: vmagent-test
  minReplicaCount: 2
  maxReplicaCount: 3
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://vmselect-address
      metricName: vmagent-cpu-usage
      threshold: '80'
      query: container_cpu_usage{you-container-here}
Issue: missing label selector status
jfyi, status label selector should be fixed in https://github.com/VictoriaMetrics/operator/commit/a6e3ad72ce6bb78bea5b45f9e7513f5ab3b996d0.
@Haleygo observed metric loss during vmagents pod scaleup i.e all pods are getting recreated after increase in replica count. This is impacting production clusters. Is there any workaround for this ?
Observing metric loss while scaling vmagent. Added hpa on cpu and memory metrics, pods are getting rolled out and we are observing metric loss. Production clusters are impacted.
Operator: v0.44.0
Vmagent version: v1.90.0
@f41gh7
@Haleygo observed metric loss during vmagents pod scaleup i.e all pods are getting recreated after increase in replica count. This is impacting production clusters. Is there any workaround for this ?
hmm, I'm afraid that's expected with the current implementation. A workaround would be to also set vmagentSpec.replicaCount=2 and enable deduplication in vmcluster.
@f41gh7 Why are the vmagent pods getting rotated whenever we increase the replica count?
Command used: kubectl scale vmagent-shard-ha --replicas=3
How should we configure hpa to scale based on cpu/memory utilisation?
Why is the above hpa scaling metric showing unknown?
NAME               REFERENCE                  TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
vmagent-shard-ha   VMAgent/vmagent-shard-ha   <unknown>/80%   3         3         3          14m
Issue: missing label selector status
kubectl get --raw /apis/operator.victoriametrics.com/v1beta1/namespaces/<namespace>/vmagents/monitoring-vmagent-ha/scale
{"kind":"Scale","apiVersion":"autoscaling/v1","metadata":{"name":"monitoring-vmagent-ha","namespace":"<namespace>","uid":"d80e7371-cd49-4caa-8765-3a78220f9543","resourceVersion":"351547683","creationTimestamp":"2024-04-17T12:26:29Z"},"spec":{"replicas":3},"status":{"replicas":3}}
vm-operator version: v0.30.0 vmagent: v1.90.0
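For anyone checking the same thing on their cluster: the Scale object returned above has no status.selector, which is the "missing label selector" mentioned in this issue. A small sketch of that check in Python follows. The `has_pod_selector` helper and the selector value in the second payload are made up for illustration; the first payload is a trimmed copy of the output above.

```python
import json

def has_pod_selector(scale_json: str) -> bool:
    """Return True if the Scale object exposes a non-empty status.selector."""
    scale = json.loads(scale_json)
    return bool(scale.get("status", {}).get("selector"))

# Trimmed copy of the /scale response from this issue: no selector in status.
reported = (
    '{"kind":"Scale","apiVersion":"autoscaling/v1",'
    '"metadata":{"name":"monitoring-vmagent-ha"},'
    '"spec":{"replicas":3},"status":{"replicas":3}}'
)

# Hypothetical response once a selector is reported; the label value is invented.
with_selector = (
    '{"kind":"Scale","apiVersion":"autoscaling/v1",'
    '"metadata":{"name":"monitoring-vmagent-ha"},'
    '"spec":{"replicas":3},'
    '"status":{"replicas":3,"selector":"app.kubernetes.io/name=vmagent"}}'
)

print(has_pod_selector(reported))       # False
print(has_pod_selector(with_selector))  # True
```

The output of kubectl get --raw .../scale can be passed to the helper as a string. Without a selector, an hpa that relies on pod resource metrics has no way to find the pods behind the custom resource.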