Hi @cecchcc, could you share your Deployment and ScaledObject manifests, your KubeDownscaler configuration, and a sample of the logs?
Hello @samuel-esp, here is the requested information. Our deployment:
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
  namespace: orange
  annotations:
    downscaler/exclude: 'true'
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/version: 0.1.0
  template:
    metadata:
      labels:
        app.kubernetes.io/version: 0.1.0
        heritage: Helm
    spec:
      containers:
        - name: php
          image: ******
        - name: http
          image: ******
---
# Scaled Object
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  annotations:
    downscaler/downtime-replicas: '1'
    downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris
  name: backend-api
  namespace: orange
spec:
  cooldownPeriod: 300
  maxReplicaCount: 10
  minReplicaCount: 1
  pollingInterval: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  triggers:
    - metadata:
        metricName: active_processes
        query: >
          avg((sum(active_processes{job="orange"}) by
          (kubernetes_pod_name) *100) /
          sum(total_processes{job="orange"}) by
          (kubernetes_pod_name))
        serverAddress: *****
        threshold: '50'
      type: prometheus
The KubeDownscaler configuration:
# Kubedownscaler configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: py-kube-downscaler
  namespace: kube-downscaler
data:
  EXCLUDE_NAMESPACES: py-kube-downscaler,kube-downscaler,kube-system
---
# Kubedownscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: py-kube-downscaler
  namespace: kube-downscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      application: py-kube-downscaler
  template:
    metadata:
      labels:
        application: py-kube-downscaler
    spec:
      containers:
        - name: py-kube-downscaler
          image: ghcr.io/caas-team/py-kube-downscaler:24.8.0
          args:
            - '--interval=60'
            - '--include-resources=deployments,statefulsets,scaledobjects'
            - '--debug'
          envFrom:
            - configMapRef:
                name: py-kube-downscaler
                optional: true
          resources:
            limits:
              cpu: 500m
              memory: 900Mi
            requests:
              cpu: 200m
              memory: 300Mi
          securityContext:
            capabilities:
              drop:
                - ALL
            privileged: false
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      serviceAccountName: kube-downscaler-py-kube-downscaler
      serviceAccount: kube-downscaler-py-kube-downscaler
Here is a sample of the logs concerning our deployment:
2024-09-17 14:20:11,420 DEBUG: ScaledObject orange/backend-api has 1 replicas (original: None, uptime: Mon-Fri 18:00-22:00 Europe/Paris)
2024-09-17 14:21:17,520 DEBUG: Deployment orange/backend-api was excluded
2024-09-17 14:21:19,669 DEBUG: ScaledObject orange/backend-api has 1 replicas (original: None, uptime: Mon-Fri 18:00-22:00 Europe/Paris)
2024-09-17 14:22:25,452 DEBUG: Deployment orange/backend-api was excluded
Kube Downscaler does not give us more logs than that, and it always says there is 1 replica even if there are more; for example, here we have 2 replicas for our backend-api but it still indicates 1.
Hi @cecchcc, thank you for your answer. From what I understand, you are trying to downscale the deployment in this time interval: Mon-Fri 18:00-22:00 Europe/Paris. First of all, you should delete this annotation from the deployment:
downscaler/exclude: 'true'
Keeping the annotation above means the Deployment will be excluded from downscaling, so you should delete it and replace it with:
downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris
For the Keda ScaledObject, the annotation downscaler/downtime-replicas isn't supported in the current release (but I will include it in the next release), so you should only keep this:
downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris
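Concretely, a minimal sketch of what the two metadata blocks would look like after these changes (only the metadata is shown; names and the rest of the spec are taken from your manifests above):

# Deployment: downscaler/exclude removed, uptime window added
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
  namespace: orange
  annotations:
    downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris
---
# ScaledObject: downtime-replicas removed, uptime window kept
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: backend-api
  namespace: orange
  annotations:
    downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris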
Can you test this configuration (adjusting the time interval, of course, to the current time)?
Looking at the timestamps in the logs, it is correct that the workloads are not downscaled if you are currently targeting this time interval (Mon-Fri 18:00-22:00 Europe/Paris):
> 2024-09-17 14:20:11,420 DEBUG: ScaledObject orange/backend-api has 1 replicas (original: None, uptime: Mon-Fri 18:00-22:00 Europe/Paris)
> 2024-09-17 14:21:17,520 DEBUG: Deployment orange/backend-api was excluded
> 2024-09-17 14:21:19,669 DEBUG: ScaledObject orange/backend-api has 1 replicas (original: None, uptime: Mon-Fri 18:00-22:00 Europe/Paris)
> 2024-09-17 14:22:25,452 DEBUG: Deployment orange/backend-api was excluded
> Kube Downscaler does not give us more logs than that, and it always says there is 1 replica even if there are more; for example, here we have 2 replicas for our backend-api but it still indicates 1.
I see the concern about this log. Unfortunately, the message is an inelegant way of saying "the ScaledObject is not downscaled yet"; I will try to address this point with a new log message for Keda in the next release.
Hi @samuel-esp, thank you for your answer.
We tried with and without downscaler/exclude: 'true', and it does not scale in either case.
If I understand correctly, we have to add the annotation downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris in both the Deployment and the Keda ScaledObject manifests?
How can we specify the number of replicas if downscaler/downtime-replicas is not available?
> Looking at the timestamps in the logs, it is correct that the workloads are not downscaled if you are currently targeting this time interval (Mon-Fri 18:00-22:00 Europe/Paris)
I don't understand why, since we specified the uptime of the application here and not the downtime: we should have 2 replicas between 18:00 and 22:00 and only 1 replica from 22:00 to 18:00. But currently we still have 2, no matter what the time is.
> I don't understand why, since we specified the uptime of the application here and not the downtime: we should have 2 replicas between 18:00 and 22:00 and only 1 replica from 22:00 to 18:00. But currently we still have 2, no matter what the time is.
Sorry, I just read the annotation the wrong way (as downscaler/downtime), so you are correct: with downscaler/uptime set to that interval, both resources should be downscaled outside of it. I'll try to check with a test cluster, replicating your situation.
> How can we specify the number of replicas if downscaler/downtime-replicas is not available?
downscaler/downtime-replicas is supported on Deployments but not on Keda ScaledObjects, so the behavior I'm expecting to see is:
- the Deployment gets downscaled to the downscaler/downtime-replicas value
- the ScaledObject gets paused at 0 (because currently you can't specify a different value with downscaler/downtime-replicas), so the Deployment will also be scaled to 0, because it is controlled by the ScaledObject

You should be able to make it work using this configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
  namespace: orange
  annotations:
    downscaler/downtime-replicas: '1'
    downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/version: 0.1.0
  template:
    metadata:
      labels:
        app.kubernetes.io/version: 0.1.0
        heritage: Helm
    spec:
      containers:
        - name: php
          image: nginx
        - name: http
          image: nginx
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  annotations:
    downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris
  name: backend-api
  namespace: orange
spec:
  cooldownPeriod: 300
  maxReplicaCount: 10
  minReplicaCount: 1
  pollingInterval: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  triggers:
    - metadata:
        metricName: active_processes
        query: >
          avg((sum(active_processes{job="orange"}) by
          (kubernetes_pod_name) *100) /
          sum(total_processes{job="orange"}) by
          (kubernetes_pod_name))
        threshold: '50'
      type: prometheus
The good news is that the behavior I wasn't expecting didn't happen:
> the behavior I'm expecting to see is:
> - the Deployment gets downscaled to the downscaler/downtime-replicas value
> - the ScaledObject gets paused at 0 (because currently you can't specify a different value with downscaler/downtime-replicas), so the Deployment will also be scaled to 0, because it is controlled by the ScaledObject
So, in order to achieve your configuration, it is sufficient to specify downscaler/downtime-replicas: '1' only on the Deployment.
Let me know if it helped you! I'll try to clarify the logs and behaviors in a PR for the next release. Thanks a lot for raising this question!
I tried to add the annotations on the Deployment and the ScaledObject, but it is still not functioning well. In fact, the Deployment scales down, but Keda scales it back up because it does not match its desired number of pods.
Here are the logs; we noticed an error with the autoscaling.keda.sh/paused-replicas annotation:
2024-09-18 13:55:23,675 DEBUG: Deployment orange/backend-api has 2 replicas (original: None, uptime: Mon-Fri 18:00-22:00 Europe/Paris)
2024-09-18 13:55:23,675 INFO: Scaling down Deployment orange/backend-api from 2 to 1 replicas (uptime: Mon-Fri 18:00-22:00 Europe/Paris, downtime: never)
2024-09-18 13:55:25,693 DEBUG: ScaledObject orange/backend-api has 1 replicas (original: None, uptime: Mon-Fri 18:00-22:00 Europe/Paris)
2024-09-18 13:55:25,693 ERROR: Failed to process ScaledObject orange/backend-api: 'autoscaling.keda.sh/paused-replicas'
Traceback (most recent call last):
File "/kube_downscaler/scaler.py", line 940, in autoscale_resource
scale_down(
File "/kube_downscaler/scaler.py", line 651, in scale_down
if resource.annotations[ScaledObject.keda_pause_annotation] is not None:
KeyError: 'autoscaling.keda.sh/paused-replicas'
Hi @cecchcc, the error you are facing now was solved with #87, #91, and #92. You should upgrade your installation to at least version v24.8.2.
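If you're running the raw Deployment shown earlier (with the Helm chart you would bump the image tag through values instead), the upgrade is essentially an image tag change; a minimal sketch, assuming the 24.8.2 tag follows the same naming scheme as your current 24.8.0:

# Relevant fragment of the py-kube-downscaler Deployment
containers:
  - name: py-kube-downscaler
    image: ghcr.io/caas-team/py-kube-downscaler:24.8.2  # tag assumed to match release v24.8.2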
Hello @samuel-esp, we no longer have the error when using v24.8.2, and it downscales the Deployment, but downscaler/downtime-replicas: '1' is not applied: it scales the number of replicas down to 0. Do you have any idea why?
Hi @cecchcc, it seems that you encountered the behavior I was suspecting. Then you should wait until we add official support for the downscaler/downtime-replicas annotation on ScaledObjects. Just give me some time, because I want to test it again: when I first tried to replicate your situation, the behavior you are describing didn't happen, so I want to double-check.
I managed to replicate your situation again, and I can confirm the behavior you are describing is happening. Wait for the next release and I'll include support for the downscaler/downtime-replicas annotation.
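Once that support lands, the annotations you originally had on the ScaledObject should presumably just work; a sketch, under the assumption that the upcoming release honors the same annotation on ScaledObjects:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: backend-api
  namespace: orange
  annotations:
    downscaler/downtime-replicas: '1'  # assumption: honored once ScaledObject support ships
    downscaler/uptime: Mon-Fri 18:00-22:00 Europe/Paris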
Hello @samuel-esp,
I saw that there was a new release today. I tried it, but we are getting this error now:
2024-10-17 12:43:19,217 ERROR: Failed to process ScaledObject orange/backend-api: HTTPSConnectionPool(host='10.100.0.1', port=443): Read timed out. (read timeout=10)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
response = conn.getresponse()
File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 507, in getresponse
httplib_response = super().getresponse()
File "/usr/local/lib/python3.10/http/client.py", line 1375, in getresponse
response.begin()
File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.10/http/client.py", line 279, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/lib/python3.10/socket.py", line 717, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.10/ssl.py", line 1307, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/lib/python3.10/ssl.py", line 1163, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
raise reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 538, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 369, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.100.0.1', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/kube_downscaler/scaler.py", line 990, in autoscale_resource
resource.update()
File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 165, in update
self.patch(self.obj, subresource=subresource)
File "/usr/local/lib/python3.10/site-packages/pykube/objects.py", line 150, in patch
r = self.api.patch(
File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 515, in patch
return self.session.patch(*args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 661, in patch
return self.request("PATCH", url, data=data, **kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pykube/http.py", line 181, in send
response = self._do_send(request, **kwargs)
File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 713, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='10.100.0.1', port=443): Read timed out. (read timeout=10)
Kube-downscaler can't downscale or upscale because of a read timeout. We did not have this issue before. Have you encountered this issue already?
We added these 2 annotations on both the ScaledObject and the Deployment:
downscaler/downtime-replicas: '2'
downscaler/uptime: Mon-Fri 14:42-14:46 Europe/Paris
Hi @cecchcc, are you running managed Kubernetes (EKS, AKS, GKE) or self-hosted? Are you encountering this error only for ScaledObjects or also for other workloads?
At first sight it seems the Kubernetes API Server is throttling. In the meantime, these are the manifests I used when trying to reproduce the issue:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cron-scaling-deployment
  namespace: orange
  annotations:
    downscaler/downtime-replicas: "1"
  labels:
    app: cron-scaling-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cron-scaling-app
  template:
    metadata:
      labels:
        app: cron-scaling-app
    spec:
      containers:
        - name: cron-scaling-container
          image: nginx
          ports:
            - containerPort: 80
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cron-scaling-object
  namespace: orange
  annotations:
    downscaler/downtime-replicas: "1"
  labels:
    app: cron-scaling-app
spec:
  scaleTargetRef:
    name: cron-scaling-deployment # The name of the deployment to scale
  minReplicaCount: 3 # Minimum number of replicas
  maxReplicaCount: 5 # Maximum number of replicas
  triggers:
    - type: cron
      metadata:
        timezone: Etc/UTC
        start: "*/5 * * * *"
        end: "1-59/5 * * * *"
        desiredReplicas: "5"
This is the KubeDownscaler config:
apiVersion: v1
data:
  DEFAULT_DOWNTIME: Mon-Fri 10:00-19:00 CET
  EXCLUDE_NAMESPACES: py-kube-downscaler,kube-downscaler,kube-system
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: my-py-kube-downscaler
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-10-17T13:22:14Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: py-kube-downscaler
  namespace: default
  resourceVersion: "4295"
  uid: 52af7720-6c00-4a29-94f4-7851f6443cd7
This is my log
2024-10-17 13:35:46,967 INFO: Downscaler vdev started with admission_controller=, debug=False, default_downtime=Mon-Fri 10:00-19:00 CET, default_uptime=always, deployment_time_annotation=None, downscale_period=never, downtime_replicas=0, dry_run=False, enable_events=False, exclude_deployments=py-kube-downscaler,kube-downscaler,downscaler, exclude_namespaces=py-kube-downscaler,kube-downscaler,kube-system, grace_period=0, include_resources=deployments,statefulsets,scaledobjects, interval=60, matching_labels=, namespace=, once=False, upscale_period=never, upscale_target_only=False
2024-10-17 13:35:47,234 INFO: Scaling down Deployment local-path-storage/local-path-provisioner from 1 to 0 replicas (uptime: always, downtime: Mon-Fri 10:00-19:00 CET)
2024-10-17 13:35:47,271 INFO: Scaling down Deployment orange/cron-scaling-deployment from 5 to 1 replicas (uptime: always, downtime: Mon-Fri 10:00-19:00 CET)
2024-10-17 13:35:47,596 INFO: Pausing ScaledObject orange/cron-scaling-object (uptime: always, downtime: Mon-Fri 10:00-19:00 CET)
We are using EKS. I tried using the downscaler with Deployments and it is working fine:
2024-10-17 13:46:17,555 INFO: Scaling down Deployment orange/test-downscaler from 5 to 2 replicas (uptime: Mon-Fri 15:34-15:36 Europe/Paris, downtime: never)
2024-10-17 13:51:58,718 INFO: Scaling up Deployment orange/test-downscaler from 2 to 5 replicas (uptime: Mon-Fri 15:51-15:55 Europe/Paris, downtime: never)
We only encounter the error with ScaledObjects.
Could you also share the configuration of the KubeDownscaler deployment? Roughly how many workloads do you have inside your cluster that are targeted by downscaling? Are you running other workloads that need to communicate a lot with the API Server?
Also, it would be great if you could replicate the issue inside another cluster. @Fovty @JTaeuber, are you guys able to test it as well? Inside my test cluster I wasn't able to reproduce this; I'll try soon with another one.
We only have 1 workload; we are mostly using ScaledObjects, so we want to validate that it works fine before using it on other workloads.
Here is the KubeDownscaler configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-downscaler-py-kube-downscaler
  namespace: kube-downscaler
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/version: 24.10.1
    application: kube-downscaler-py-kube-downscaler
    argocd.argoproj.io/instance: kube-downscaler
    helm.sh/chart: py-kube-downscaler-0.2.10
spec:
  replicas: 1
  selector:
    matchLabels:
      application: kube-downscaler-py-kube-downscaler
  template:
    metadata:
      labels:
        application: kube-downscaler-py-kube-downscaler
    spec:
      containers:
        - name: py-kube-downscaler
          image: ghcr.io/caas-team/py-kube-downscaler:24.10.1
          args:
            - '--interval=60'
            - '--include-resources=deployments,statefulsets,scaledobjects'
Sorry for asking so many questions: could you also provide the versions you are using for both Kubernetes and Keda?
@JTaeuber @Fovty I saw PyKube was a little behind on its dependencies, so I opened caas-team/new-pykube#27 to bump them there as well.
If I don't manage to replicate the issue tomorrow either, I'll try to build a custom kube-downscaler image with the new pykube version and give it to @cecchcc to test. It may be some weird network issue where the pykube and kube-downscaler dependencies need to be perfectly aligned.
The new release seems to work fine with older Kubernetes versions as well (just tested on 1.27, self-managed). I will try to understand if it is EKS-related.
@cecchcc another test you could do on your side is trying to replicate the behavior inside another test cluster, if you have one at your disposal.
We are using Keda 2.15.1 and Kubernetes 1.30
@cecchcc can you join the Slack linked in the docs? I'll give you some instructions to test a new image later this afternoon.
Issue
Hello,
We deployed py-kube-downscaler with Helm on our cluster and wanted to use it with Keda ScaledObjects. We annotated the ScaledObject with downscaler/downtime-replicas and downscaler/uptime, and we also tried to use the annotation downscaler/exclude: "true" on the deployment, as described in the docs. But it does not have any effect: the number of pods is not downscaling.
When deploying py-kube-downscaler, we launched it with --include-resources=deployments,statefulsets,scaledobjects.
Are we missing something?