prashant-prodigal closed this issue 1 year ago.
Hi @prashant-prodigal - the error you are seeing at controller/src/main.rs:110 is related to the controller's metrics server attempting to bind to the local loopback network and start serving metrics.
We are installing bottlerocket update operator in an EKS with no internet access.
Note that the Bottlerocket update operator requires network access to updates.bottlerocket.aws: this is how the update operator queries for new OS updates. Read more about it here: https://github.com/bottlerocket-os/bottlerocket-update-operator#why-are-my-bottlerocket-nodes-egressing-to-httpsupdatesbottlerocketaws
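One quick way to confirm that a node (or a debug pod scheduled on it) can actually reach updates.bottlerocket.aws is a plain TCP connection test on port 443. The sketch below is a hypothetical troubleshooting helper, not part of brupop; it assumes Python 3 is available wherever you run it, and it only proves that DNS resolution and a TCP handshake succeed, not that TLS or HTTP requests work.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it checks DNS resolution plus the TCP handshake only;
    it does not validate TLS certificates or HTTP responses.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False


# Example (needs network access):
#   can_reach("updates.bottlerocket.aws")
```

If this returns False from inside the cluster, the egress path (security groups, NAT, proxy, or a VPC endpoint setup) is a more likely culprit than the operator itself.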
Does your node have a network interface attached? In order for the Prometheus metrics server to come up, it will at least need to be able to bind on 0.0.0.0 for IPv4 clusters or [::] for IPv6 clusters.
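To check the bind requirement directly, a listening socket can be opened on the wildcard address: 0.0.0.0 covers every IPv4 interface and :: every IPv6 interface. This is a hypothetical sketch, not brupop code; port 0 asks the OS for any free port, and you would substitute the controller's real metrics port, which this thread does not state.

```python
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a listening TCP socket can be bound to host:port.

    Hypothetical helper: use "0.0.0.0" for IPv4 clusters and "::" for IPv6
    clusters, mirroring the addresses a metrics server binds on.
    """
    # Pick the address family that matches the literal address.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
            sock.listen(1)
            return True
    except OSError:
        # Address family unsupported, address not available, or port in use.
        return False
```

A False result for "0.0.0.0" usually points at the pod's network setup, while a False result on a specific port usually means something else is already listening there.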
Can you provide the full logs from the failed controller deployment?
kubectl logs -n brupop-bottlerocket-aws pod/brupop-controller-deployment-{YOUR-DEPLOYMENT}
Hello, I have allowed the URL https://updates.bottlerocket.aws, but we are still getting the error below from the command kubectl logs -n brupop-bottlerocket-aws pod/brupop-controller-deployment-{YOUR-DEPLOYMENT}:
2023-03-10T04:36:28.369259Z INFO actix_server::builder: starting 2 workers at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200
2023-03-10T04:36:28.369368Z ERROR controller: controller exited at controller/src/main.rs:110
What's the shape of your network? Are there any logs from the other update operator components?
I am seeing the same thing even though my node has egress access.
brupop-controller-deployment-875956b84-l42nf 0/1 CrashLoopBackOff 7 (2m32s ago) 13m
2023-04-11T20:56:31.570124Z INFO actix_server::builder: starting 1 workers
at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200
2023-04-11T20:56:31.570208Z ERROR controller: controller exited
at controller/src/main.rs:110
This is with a deployment like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: brupop-controller
    app.kubernetes.io/managed-by: brupop
    app.kubernetes.io/part-of: brupop
    brupop.bottlerocket.aws/component: brupop-controller
  name: brupop-controller-deployment
  namespace: brupop-bottlerocket-aws
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      brupop.bottlerocket.aws/component: brupop-controller
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        brupop.bottlerocket.aws/component: brupop-controller
      namespace: brupop-bottlerocket-aws
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      containers:
      - command:
        - ./controller
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: SCHEDULER_CRON_EXPRESSION
          value: '* * * * * * *'
        - name: MAX_CONCURRENT_UPDATE
          value: "1"
        image: public.ecr.aws/bottlerocket/bottlerocket-update-operator:v1.1.0
        imagePullPolicy: IfNotPresent
        name: brupop
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 3m
            memory: 8Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      priorityClassName: brupop-controller-high-priority
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: brupop-controller-service-account
      serviceAccountName: brupop-controller-service-account
      terminationGracePeriodSeconds: 30
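The SCHEDULER_CRON_EXPRESSION value in the deployment above, '* * * * * * *', has seven fields rather than the classic five. The field order used below (seconds first, year last) is an assumption based on the seven-field cron format brupop documents for its cron scheduler; check the README of the release you actually deploy. The helper is hypothetical and only checks the shape of the expression, not the syntax of each field.

```python
from typing import Dict

# Assumed field order for a seven-field SCHEDULER_CRON_EXPRESSION.
CRON_FIELDS = ("sec", "min", "hour", "day_of_month", "month", "day_of_week", "year")


def explain_cron(expr: str) -> Dict[str, str]:
    """Split a seven-field cron expression into named fields.

    Raises ValueError if the expression does not have exactly seven fields.
    Individual field values are returned as-is, without validation.
    """
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))
```

Under this reading, '* * * * * * *' matches every second of every day, so the expression itself is not restrictive; whether the controller honors it at all depends on the operator version.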
Good afternoon team,
Is there any further information regarding this issue? We're currently facing the same issue in an installation we did this morning using operator version v1.1.0.
$> kubectl logs deployment/brupop-controller-deployment --namespace brupop-bottlerocket-aws
2023-04-20T09:54:18.670695Z INFO actix_server::builder: starting 1 workers
at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200
2023-04-20T09:54:18.670766Z INFO actix_server::server: Actix runtime found; starting in Actix runtime
at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/server.rs:196
2023-04-20T09:54:18.968337Z ERROR controller: controller exited
at controller/src/main.rs:110
It is deployed in a regular EKS cluster with no customizations. Services are configured to use IPv4 addresses. The current Bottlerocket version is 1.12.0 and the only workloads currently installed besides the default ones are:
Please let us know if we can help by providing any other information.
For me this error happens when SCHEDULER_CRON_EXPRESSION is set and UPDATE_WINDOW_START & UPDATE_WINDOW_STOP are removed.
If all three are present, the controller runs fine, though I am pretty sure my SCHEDULER_CRON_EXPRESSION is ignored.
In my case I am rolling back to using UPDATE_WINDOW_START & UPDATE_WINDOW_STOP to control the update window.
Thanks for the tip @tmahalligan, we're going to give it a try!!
Thanks @tmahalligan. This has solved the problem, and I am able to run the controller now. @jpmcb, this might be a bug you would like to address? Also, @jpmcb, are UPDATE_WINDOW_START & UPDATE_WINDOW_STOP ignored when SCHEDULER_CRON_EXPRESSION is set?
Updated title to reflect what I think is the root issue here. Please correct me if I'm wrong.
Verified this is expected behavior when both a time window and a cron expression are provided:
This could be handled a little more gracefully though...
Edit: Actually... that is the opposite of what is noted above:
For me this error happens when SCHEDULER_CRON_EXPRESSION is set and UPDATE_WINDOW_START & UPDATE_WINDOW_STOP are removed
If all three are present then the controller runs fine, though I am pretty sure my SCHEDULER_CRON_EXPRESSION is ignored
More investigation needed then.
@tmahalligan Hi. What version of the Bottlerocket update operator container were you using? I think it might be because you were using the latest version of the YAML file but still running the old Bottlerocket update operator. The cron scheduler is a new feature which we will introduce in the next release, so the controller errors could be because the system still needs a time window but a cron expression scheduler was provided instead. Can you try using this YAML file? Thanks!
I am using v1.1.0; here is the relevant config, @gthao313:
containers:
@tmahalligan yeah, v1.1.0 doesn't have SCHEDULER_CRON_EXPRESSION, and we plan to release v1.2.0 later, which will introduce the cron scheduler. For now, can you remove SCHEDULER_CRON_EXPRESSION from the config? Everything should then work. This is the v1.1.0 config. : )
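To catch this version mismatch before deploying, the thread's findings can be encoded as a small pre-flight check over the controller container's image tag and environment. This is a hypothetical helper, not part of brupop: the rules are only what this thread reports (no cron support before v1.2.0, and v1.1.0 controllers crashing when UPDATE_WINDOW_START/UPDATE_WINDOW_STOP are removed), and the env dict stands in for the container's env entries.

```python
from typing import Dict, List, Tuple

# Rules taken from this thread only; they are not an official compatibility matrix.
CRON_MIN_VERSION = (1, 2, 0)
WINDOW_VARS = ("UPDATE_WINDOW_START", "UPDATE_WINDOW_STOP")


def parse_version(image: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from an image reference ending in ':vX.Y.Z'."""
    tag = image.rsplit(":", 1)[-1].lstrip("v")
    return tuple(int(part) for part in tag.split("."))


def check_controller_env(image: str, env: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    if parse_version(image) < CRON_MIN_VERSION:
        if "SCHEDULER_CRON_EXPRESSION" in env:
            problems.append("SCHEDULER_CRON_EXPRESSION is not supported before v1.2.0; remove it")
        missing = [name for name in WINDOW_VARS if name not in env]
        if missing:
            problems.append("missing time-window variables: " + ", ".join(missing))
    return problems
```

Run against the deployment posted earlier in this thread (image tag v1.1.0, SCHEDULER_CRON_EXPRESSION set, no window variables), this flags both problems, which matches the crash reported here.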
I was under the impression from the documentation https://github.com/bottlerocket-os/bottlerocket-update-operator#set-scheduler that the released version of the operator supported the cron functionality. I assume others may have made the same mistake; perhaps the docs should be amended. @gthao313
Thanks for the follow-up. I will adjust my config and wait for the next release.
Also note that we attach the relevant configs to the release for each version: https://github.com/bottlerocket-os/bottlerocket-update-operator/releases/tag/v1.1.0
We are installing the Bottlerocket update operator in an EKS cluster with no internet access, but the operator deployment keeps failing with this error:
2023-03-09T13:12:28.369373Z INFO actix_server::builder: starting 2 workers at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200
2023-03-09T13:12:28.369436Z ERROR controller: controller exited at controller/src/main.rs:110
I am using the latest image as per the docs. Could you please point me to what could be wrong here?