We created a Flux configuration in our Kubernetes cluster. We keep getting an issue with the kustomize-controller pod getting stuck in a CrashLoopBackOff state. The logs aren't pointing to any particular root cause for the crash.
Can you post here the logs from the previous container instance, kubectl logs --previous
Here they are. Thoughts?
$ kubectl logs kustomize-controller-7cfb84c5fd-bg94s --previous -n flux-system
{"level":"info","ts":"2022-09-15T14:57:33.854Z","logger":"controller-runtime.metrics","msg":"Metrics server
is starting to listen","addr":":8080"}
{"level":"info","ts":"2022-09-15T14:57:33.856Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2022-09-15T14:57:33.857Z","msg":"Starting server","path":"/metrics","kind":"metrics",
"addr":"[::]:8080"}
{"level":"info","ts":"2022-09-15T14:57:33.858Z","msg":"Starting server","kind":"health probe","addr":"[::]:
9440"}
I0915 14:57:33.959090 6 leaderelection.go:248] attempting to acquire leader lease flux-system/kustomi
ze-controller-leader-election...
I0915 14:58:16.309400 6 leaderelection.go:258] successfully acquired lease flux-system/kustomize-cont
roller-leader-election
{"level":"info","ts":"2022-09-15T14:58:16.309Z","logger":"controller.kustomization","msg":"Starting EventSo
urce","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind sou
rce: *v1beta2.Kustomization"}
{"level":"info","ts":"2022-09-15T14:58:16.309Z","logger":"controller.kustomization","msg":"Starting EventSo
urce","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind sou
rce: *v1beta2.OCIRepository"}
{"level":"info","ts":"2022-09-15T14:58:16.309Z","logger":"controller.kustomization","msg":"Starting EventSo
urce","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind sou
rce: *v1beta2.GitRepository"}
{"level":"info","ts":"2022-09-15T14:58:16.309Z","logger":"controller.kustomization","msg":"Starting EventSo
urce","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","source":"kind sou
rce: *v1beta2.Bucket"}
{"level":"info","ts":"2022-09-15T14:58:16.309Z","logger":"controller.kustomization","msg":"Starting Control
ler","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization"}
{"level":"info","ts":"2022-09-15T14:58:16.410Z","logger":"controller.kustomization","msg":"Starting workers
","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","worker count":4}
{"level":"info","ts":"2022-09-15T14:58:16.411Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config10-data-api","namespace":"main-config10"}
{"level":"info","ts":"2022-09-15T14:58:16.411Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config4-frontend","namespace":"main-config4"}
{"level":"info","ts":"2022-09-15T14:58:16.445Z","logger":"controller.kustomization","msg":"Dependencies do
not meet ready condition, retrying in 30s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kin
d":"Kustomization","name":"main-config5-frontend","namespace":"main-config5"}
{"level":"info","ts":"2022-09-15T14:58:17.241Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config4-mongodb","namespace":"main-config4","output":{"Deployment/default/mongodb":"unchanged","Service/d
efault/database":"unchanged"}}
{"level":"info","ts":"2022-09-15T14:58:17.270Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config10-data-api","namespace":"main-config10","output":{"Deployment/default/data-api":"configured","Serv
ice/default/data-api":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:17.398Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config-mongodb","namespace":"main-config","output":{"Deployment/default/mongodb":"configured","Service/de
fault/database":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:17.475Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config4-frontend","namespace":"main-config4","output":{"Deployment/default/frontend":"unchanged","Ingress
{"level":"info","ts":"2022-09-15T14:58:17.491Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.079400252s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config4-mongodb","namespace":"main-config4","revision":"external-ip-issue/
f4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:17.493Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config4-data-api","namespace":"main-config4"}
{"level":"info","ts":"2022-09-15T14:58:17.528Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.117169472s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config10-data-api","namespace":"main-config10","revision":"external-ip-iss
ue/f4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:17.654Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.192463409s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config-mongodb","namespace":"main-config","revision":"external-ip-issue/f4
af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:17.657Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config-data-api","namespace":"main-config"}
{"level":"info","ts":"2022-09-15T14:58:17.685Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.273614025s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config4-frontend","namespace":"main-config4","revision":"external-ip-issue
/f4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:18.352Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config4-data-api","namespace":"main-config4","output":{"Deployment/default/data-api":"configured","Servic
e/default/data-api":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:18.433Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"smi
lr-mongodb","namespace":"smilr","output":{"Deployment/default/mongodb":"configured","Service/default/databa
se":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:18.560Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config-data-api","namespace":"main-config","output":{"Deployment/default/data-api":"configured","Service/
default/data-api":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:18.567Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.073585572s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config4-data-api","namespace":"main-config4","revision":"external-ip-issue
/f4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:18.568Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config10-frontend","namespace":"main-config10"}
{"level":"info","ts":"2022-09-15T14:58:18.648Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.107941545s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"smilr-mongodb","namespace":"smilr","revision":"external-ip/00e82d39c28b829dbd0c
0533b683aa8d691db736"}
{"level":"info","ts":"2022-09-15T14:58:18.678Z","logger":"controller.kustomization","msg":"Dependencies do
not meet ready condition, retrying in 30s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kin
d":"Kustomization","name":"main-config5-data-api","namespace":"main-config5"}
{"level":"info","ts":"2022-09-15T14:58:18.680Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"smilr-data-api","namespace":"smilr"}
{"level":"info","ts":"2022-09-15T14:58:18.725Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.067581889s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config-data-api","namespace":"main-config","revision":"external-ip-issue/f
4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:18.730Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config-frontend","namespace":"main-config"}
{"level":"info","ts":"2022-09-15T14:58:19.509Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"smi
lr-data-api","namespace":"smilr","output":{"Deployment/default/data-api":"configured","Service/default/data
-api":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:19.682Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config10-frontend","namespace":"main-config10","output":{"Deployment/default/frontend":"configured","Ingr
ess/default/smilr":"configured","Service/default/frontend":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:19.721Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.041445431s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"smilr-data-api","namespace":"smilr","revision":"external-ip/00e82d39c28b829dbd0
c0533b683aa8d691db736"}
{"level":"info","ts":"2022-09-15T14:58:19.923Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.355282249s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config10-frontend","namespace":"main-config10","revision":"external-ip-iss
ue/f4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:19.925Z","logger":"controller.kustomization","msg":"All dependencies
are ready, proceeding with reconciliation","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"smilr-frontend","namespace":"smilr"}
{"level":"info","ts":"2022-09-15T14:58:19.967Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config-frontend","namespace":"main-config","output":{"Deployment/default/frontend":"configured","Ingress/
default/smilr":"configured","Service/default/frontend":"configured"}}
{"level":"info","ts":"2022-09-15T14:58:20.047Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.316768619s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config-frontend","namespace":"main-config","revision":"external-ip-issue/f
4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:20.542Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"mai
n-config10-mongodb","namespace":"main-config10","output":{"Deployment/default/mongodb":"configured","Servic
e/default/database":"configured"}}
I0915 14:58:21.046324 6 request.go:601] Waited for 1.090126399s due to client-side throttling, not pr
iority and fairness, request: GET:https://10.96.0.1:443/apis/source.toolkit.fluxcd.io/v1beta2?timeout=32s
{"level":"info","ts":"2022-09-15T14:58:21.103Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 1.374601414s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"main-config10-mongodb","namespace":"main-config10","revision":"external-ip-issu
e/f4af09260f726795fdcc5be3ddef9e3bc3882567"}
{"level":"info","ts":"2022-09-15T14:58:21.203Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"smi
lr-frontend","namespace":"smilr","output":{"Namespace/default":"unchanged"}}
{"level":"info","ts":"2022-09-15T14:58:21.861Z","logger":"controller.kustomization","msg":"server-side appl
y completed","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"smi
lr-frontend","namespace":"smilr","output":{"ClusterRole/ingress-nginx":"unchanged","ClusterRole/ingress-ngi
nx-admission":"unchanged","ClusterRoleBinding/ingress-nginx":"unchanged","ClusterRoleBinding/ingress-nginx-
admission":"unchanged","ConfigMap/default/ingress-nginx-controller":"unchanged","Deployment/default/fronten
d":"configured","Deployment/default/ingress-nginx-controller":"unchanged","Ingress/default/smilr":"configur
ed","IngressClass/nginx":"unchanged","Job/default/ingress-nginx-admission-create":"unchanged","Job/default/
ingress-nginx-admission-patch":"unchanged","Role/default/ingress-nginx":"unchanged","Role/default/ingress-n
ginx-admission":"unchanged","RoleBinding/default/ingress-nginx":"unchanged","RoleBinding/default/ingress-ng
inx-admission":"unchanged","Service/default/frontend":"configured","Service/default/ingress-nginx-controlle
r":"unchanged","Service/default/ingress-nginx-controller-admission":"unchanged","ServiceAccount/default/ing
ress-nginx":"unchanged","ServiceAccount/default/ingress-nginx-admission":"unchanged","ValidatingWebhookConf
iguration/ingress-nginx-admission":"unchanged"}}
{"level":"info","ts":"2022-09-15T14:58:22.043Z","logger":"controller.kustomization","msg":"Reconciliation f
inished in 2.118346948s, next run in 10m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler ki
nd":"Kustomization","name":"smilr-frontend","namespace":"smilr","revision":"external-ip/00e82d39c28b829dbd0
c0533b683aa8d691db736"}
I see no panic or other errors in the logs you posted; something else must be going on. Can you please post here the output of:
kubectl -n flux-system get deployment kustomize-controller -oyaml
kubectl -n flux-system describe deployment kustomize-controller
Here you go! Thanks for your quick response.
$ kubectl -n flux-system get deployment kustomize-controller -oyaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: flux
    meta.helm.sh/release-namespace: flux-system
  creationTimestamp: "2022-08-23T20:30:23Z"
  generation: 2
  labels:
    app.kubernetes.io/component: kustomize-controller
    app.kubernetes.io/instance: flux
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: microsoft.flux
    app.kubernetes.io/part-of: flux
    app.kubernetes.io/version: v0.33.0
    clusterconfig.azure.com/extension-version: 1.6.0
    control-plane: controller
  name: kustomize-controller
  namespace: flux-system
  resourceVersion: "5607736"
  uid: 50b552ef-7c39-4fa2-ab17-a2630fcbd567
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kustomize-controller
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: kustomize-controller
        app.kubernetes.io/name: microsoft.flux
    spec:
      containers:
      - args:
        - --events-addr=http://notification-controller.$(RUNTIME_NAMESPACE).svc.cluster.local./
        - --watch-all-namespaces=true
        - --log-level=info
        - --log-encoding=json
        - --enable-leader-election
        - --no-cross-namespace-refs=true
        - --no-remote-bases=true
        - --default-service-account=flux-applier
        env:
        - name: RUNTIME_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: mcr.microsoft.com/oss/fluxcd/kustomize-controller:v0.27.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: healthz
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: manager
        ports:
        - containerPort: 9440
          name: healthz
          protocol: TCP
        - containerPort: 8080
          name: http-prom
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: healthz
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 64Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: temp
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1337
      serviceAccount: kustomize-controller
      serviceAccountName: kustomize-controller
      terminationGracePeriodSeconds: 60
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - emptyDir: {}
        name: temp
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-08-23T20:34:26Z"
    lastUpdateTime: "2022-09-06T19:38:22Z"
    message: ReplicaSet "kustomize-controller-7cfb84c5fd" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-09-15T15:09:19Z"
    lastUpdateTime: "2022-09-15T15:09:19Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
$ kubectl -n flux-system describe deployment kustomize-controller
Name: kustomize-controller
Namespace: flux-system
CreationTimestamp: Tue, 23 Aug 2022 20:30:23 +0000
Labels: app.kubernetes.io/component=kustomize-controller
app.kubernetes.io/instance=flux
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=microsoft.flux
app.kubernetes.io/part-of=flux
app.kubernetes.io/version=v0.33.0
clusterconfig.azure.com/extension-version=1.6.0
control-plane=controller
Annotations: deployment.kubernetes.io/revision: 2
meta.helm.sh/release-name: flux
meta.helm.sh/release-namespace: flux-system
Selector: app=kustomize-controller
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=kustomize-controller
app.kubernetes.io/name=microsoft.flux
Annotations: prometheus.io/port: 8080
prometheus.io/scrape: true
Service Account: kustomize-controller
Containers:
manager:
Image: mcr.microsoft.com/oss/fluxcd/kustomize-controller:v0.27.1
Ports: 9440/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--events-addr=http://notification-controller.$(RUNTIME_NAMESPACE).svc.cluster.local./
--watch-all-namespaces=true
--log-level=info
--log-encoding=json
--enable-leader-election
--no-cross-namespace-refs=true
--no-remote-bases=true
--default-service-account=flux-applier
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
Liveness: http-get http://:healthz/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:healthz/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
RUNTIME_NAMESPACE: (v1:metadata.namespace)
Mounts:
/tmp from temp (rw)
Volumes:
temp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: kustomize-controller-7cfb84c5fd (1/1 replicas created)
Events: <none>
Hmm, really strange, there are no events for the deployment. Could it be that the Kubernetes node has issues? Please delete the pod; then, after the new one starts, as soon as it fails, post here the output of kubectl describe pod.
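That boils down to something like the following (a sketch only; the label selector comes from the Deployment spec above, and the pod name is a placeholder):
$ kubectl -n flux-system delete pod -l app=kustomize-controller
$ kubectl -n flux-system get pods -w
$ kubectl -n flux-system describe pod <new-kustomize-controller-pod>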
Hi, I'm working with Sofia on this case. What happens with the kustomize-controller pod is that it actually gets OOMKilled. I tried changing its memory limit (from 1 GiB to 2 GiB), and it still runs out of memory. I confirmed it is memory ballooning by attaching to the pod during the short window it runs: after about 20 seconds, the kustomize-controller process inside the container grows from roughly 700 MiB to more than 3.2 GiB, and then the container gets killed with exit code 137.
Here's a snippet of describe pod kustomize-controller:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Mon, 12 Sep 2022 14:23:32 +0000
Finished: Mon, 12 Sep 2022 14:24:12 +0000
Ready: False
Restart Count: 731
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 64Mi
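For reference, one way to watch the memory climb from outside the container (a sketch; it assumes metrics-server is installed in the cluster) is:
$ watch -n 2 kubectl -n flux-system top pod -l app=kustomize-controller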
@azure-claudiu To fix the memory leak we need to reproduce it first. Can you please create a repo with the YAML files that make the controller behave like this?
Just to clarify, we created these Flux configs using an Azure extension. The steps are basically the ones described here: https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/tutorial-use-gitops-flux2
Would you need just the YAML files, or also the Flux configs created by this extension?
This is our file structure. There are three YAML files in our base/mongodb folder:
base/
  mongodb/
    kustomization.yaml
    mongo-deployment.yaml
    mongo-service.yaml
kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
resources:
- mongo-deployment.yaml
- mongo-service.yaml
mongo-deployment.yaml:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: mongodb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb-container
        image: mongo:5.0
        imagePullPolicy: Always
        ports:
        - containerPort: 27017
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          value: admin
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-creds
              key: admin-password
mongo-service.yaml:
kind: Service
apiVersion: v1
metadata:
  name: database
spec:
  type: ClusterIP
  selector:
    app: mongodb
  ports:
  - protocol: TCP
    port: 27017
    targetPort: 27017
Lastly, we created a GitOps configuration with the following command:
az k8s-configuration flux create -g YOUR_RESOURCE_GROUP \
-c YOUR_ARC_ENABLED_CLUSTER \
-n cluster-configs \
--namespace cluster-configs \
-t connectedClusters \
--scope cluster \
-u https://your.git.repository/repo \
--https-user=YOUR_USERNAME \
--https-key=YOUR_PASSWORD \
--branch YOUR_BRANCH_NAME \
--kustomization name=mongodb path=/base/mongodb prune=true
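Once the configuration is created, the resulting Flux objects can be inspected with the Flux CLI (a sketch; the names and namespaces depend on the values passed above):
$ flux get sources git -A
$ flux get kustomizations -A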
With those files I can't reproduce the OOM. Can you please swap the image with the upstream one and see if it fails the same? If it does, then please do a heap dump and share it with me. Set the controller image, in its deployment, to: ghcr.io/fluxcd/kustomize-controller:v0.28.0.
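For example (a sketch; note that the microsoft.flux extension may reconcile the Deployment back to its original image, so this is only for a quick test):
$ kubectl -n flux-system set image deployment/kustomize-controller \
    manager=ghcr.io/fluxcd/kustomize-controller:v0.28.0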
I changed the image; it still leaks memory: https://fluxissues1.blob.core.windows.net/videos/top-102.mp4
Here's a heap dump taken roughly every second, until it dies: https://fluxissues1.blob.core.windows.net/heaps/heap-b.zip
Can you please create a zip of your repo and share it with me? I also need the GitRepository and all Flux Kustomization manifests from the cluster. If the repo contains sensitive information, you can reach out to me on CNCF Slack and share it privately.
I suspect that one of the repositories used here is very large, and that may cause the OOM, as we need to load the content in memory to verify the checksum. Can you please exec into source-controller and post the output of:
$ kubectl -n flux-system exec -it source-controller-7b66c9d497-r4d68 -- sh
~ $ du -sh /data/*
160.0K /data/helmchart
252.0K /data/helmrepository
88.0K /data/ocirepository
I managed to reproduce the OOM with a repo containing over 100 MB of dummy files. I guess we need to reject such an artifact in source-controller and error out, telling people to use .sourceignore to exclude files that are not meant for Flux.
We definitely have a directory in the source tree that contains large ML model files. Here's the output from our source-controller:
~ $ du -sh /data/*
681.6M /data/gitrepository
OK, then mystery solved: add a .sourceignore file to that repo and exclude all paths which don't contain Kubernetes YAMLs. Still, downloading GBs of unrelated data in-cluster is very problematic, so I suggest you create a dedicated repo for that app called <app-name>-deploy with only YAMLs, and make Flux look at that.
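A minimal sketch of such a file (it uses .gitignore syntax; the /base path comes from the repo layout above, adjust it to whatever actually holds the manifests):
# .sourceignore at the root of the repo
# exclude everything, then re-include the directory with the Kubernetes manifests
/*
!/base/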
A better option would be to push the manifests from that repo to ACR and let Flux sync the manifests from there; see https://fluxcd.io/flux/cheatsheets/oci-artifacts/
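Roughly (a sketch; the registry name, tag and path are placeholders, and it requires a Flux CLI version with OCI artifact support):
$ flux push artifact oci://myregistry.azurecr.io/smilr-manifests:latest \
    --path=./base \
    --source="$(git config --get remote.origin.url)" \
    --revision="$(git rev-parse HEAD)"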
Thanks for your help, Stefan. That resolved it for us. Closing the issue now!