mcanaves closed this issue 4 years ago
Hey @mcanaves, is this a new behaviour? Have you tried a previous version, and did it work as expected?
I'm seeing the same issue when running locally (version 0.6.6.0):
{
"Timestamp":"2020-05-14T11:08:04.8075933+00:00",
"Level":"Information",
"MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}",
"Properties":{
"type":"Added",
"name":"kamus-secret",
"namespace":"kamus-test",
"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"
}
}
{
"Timestamp":"2020-05-14T11:08:05.1402567+00:00",
"Level":"Error",
"MessageTemplate":"Error while handling KamusSecret event of type {eventType}, for KamusSecret {name} on namespace {namespace}",
"Exception":"Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'Conflict'\n at k8s.Kubernetes.CreateNamespacedSecretWithHttpMessagesAsync(V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, Dictionary`2 customHeaders, CancellationToken cancellationToken)\n at k8s.KubernetesExtensions.CreateNamespacedSecretAsync(IKubernetes operations, V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, CancellationToken cancellationToken)\n at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleAdd(KamusSecret kamusSecret, Boolean isUpdate) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 167\n at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleEvent(WatchEventType event, KamusSecret kamusSecret) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 84",
"Properties":{
"eventType":"Added",
"name":"kamus-secret",
"namespace":"kamus-test",
"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"
}
}
{
"Timestamp":"2020-05-14T11:09:59.3440933+00:00",
"Level":"Information",
"MessageTemplate":"Watching KamusSecret events completed, terminating process",
"Properties":{
"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"
}
}
Application is shutting down...
Versions used:
Kamus (API images): 0.6.6.0
Kamus CLI: 0.3.0
Chart version: 0.4.7
KMS provider: AESKey
Kubernetes flavour and version: 1.15.11 (Minikube)
Expected behavior: the Kamus controller is able to process KamusSecrets without restarting.
I am just evaluating this so I've not tried other versions.
I see it returns Conflict - I guess this means there is an existing secret. Can you please ensure there is no secret with the same name?
This is in a clean namespace, there are no other secrets in it.
Hi @dmbeck-e2x, what's the interval of those restarts? Does it create your secret?
Here is a full log from the test env which was recreated completely, Kamus was undeployed then re-deployed using helm:
{"Timestamp":"2020-05-14T12:15:55.7961882+00:00","Level":"Information","MessageTemplate":"Starting watch for KamusSecret V1Alpha2 events","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
Hosting environment: Production
Content root path: /home/dotnet/app
Now listening on: https://0.0.0.0:8888
Now listening on: http://0.0.0.0:9999
Application started. Press Ctrl+C to shut down.
{"Timestamp":"2020-05-14T12:16:15.5979150+00:00","Level":"Information","MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}","Properties":{"type":"Added","name":"kamus-secret","namespace":"kamus-test","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-14T12:16:16.0035922+00:00","Level":"Information","MessageTemplate":"Received conversion request","Properties":{"SourceContext":"CustomResourceDescriptorController.Controllers.ConversionWebhookController","ActionId":"02985100-b048-4838-93a4-06168c5ad82a","ActionName":"CustomResourceDescriptorController.Controllers.ConversionWebhookController.Convert (crd-controller)"}}
{"Timestamp":"2020-05-14T12:16:16.0091311+00:00","Level":"Information","MessageTemplate":"Starting to convert from {apiVersion} to {desirediVersion}","Properties":{"apiVersion":"soluto.com/v1alpha2","desirediVersion":"soluto.com/v1alpha1","SourceContext":"CustomResourceDescriptorController.Controllers.ConversionWebhookController","ActionId":"02985100-b048-4838-93a4-06168c5ad82a","ActionName":"CustomResourceDescriptorController.Controllers.ConversionWebhookController.Convert (crd-controller)"}}
{"Timestamp":"2020-05-14T12:16:16.1300403+00:00","Level":"Information","MessageTemplate":"Created a secret from KamusSecret {name} in namespace {namespace} successfully.","Properties":{"name":"kamus-secret","namespace":"kamus-test","log_type":"audit","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-14T12:17:53.7819391+00:00","Level":"Information","MessageTemplate":"Watching KamusSecret events completed, terminating process","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
Application is shutting down...
The secret is created as per the log, but then the controller just shuts down and enters a CrashLoopBackOff:
NAME READY STATUS RESTARTS AGE
kamus-controller-586d6c7d76-xk5qc 0/1 CrashLoopBackOff 1 4m3s
kamus-decryptor-6d674cc566-mv2b8 1/1 Running 0 4m3s
kamus-encryptor-67956fd76b-lt7br 1/1 Running 0 4m3s
Can you share the kubectl describe pod output of the controller?
This is for the same cluster as above:
Name: kamus-controller-586d6c7d76-xk5qc
Namespace: kamus
Priority: 0
Node: minikube/192.168.39.118
Start Time: Thu, 14 May 2020 20:15:50 +0800
Labels: app=kamus
component=controller
pod-template-hash=586d6c7d76
release=kamus
Annotations: checksum/config: 675952119fb5c3a92f62c296d69db5eee6b09645ea41b546f5c101d50f7d0eb6
checksum/secret: ee4c8b9698d6c782497c64eaa785505899f2451f097b3d55783030d5f6825524
checksum/tls-secret: e968e9c87afe7362dc31460ce2a2389c95e961a9a3dcd8c75f3510858a975c0c
Status: Running
IP: 172.17.0.5
Controlled By: ReplicaSet/kamus-controller-586d6c7d76
Containers:
controller:
Container ID: docker://670d7ffc198b18beec08ed4d0d1df14c36b395b2966c75fa0e94de1ee6cdeb89
Image: soluto/kamus:controller-0.6.6.0
Image ID: docker-pullable://soluto/kamus@sha256:66e4ac20cc96e4ebe913509d87d325a14805dfa8d571ac4307aaa8a8b80aec1a
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 14 May 2020 20:24:31 +0800
Finished: Thu, 14 May 2020 20:25:49 +0800
Ready: False
Restart Count: 4
Limits:
cpu: 500m
memory: 600Mi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:9999/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:9999/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
kamus-controller ConfigMap Optional: false
Environment: <none>
Mounts:
/home/dotnet/app/secrets from secret-volume (rw)
/home/dotnet/app/tls from tls-secret-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kamus-controller-token-6nqb9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
secret-volume:
Type: Secret (a volume populated by a Secret)
SecretName: kamus
Optional: false
tls-secret-volume:
Type: Secret (a volume populated by a Secret)
SecretName: kamus-controller
Optional: false
kamus-controller-token-6nqb9:
Type: Secret (a volume populated by a Secret)
SecretName: kamus-controller-token-6nqb9
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned kamus/kamus-controller-586d6c7d76-xk5qc to minikube
Warning Unhealthy 8m13s kubelet, minikube Readiness probe failed: Get http://172.17.0.5:9999/healthz: dial tcp 172.17.0.5:9999: connect: connection refused
Normal Pulled 96s (x5 over 10m) kubelet, minikube Container image "soluto/kamus:controller-0.6.6.0" already present on machine
Normal Created 96s (x5 over 10m) kubelet, minikube Created container controller
Normal Started 96s (x5 over 10m) kubelet, minikube Started container controller
Warning Unhealthy 92s kubelet, minikube Readiness probe failed: Get http://172.17.0.5:9999/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning BackOff 8s (x11 over 6m24s) kubelet, minikube Back-off restarting failed container
I'll try to reproduce that on my side and report back.
Maybe I can help as well, I am in the exact same situation. I upgraded the controller to 0.6.6.0 and saw that it kept restarting. Checked the healthcheck with the netshoot image:
curl http://pod-ip:9999/healthz
Healthy
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 17 May 2020 11:00:25 +0300
Finished: Sun, 17 May 2020 11:01:51 +0300
Went over and made sure all the KamusSecret resources are created properly, then deleted all the K8s Secrets resulting from them. The controller recreated them at the next restart (still in the crash loop). So it seems that the controller works properly, but it still calls k8s.KubernetesExtensions.CreateNamespacedSecretAsync no matter the situation. I looked through the code a bit and I think that happens here: https://github.com/Soluto/kamus/blob/master/src/crd-controller/HostedServices/V1Alpha2Controller.cs#L64
I will disable the alarms and leave the controller on crash loop for now :)
Hi, @MihaiAnei thanks for the info. There are 2 issues here as you pointed out.
The first one, that we log a conflict each time the controller comes up, is just a cosmetic issue and nothing that hurts functionality.
The second one is probably a regression introduced by the Kubernetes client we've updated: it stopped honoring the long timeout we provide to the HttpClient we use. It shuts off the stream too early, the observable receives a completed event as you pointed out, and the container terminates.
I'll try to work on a solution for both issues soon.
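For the first (cosmetic) issue, one common way a controller avoids surfacing a 409 on restart is a create-or-replace pattern. This is not Kamus's actual code - just a Python sketch of the idea, with a fake in-memory API standing in for the Kubernetes client (ConflictError, FakeSecretsApi, and create_or_replace are all hypothetical names):

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the API error; the real .NET client raises
# HttpOperationException with status 'Conflict' in this situation.
class ConflictError(Exception):
    pass

@dataclass
class FakeSecretsApi:
    store: dict = field(default_factory=dict)

    def create(self, namespace, name, data):
        key = (namespace, name)
        if key in self.store:
            raise ConflictError(f"secret {name} already exists in {namespace}")
        self.store[key] = data

    def replace(self, namespace, name, data):
        self.store[(namespace, name)] = data

def create_or_replace(api, namespace, name, data):
    """Create the secret; on a Conflict, fall back to replacing it so a
    controller restart does not log an error for already-created secrets."""
    try:
        api.create(namespace, name, data)
        return "created"
    except ConflictError:
        api.replace(namespace, name, data)
        return "replaced"
```

With this pattern, re-handling an Added event for an existing secret becomes an update instead of an error.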
Please try chart 0.4.8 with Kamus version 0.6.7.0, which is supposed to fix that issue.
I can confirm the fix works, 25 minutes and no restarts using the same setup as before.
Great. You can expect a restart once an hour, since this is the timeout we provide to the watcher.
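For reference, the hourly restart could in principle be avoided by re-opening the watch when the server closes the stream, instead of treating a completed watch as a reason to exit. A minimal Python sketch of that loop, where open_watch, handle_event, and should_stop are hypothetical stand-ins for the real client calls:

```python
def watch_forever(open_watch, handle_event, should_stop):
    """Keep re-establishing the watch when the server ends the stream.

    open_watch() returns a finite iterable of events (the API server
    closes the stream when its timeout expires); instead of terminating
    the process, we simply open a new watch and continue.
    """
    while not should_stop():
        try:
            for event in open_watch():
                handle_event(event)
        except IOError:
            # transient stream errors (e.g. "response ended prematurely"):
            # re-open the watch as well rather than crashing
            continue
```

The key point is that both a cleanly completed stream and a transient IO error lead back to the top of the loop, so the process never exits just because one watch ended.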
Hi guys, I still have a lot of restarts when using the latest version. Over 41 restarts for the controller.
{"Timestamp":"2020-05-25T11:50:16.4553497+00:00","Level":"Error","MessageTemplate":"Unexpected error occured while watching KamusSecret events","Exception":"System.IO.IOException: The response ended prematurely.\n at System.Net.Http.HttpConnection.FillAsync()\n at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.ReadAsyncCore(Memory`1 buffer, CancellationToken cancellationToken)\n at System.IO.StreamReader.ReadBufferAsync(CancellationToken cancellationToken)\n at System.IO.StreamReader.ReadLineAsyncInternal()\n at k8s.Watcher`1.WatcherLoop(CancellationToken cancellationToken)","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
25/05/2020 14:50:16 Application is shutting down...
As I said in my previous comment, it is expected to see restarts once an hour in the current configuration.
We might consider changing it so it never restarts.
Do you see any operational issues except those restarts?
I've seen this operational issue cause problems - unsure if they are related.
These are logs from fluxcd flux and fluxcd helm-operator
{
"caller": "loop.go:108",
"component": "sync-loop",
"err": "collating resources in cluster for sync: conversion webhook for soluto.com/v1alpha2, Kind=KamusSecret failed: Post https://kamus-controller.kamus.svc:443/api/v1/conversion-webhook?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
"ts": "2020-11-04T00:04:18.249326875Z"
}
I've seen kamus-controller cause problems when a new HelmRelease is pushed out.
As far as I can see this is the failure mode (using kamus-init-containers)
This other failure mode (using KamusSecrets)
This is a mixture of three problems (Flux, Kamus, Helm), so I don't fault any one of them. But the best way to use all three while avoiding this problem escapes me.
I have exactly the same problem, where the Kamus controller is restarted each time helm-operator checks whether a new release needs to be made. I think this is a problem with the Helm chart itself: all the values that are generated at chart render time need to be settable to a fixed value, so that automated release systems don't redeploy Kamus every time they check whether a new release is needed.
Here is the output of helm diff plugin between two revisions:
❯ helm diff revision kamus 2501 2502
kamus-system, kamus-controller, Deployment (apps) has changed:
# Source: kamus/templates/deployment-controller.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kamus-controller
labels:
app: kamus
component: controller
chart: kamus-0.4.16
release: kamus
heritage: Helm
spec:
strategy:
rollingUpdate:
maxUnavailable: 0
replicas: 1
selector:
matchLabels:
app: kamus
release: kamus
component: controller
template:
metadata:
annotations:
checksum/config: 83d94df4d352c9b59a773f3f684ebc34eea446cfb64e2f971ce760c36223955a
checksum/secret: 073f1a7b0b17886b39516fee234f03c02d4bf9aec8b2dab802c204fab13686f9
- checksum/tls-secret: d341aebe1536e5e443101b952b433539e732dbe5bca51fa45aaa09e975ea3c31
+ checksum/tls-secret: a00981a4652b98ab2e8aa7f31e21d862bb387573275f067c3e36342882fc929d
labels:
app: kamus
release: kamus
component: controller
spec:
serviceAccountName: kamus-controller
automountServiceAccountToken: true
containers:
- name: controller
image: soluto/kamus:controller-0.8.0.0
imagePullPolicy: IfNotPresent
volumeMounts:
- name: secret-volume
mountPath: /home/dotnet/app/secrets
- name: tls-secret-volume
mountPath: /home/dotnet/app/tls
livenessProbe:
httpGet:
path: /healthz
port: 9999
readinessProbe:
httpGet:
path: /healthz
port: 9999
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi
envFrom:
- configMapRef:
name: kamus-controller
volumes:
- name: secret-volume
secret:
secretName: kamus
- name: tls-secret-volume
secret:
secretName: kamus-controller
kamus-system, kamus-controller, Secret (v1) has changed:
# Source: kamus/templates/kamussecret-crd.yaml
apiVersion: v1
kind: Secret
metadata:
name: kamus-controller
data:
- certificate.crt: '-------- # (1184 bytes)'
- privateKey.key: '-------- # (1679 bytes)'
+ certificate.crt: '++++++++ # (1180 bytes)'
+ privateKey.key: '++++++++ # (1679 bytes)'
type: Opaque
kamus-system, kamussecrets.soluto.com, CustomResourceDefinition (apiextensions.k8s.io) has changed:
# Source: kamus/templates/kamussecret-crd.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: kamussecrets.soluto.com
spec:
preserveUnknownFields: false
group: soluto.com
versions:
- name: v1alpha1
served: true
storage: false
schema:
openAPIV3Schema:
type: object
properties:
data:
type: object
additionalProperties: true
serviceAccount:
type: string
type:
type: string
- name: v1alpha2
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
data:
type: object
additionalProperties: true
stringData:
type: object
additionalProperties: true
serviceAccount:
type: string
type:
type: string
scope: Namespaced
names:
plural: kamussecrets
singular: kamussecret
kind: KamusSecret
shortNames:
- ks
conversion:
strategy: Webhook
webhookClientConfig:
service:
namespace: kamus-system
name: kamus-controller
path: /api/v1/conversion-webhook
- caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lRRkhyVTRqMXRGNklEUGN0U2JKNzlFREFOQmdrcWhraUc5dzBCQVFzRkFEQVQKTVJFd0R3WURWUVFERXdocllXMTFjeTFqWVRBZUZ3MHlNREV4TVRFeE5UUXpNelJhRncwek1ERXhNRGt4TlRRegpNelJhTUJNeEVUQVBCZ05WQkFNVENHdGhiWFZ6TFdOaE1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0NBUThBCk1JSUJDZ0tDQVFFQXUzRkpVL0RjeHM0U2t1bERZbUp5NDd2ejFENVBqTFFkWWJBUWw1MjFjNWgxNUYvaWlNbWMKQjJERXlicjlIUk1mMklaVjJBSDlaSVhXdEc2L2U1d1pTcWd3WHJOaHFMZHhNTWRWQ0J3bWNSUWFZVUZDVUszLwpva0o0YmNyQ3hGb2x4R0NpZTMvZFBLbFY5VFNQSTlvai9Yc2E5WjhrN25IeEMwWE5FZ2FpdVJndkc4V2wyL0hICnFWR0hSdnN5cG1sL3pOT1RGN1V4cm1HTUh3dGxFMnN6SmU3ZzBteDUwd3J5eHFmVVZEOWtXRExPS3NtbFBKZTcKM3RGMEs3WXF3UWtTSEZERnFpWGkvRmpHRUM3UWNMU25BRUhRUEE5Uk4rajdjbGdaU0R5YmVqb0pYUWZaa1BQVAphbDZzN09BMjdEYW1FaS9OcFlsS1hrcjNhTkQzd2ZVTUV3SURBUUFCbzBJd1FEQU9CZ05WSFE4QkFmOEVCQU1DCkFxUXdIUVlEVlIwbEJCWXdGQVlJS3dZQkJRVUhBd0VHQ0NzR0FRVUZCd01DTUE4R0ExVWRFd0VCL3dRRk1BTUIKQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFGM0hBQWl2UDNNQ20vUmRyUHhkWkwycDFJUkRBTFJQZ0pITwpacmtSZ1ZITjJ1V09Tc2d0elhRdkY1UlhiUUVXR29HMldyWldXOWxkcUxzN2srMnpPT2pqVkRXd3MwUEVPcWNICmtJb1RVRTFmaWFjaGYwOVFFdHF0VTV1dFUvdzhYTVRGY0g2L0hBeFRuRHA4ZVRPV0dwMkxXd1o3UVluMzBMRnYKM2xOYmlQc3I1dlhZK2g4RTMwUHhvSnRBZnZEUDZjSTdLaCtoRUxUekcxZ2JETm8yTnNGSmlKOHRIUUdSVmJHQgpYUG1PbDNyYXdvc1VNaHZ0MlFCOE5sUXIyclN5MjBMRUkvSW04UHU5Rmt1OGZDRUpSdDUwSlEzM1lZZW1KVGhnCm81SExGQndMTWNHZnVneEFmM2llMDl2aDh5aTVUVU9MS0tTWTdFWmhzZlJnSTMzMWVyTT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
+ caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lRTkx1azdOcWxqNTZ6TVVjdnIwVk1WakFOQmdrcWhraUc5dzBCQVFzRkFEQVQKTVJFd0R3WURWUVFERXdocllXMTFjeTFqWVRBZUZ3MHlNREV4TVRFeE5UUTJNelJhRncwek1ERXhNRGt4TlRRMgpNelJhTUJNeEVUQVBCZ05WQkFNVENHdGhiWFZ6TFdOaE1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0NBUThBCk1JSUJDZ0tDQVFFQXAxM2xLZWN5YitNYjBEc1ZxcVNrNzlrQ1JWd3BXT0NmQ05yTjRvWUp2anUyZnZjMkpTdzkKcDRQZmVDalNCWlVUbDFDQ2MwazNvQjBtYSs3SHRlWXQ0eEVtcWNPaW8wWjU0YmxnM2hTRjFJMnFRSFQyZGsyNwpZVE9INlpKUGx6YXhQM29lWDNXdS82ZU5aTXdPb1FqM3U2c2VwbWZmajdHMzhCTjJ0SnBXdXlySFYydDhsSFdYCnVhVlMzd3RGSERucllHdVNNa1BFaGw2MktpVmpxcllzTGRzN1k5bThRczdIODU5NXpodUFqY2E4ZDBxdENsL1QKbXROdDFmMndIUjYzNDM2b0lXT09iVjBTRWc1by9XN0NUbE5sREZxK2hTUHIybkRPZVo3KzBmdTNudnRSQzNjQwp0RHQyQ2VCUko0U0xKYi82alAvWmNRbkNBMlFhVkQ5UWJ3SURBUUFCbzBJd1FEQU9CZ05WSFE4QkFmOEVCQU1DCkFxUXdIUVlEVlIwbEJCWXdGQVlJS3dZQkJRVUhBd0VHQ0NzR0FRVUZCd01DTUE4R0ExVWRFd0VCL3dRRk1BTUIKQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFHV1QxOWUvUm1kOGhJMTJMVVFpMEJOMEluQlY1WE5BNUxHMQpQM3RVWHBCQmMzb3Z4c2laVG9PZXpIWjF3WktQakZENVhWL3VnTW5wMXFReDNrWm5uOG9Ybm5MYU1lTTdkNDBoCkZZRFMvQWZIWXk4QTQ2RWxSeThuNUZHY3VMTmo0VU8zdTRsdk9WckpLRFkxT3hGdGREREJjMUZ3UWlEWCsyeSsKUjg1UzFIQWVEOXpFYVg5akgyL3JPREM3clBjVnBoZnVOaUNONTJRY1lNd3daeGYydlBjcnI4cWVFdUJjRDJqSQpsNjNaVEtwNFhZTytkL01pK3FKeTBZYVpMRjRZV1lZRWdZdG1raVRoc2ROQy9SeEtoM2sxZ0ZTU3ZLZGJvcVgvClBnMENQbTh6UjZ5MkRiYjdnTVIwQ24vUDNIT1dEVVRiNWRXTDIwdUZVMG41UzZ6VXRGYz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
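The rollout trigger in the diff above can be seen outside Helm too: the checksum/tls-secret annotation is just a hash of the rendered secret, so a regenerated certificate changes the hash, which changes the pod template, which forces a rollout even though nothing functional changed. A small Python sketch of that mechanism (checksum_annotation and pod_template_changed are illustrative names, not part of the chart):

```python
import hashlib

def checksum_annotation(rendered_secret: str) -> str:
    """Mimic the chart's checksum/* annotations: a sha256 hex digest over
    the rendered manifest, placed on the pod template so that content
    changes roll the Deployment."""
    return hashlib.sha256(rendered_secret.encode()).hexdigest()

def pod_template_changed(old_secret: str, new_secret: str) -> bool:
    # If the digest differs, the pod template annotation differs, and
    # Kubernetes restarts the pods - even when only an auto-generated
    # certificate was re-rendered.
    return checksum_annotation(old_secret) != checksum_annotation(new_secret)
```

This is why pinning the chart's auto-generated TLS material to fixed values (as suggested above) would keep the checksum stable across renders.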
I think a new issue with the last two comments above needs to be created here: https://github.com/Soluto/helm-charts
@wimo7083 Can you please open a new issue for the flux operator related issue so we can be more focused?
@lebenitza Definitely, the auto-generated fields are a topic we can discuss in the helm-charts repo, feel free to open an issue over there.
Describe the bug: After updating to Kamus version 0.6.6.0 (chart version 0.4.7), the controller terminates the process after the KamusSecret watch completes, so the pod keeps restarting; at some point the probes fail and the pod enters a CrashLoopBackOff state.
Versions used:
Kamus (API images): 0.6.6.0
Kamus CLI: 0.3.0
Chart version: 0.4.7
KMS provider: AwsKms
Kubernetes flavour and version: 1.14.9-aws.8
Expected behavior: the Kamus controller does not restart and stays in a healthy state.