Soluto / kamus

An open source, git-ops, zero-trust secret encryption and decryption solution for Kubernetes applications
https://kamus.soluto.io
Apache License 2.0

Kamus controller always restarting #517

Closed mcanaves closed 4 years ago

mcanaves commented 4 years ago

Describe the bug: After updating to Kamus version 0.6.6.0 (chart version 0.4.7), the controller terminates the process shortly after it starts watching for KamusSecret events, so the pod keeps restarting. At some point the probes fail and the pod enters the CrashLoopBackOff state.

{
   "Timestamp":"2020-05-11T10:22:44.7789456+00:00",
   "Level":"Information",
   "MessageTemplate":"Starting watch for KamusSecret V1Alpha2 events",
   "Properties":{
      "SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"
   }
}
Hosting environment: Production
Content root path: /home/dotnet/app
Now listening on: https://0.0.0.0:8888
Now listening on: http://0.0.0.0:9999
Application started. Press Ctrl+C to shut down.
{
   "Timestamp":"2020-05-11T10:24:33.8513234+00:00",
   "Level":"Information",
   "MessageTemplate":"Watching KamusSecret events completed, terminating process",
   "Properties":{
      "SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"
   }
}
Application is shutting down...

Versions used:
  - Kamus (API images): 0.6.6.0
  - Kamus CLI: 0.3.0
  - Chart version: 0.4.7
  - KMS provider: AwsKms
  - Kubernetes flavour and version: 1.14.9-aws.8

Expected behavior: The Kamus controller does not restart and stays in a healthy state.

omerlh commented 4 years ago

Hey @mcanaves, is this new behaviour? Have you tried a previous version, and did it work as expected?

ghost commented 4 years ago

I'm seeing the same issue when running locally (version 0.6.6.0):

{
   "Timestamp":"2020-05-14T11:08:04.8075933+00:00",
   "Level":"Information",
   "MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}",
   "Properties":{
      "type":"Added",
      "name":"kamus-secret",
      "namespace":"kamus-test",
      "SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"

   }
}
{
   "Timestamp":"2020-05-14T11:08:05.1402567+00:00",
   "Level":"Error",
   "MessageTemplate":"Error while handling KamusSecret event of type {eventType}, for KamusSecret {name} on namespace {namespace}",
   "Exception":"Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'Conflict'\n   at k8s.Kubernetes.CreateNamespacedSecretWithHttpMessagesAsync(V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, Dictionary`2 customHeaders, CancellationToken cancellationToken)\n   at k8s.KubernetesExtensions.CreateNamespacedSecretAsync(IKubernetes operations, V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, CancellationToken cancellationToken)\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleAdd(KamusSecret kamusSecret, Boolean isUpdate) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 167\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleEvent(WatchEventType event, KamusSecret kamusSecret) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 84",
   "Properties":{
      "eventType":"Added",
      "name":"kamus-secret",
      "namespace":"kamus-test",
      "SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"

   }
}
{
   "Timestamp":"2020-05-14T11:09:59.3440933+00:00",
   "Level":"Information",
   "MessageTemplate":"Watching KamusSecret events completed, terminating process",
   "Properties":{
      "SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"  
   }
}
Application is shutting down...

Versions used:
  - Kamus (API images): 0.6.6.0
  - Kamus CLI: 0.3.0
  - Chart version: 0.4.7
  - KMS provider: AESKey
  - Kubernetes flavour and version: 1.15.11 (Minikube)

Expected behavior: Kamus controller to be able to process KamusSecrets without restarting

I am just evaluating this so I've not tried other versions.

omerlh commented 4 years ago

I see it returns a conflict. I guess this means there is an existing secret. Can you please make sure there is no secret with the same name?

ghost commented 4 years ago

This is in a clean namespace, there are no other secrets in it.

shaikatz commented 4 years ago

Hi @dmbeck-e2x, what's the interval of those restarts? Does it create your secret?

ghost commented 4 years ago

Here is a full log from the test env, which was recreated completely: Kamus was undeployed and then re-deployed using helm.

{"Timestamp":"2020-05-14T12:15:55.7961882+00:00","Level":"Information","MessageTemplate":"Starting watch for KamusSecret V1Alpha2 events","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
Hosting environment: Production
Content root path: /home/dotnet/app
Now listening on: https://0.0.0.0:8888
Now listening on: http://0.0.0.0:9999
Application started. Press Ctrl+C to shut down.
{"Timestamp":"2020-05-14T12:16:15.5979150+00:00","Level":"Information","MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}","Properties":{"type":"Added","name":"kamus-secret","namespace":"kamus-test","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-14T12:16:16.0035922+00:00","Level":"Information","MessageTemplate":"Received conversion request","Properties":{"SourceContext":"CustomResourceDescriptorController.Controllers.ConversionWebhookController","ActionId":"02985100-b048-4838-93a4-06168c5ad82a","ActionName":"CustomResourceDescriptorController.Controllers.ConversionWebhookController.Convert (crd-controller)"}}
{"Timestamp":"2020-05-14T12:16:16.0091311+00:00","Level":"Information","MessageTemplate":"Starting to convert from {apiVersion} to {desirediVersion}","Properties":{"apiVersion":"soluto.com/v1alpha2","desirediVersion":"soluto.com/v1alpha1","SourceContext":"CustomResourceDescriptorController.Controllers.ConversionWebhookController","ActionId":"02985100-b048-4838-93a4-06168c5ad82a","ActionName":"CustomResourceDescriptorController.Controllers.ConversionWebhookController.Convert (crd-controller)"}}
{"Timestamp":"2020-05-14T12:16:16.1300403+00:00","Level":"Information","MessageTemplate":"Created a secret from KamusSecret {name} in namespace {namespace} successfully.","Properties":{"name":"kamus-secret","namespace":"kamus-test","log_type":"audit","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-14T12:17:53.7819391+00:00","Level":"Information","MessageTemplate":"Watching KamusSecret events completed, terminating process","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
Application is shutting down...

The secret is created as per the log, but then the controller just shuts down and enters a CrashLoopBackOff:

NAME                                READY   STATUS             RESTARTS   AGE
kamus-controller-586d6c7d76-xk5qc   0/1     CrashLoopBackOff   1          4m3s
kamus-decryptor-6d674cc566-mv2b8    1/1     Running            0          4m3s
kamus-encryptor-67956fd76b-lt7br    1/1     Running            0          4m3s
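
To put a number on the restart interval asked about above, the two boundary entries of the log can be compared directly. This is a minimal sketch, not part of Kamus: the two lines are copied from the log above, and `parse_timestamp` and `watch_duration_seconds` are hypothetical helper names. The timestamps carry seven fractional digits, so they are truncated to microseconds before parsing.

```python
import json
import re
from datetime import datetime

# Two lines copied from the log above (Serilog JSON, one object per line).
LOG_LINES = [
    '{"Timestamp":"2020-05-14T12:15:55.7961882+00:00","Level":"Information","MessageTemplate":"Starting watch for KamusSecret V1Alpha2 events","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}',
    '{"Timestamp":"2020-05-14T12:17:53.7819391+00:00","Level":"Information","MessageTemplate":"Watching KamusSecret events completed, terminating process","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}',
]


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp with 7 fractional digits by truncating to microseconds."""
    truncated = re.sub(r"(\.\d{6})\d+", r"\1", value)
    return datetime.fromisoformat(truncated)


def watch_duration_seconds(lines):
    """Seconds between the 'Starting watch' and 'terminating process' entries."""
    start = end = None
    for line in lines:
        entry = json.loads(line)
        message = entry["MessageTemplate"]
        if message.startswith("Starting watch"):
            start = parse_timestamp(entry["Timestamp"])
        elif message.endswith("terminating process"):
            end = parse_timestamp(entry["Timestamp"])
    if start is None or end is None:
        raise ValueError("log does not contain both a start and a termination entry")
    return (end - start).total_seconds()


duration = watch_duration_seconds(LOG_LINES)
```

For this log the watch lasted just under two minutes (roughly 118 seconds) before the process terminated itself.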

shaikatz commented 4 years ago

Can you share the output of kubectl describe pod for the controller?

ghost commented 4 years ago

This is for the same cluster as above:

Name:           kamus-controller-586d6c7d76-xk5qc
Namespace:      kamus
Priority:       0
Node:           minikube/192.168.39.118
Start Time:     Thu, 14 May 2020 20:15:50 +0800
Labels:         app=kamus
                component=controller
                pod-template-hash=586d6c7d76
                release=kamus
Annotations:    checksum/config: 675952119fb5c3a92f62c296d69db5eee6b09645ea41b546f5c101d50f7d0eb6
                checksum/secret: ee4c8b9698d6c782497c64eaa785505899f2451f097b3d55783030d5f6825524
                checksum/tls-secret: e968e9c87afe7362dc31460ce2a2389c95e961a9a3dcd8c75f3510858a975c0c
Status:         Running
IP:             172.17.0.5
Controlled By:  ReplicaSet/kamus-controller-586d6c7d76
Containers:
  controller:
    Container ID:   docker://670d7ffc198b18beec08ed4d0d1df14c36b395b2966c75fa0e94de1ee6cdeb89
    Image:          soluto/kamus:controller-0.6.6.0
    Image ID:       docker-pullable://soluto/kamus@sha256:66e4ac20cc96e4ebe913509d87d325a14805dfa8d571ac4307aaa8a8b80aec1a
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 14 May 2020 20:24:31 +0800
      Finished:     Thu, 14 May 2020 20:25:49 +0800
    Ready:          False
    Restart Count:  4
    Limits:
      cpu:     500m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   128Mi
    Liveness:   http-get http://:9999/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9999/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      kamus-controller  ConfigMap  Optional: false
    Environment:        <none>
    Mounts:
      /home/dotnet/app/secrets from secret-volume (rw)
      /home/dotnet/app/tls from tls-secret-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kamus-controller-token-6nqb9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  secret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kamus
    Optional:    false
  tls-secret-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kamus-controller
    Optional:    false
  kamus-controller-token-6nqb9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kamus-controller-token-6nqb9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  10m                  default-scheduler  Successfully assigned kamus/kamus-controller-586d6c7d76-xk5qc to minikube
  Warning  Unhealthy  8m13s                kubelet, minikube  Readiness probe failed: Get http://172.17.0.5:9999/healthz: dial tcp 172.17.0.5:9999: connect: connection refused
  Normal   Pulled     96s (x5 over 10m)    kubelet, minikube  Container image "soluto/kamus:controller-0.6.6.0" already present on machine
  Normal   Created    96s (x5 over 10m)    kubelet, minikube  Created container controller
  Normal   Started    96s (x5 over 10m)    kubelet, minikube  Started container controller
  Warning  Unhealthy  92s                  kubelet, minikube  Readiness probe failed: Get http://172.17.0.5:9999/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    8s (x11 over 6m24s)  kubelet, minikube  Back-off restarting failed container

shaikatz commented 4 years ago

I'll try to reproduce that on my side and report back.

lebenitza commented 4 years ago

Maybe I can help as well; I am in the exact same situation. I upgraded the controller to 0.6.6.0 and saw that it kept restarting. I checked the health check with the netshoot image:

curl http://pod-ip:9999/healthz
Healthy

Last State:     Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Sun, 17 May 2020 11:00:25 +0300
  Finished:     Sun, 17 May 2020 11:01:51 +0300

I went over the KamusSecret resources, made sure they were all created properly, and deleted all the Kubernetes Secrets generated from them. The controller recreated them on the next restart (still in the crash loop). So it seems that the controller works properly but:

I looked through the code a bit and I think that happens here: https://github.com/Soluto/kamus/blob/master/src/crd-controller/HostedServices/V1Alpha2Controller.cs#L64

I will disable the alarms and leave the controller on crash loop for now :)

{"Timestamp":"2020-05-17T07:58:19.7092896+00:00","Level":"Information","MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}","Properties":{"type":"Added","name":"parse-backend-fastorder-production","namespace":"fastorder","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T07:58:20.4086584+00:00","Level":"Information","MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}","Properties":{"type":"Added","name":"parse-backend-staging","namespace":"fastorder-staging","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T07:58:20.4140394+00:00","Level":"Information","MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}","Properties":{"type":"Added","name":"parse-backend-cbc","namespace":"cbc-bistro","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T07:58:20.5083298+00:00","Level":"Information","MessageTemplate":"Handling event of type {type}. KamusSecret {name} in namespace {namespace}","Properties":{"type":"Added","name":"parse-backend-takeat-production","namespace":"takeat","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T07:58:22.7126303+00:00","Level":"Error","MessageTemplate":"Unhandled exception while processing request: System.Threading.Tasks.TaskCanceledException: A task was canceled.\n   at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)\n   at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)\n   at Microsoft.AspNetCore.Builder.Extensions.MapWhenMiddleware.Invoke(HttpContext context)\n   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)\n   at CustomResourceDescriptorController.LoggingMiddleware.Invoke(HttpContext httpContext) in /app/crd-controller/LoggingMiddleware.cs:line 26","Properties":{"SourceContext":"CustomResourceDescriptorController.LoggingMiddleware"}}
{"Timestamp":"2020-05-17T07:58:24.1073766+00:00","Level":"Error","MessageTemplate":"Error while handling KamusSecret event of type {eventType}, for KamusSecret {name} on namespace {namespace}","Exception":"Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'Conflict'\n   at k8s.Kubernetes.CreateNamespacedSecretWithHttpMessagesAsync(V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, Dictionary`2 customHeaders, CancellationToken cancellationToken)\n   at k8s.KubernetesExtensions.CreateNamespacedSecretAsync(IKubernetes operations, V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, CancellationToken cancellationToken)\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleAdd(KamusSecret kamusSecret, Boolean isUpdate) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 167\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleEvent(WatchEventType event, KamusSecret kamusSecret) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 84","Properties":{"eventType":"Added","name":"parse-backend-staging","namespace":"fastorder-staging","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T07:58:25.6071372+00:00","Level":"Error","MessageTemplate":"Error while handling KamusSecret event of type {eventType}, for KamusSecret {name} on namespace {namespace}","Exception":"Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'Conflict'\n   at k8s.Kubernetes.CreateNamespacedSecretWithHttpMessagesAsync(V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, Dictionary`2 customHeaders, CancellationToken cancellationToken)\n   at k8s.KubernetesExtensions.CreateNamespacedSecretAsync(IKubernetes operations, V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, CancellationToken cancellationToken)\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleAdd(KamusSecret kamusSecret, Boolean isUpdate) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 167\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleEvent(WatchEventType event, KamusSecret kamusSecret) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 84","Properties":{"eventType":"Added","name":"parse-backend-cbc","namespace":"cbc-bistro","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T07:58:25.8050794+00:00","Level":"Error","MessageTemplate":"Unhandled exception while processing request: System.Threading.Tasks.TaskCanceledException: A task was canceled.\n   at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)\n   at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)\n   at Microsoft.AspNetCore.Builder.Extensions.MapWhenMiddleware.Invoke(HttpContext context)\n   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)\n   at CustomResourceDescriptorController.LoggingMiddleware.Invoke(HttpContext httpContext) in /app/crd-controller/LoggingMiddleware.cs:line 26","Properties":{"SourceContext":"CustomResourceDescriptorController.LoggingMiddleware"}}
{"Timestamp":"2020-05-17T07:58:25.7122824+00:00","Level":"Error","MessageTemplate":"Error while handling KamusSecret event of type {eventType}, for KamusSecret {name} on namespace {namespace}","Exception":"Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'Conflict'\n   at k8s.Kubernetes.CreateNamespacedSecretWithHttpMessagesAsync(V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, Dictionary`2 customHeaders, CancellationToken cancellationToken)\n   at k8s.KubernetesExtensions.CreateNamespacedSecretAsync(IKubernetes operations, V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, CancellationToken cancellationToken)\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleAdd(KamusSecret kamusSecret, Boolean isUpdate) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 167\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleEvent(WatchEventType event, KamusSecret kamusSecret) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 84","Properties":{"eventType":"Added","name":"parse-backend-takeat-production","namespace":"takeat","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T07:58:25.8042767+00:00","Level":"Error","MessageTemplate":"Error while handling KamusSecret event of type {eventType}, for KamusSecret {name} on namespace {namespace}","Exception":"Microsoft.Rest.HttpOperationException: Operation returned an invalid status code 'Conflict'\n   at k8s.Kubernetes.CreateNamespacedSecretWithHttpMessagesAsync(V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, Dictionary`2 customHeaders, CancellationToken cancellationToken)\n   at k8s.KubernetesExtensions.CreateNamespacedSecretAsync(IKubernetes operations, V1Secret body, String namespaceParameter, String dryRun, String fieldManager, String pretty, CancellationToken cancellationToken)\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleAdd(KamusSecret kamusSecret, Boolean isUpdate) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 167\n   at CustomResourceDescriptorController.HostedServices.V1Alpha2Controller.HandleEvent(WatchEventType event, KamusSecret kamusSecret) in /app/crd-controller/HostedServices/V1Alpha2Controller.cs:line 84","Properties":{"eventType":"Added","name":"parse-backend-fastorder-production","namespace":"fastorder","SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
{"Timestamp":"2020-05-17T08:00:06.3025326+00:00","Level":"Information","MessageTemplate":"Watching KamusSecret events completed, terminating process","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
Application is shutting down...
stream closed

shaikatz commented 4 years ago

Hi @MihaiAnei, thanks for the info. There are two issues here, as you pointed out.

The first one, that we log a conflict each time we come up, is just a cosmetic issue and does not hurt functionality.
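
A minimal sketch of how the startup conflict could be treated as expected rather than logged as an error, assuming a create-then-replace strategy. This is Python for illustration only, not the controller's C# code: `ConflictError`, `InMemorySecretStore` and `create_or_replace` are hypothetical names and are not part of the Kamus or Kubernetes client API.

```python
from typing import Dict


class ConflictError(Exception):
    """Hypothetical stand-in for an HTTP 409 Conflict returned by the API server."""


class InMemorySecretStore:
    """Hypothetical stand-in for the Secret API of a single namespace."""

    def __init__(self) -> None:
        self.secrets: Dict[str, Dict[str, str]] = {}

    def create(self, name: str, data: Dict[str, str]) -> None:
        if name in self.secrets:
            raise ConflictError(name)
        self.secrets[name] = dict(data)

    def replace(self, name: str, data: Dict[str, str]) -> None:
        self.secrets[name] = dict(data)


def create_or_replace(store: InMemorySecretStore, name: str, data: Dict[str, str]) -> str:
    """Create the secret; if it already exists, replace it instead of failing."""
    try:
        store.create(name, data)
        return "created"
    except ConflictError:
        # After a restart, an "Added" event arrives for a KamusSecret whose
        # Secret already exists. Treat that as an update, not as an error.
        store.replace(name, data)
        return "replaced"


store = InMemorySecretStore()
first = create_or_replace(store, "kamus-secret", {"key": "v1"})
second = create_or_replace(store, "kamus-secret", {"key": "v2"})
```

Here the second call stands in for the event replayed after a restart: it takes the replace path and no error is raised.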

The second one is probably a regression introduced by the Kubernetes client we updated: it stopped honoring the long timeout we provide to the HttpClient we use. It closes the stream too early, so the observable receives a completed event, as you pointed out, and terminates the container.

I'll try to work on a solution for both issues soon.
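
To illustrate the second issue, here is a minimal Python sketch (not the controller's actual C# code) of a watch loop that treats a completed stream as a cue to reconnect rather than to exit the process. `run_watch_loop`, `make_fake_watch`, `open_watch`, `handle_event` and `max_reconnects` are hypothetical names; a real controller would open the watch through its Kubernetes client.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (event type, resource name), e.g. ("Added", "kamus-secret")


def run_watch_loop(
    open_watch: Callable[[], Iterable[Event]],
    handle_event: Callable[[Event], None],
    max_reconnects: int,
) -> int:
    """Consume watch streams, reopening the watch each time one completes.

    Returns the number of streams that were consumed. `max_reconnects`
    exists only so this sketch terminates; a real controller would loop
    until it is asked to shut down.
    """
    streams_consumed = 0
    while streams_consumed <= max_reconnects:
        for event in open_watch():
            handle_event(event)
        # The server closed the stream (for example on a timeout).
        # Instead of terminating the process, open a new watch.
        streams_consumed += 1
    return streams_consumed


def make_fake_watch(batches: List[List[Event]]) -> Callable[[], Iterator[Event]]:
    """Return a hypothetical stand-in for a watch call: each call yields one batch."""
    remaining = list(batches)

    def open_watch() -> Iterator[Event]:
        batch = remaining.pop(0) if remaining else []
        return iter(batch)

    return open_watch


seen: List[Event] = []
fake_watch = make_fake_watch([[("Added", "kamus-secret")], [("Modified", "kamus-secret")]])
streams = run_watch_loop(fake_watch, seen.append, max_reconnects=1)
```

In a real client the new watch would typically resume from the last seen resourceVersion, so events are neither missed nor replayed from the beginning. The sketch leaves that out.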

shaikatz commented 4 years ago

Please try chart 0.4.8 with Kamus version 0.6.7.0, which is supposed to fix this issue.

ghost commented 4 years ago

I can confirm the fix works: 25 minutes and no restarts using the same setup as before.

shaikatz commented 4 years ago

Great. You can expect a restart once an hour, since this is the timeout we provide to the watcher.

gameiro83 commented 4 years ago

Hi guys, I still have a lot of restarts when using the latest version. Over 41 restarts for the controller.

{"Timestamp":"2020-05-25T11:50:16.4553497+00:00","Level":"Error","MessageTemplate":"Unexpected error occured while watching KamusSecret events","Exception":"System.IO.IOException: The response ended prematurely.\n   at System.Net.Http.HttpConnection.FillAsync()\n   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.ReadAsyncCore(Memory`1 buffer, CancellationToken cancellationToken)\n   at System.IO.StreamReader.ReadBufferAsync(CancellationToken cancellationToken)\n   at System.IO.StreamReader.ReadLineAsyncInternal()\n   at k8s.Watcher`1.WatcherLoop(CancellationToken cancellationToken)","Properties":{"SourceContext":"CustomResourceDescriptorController.HostedServices.V1Alpha2Controller"}}
25/05/2020 14:50:16 Application is shutting down...

shaikatz commented 4 years ago

As I said in my previous comment, it is expected to see restarts once an hour in the current configuration.

We might consider changing it so that it never restarts.

Do you see any operational issues except those restarts?

apex-omontgomery commented 4 years ago

I've seen this operational issue cause problems, though I'm unsure whether they are related.

These are logs from fluxcd flux and fluxcd helm-operator:

{
  "caller": "loop.go:108",
  "component": "sync-loop",
  "err": "collating resources in cluster for sync: conversion webhook for soluto.com/v1alpha2, Kind=KamusSecret failed: Post https://kamus-controller.kamus.svc:443/api/v1/conversion-webhook?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
  "ts": "2020-11-04T00:04:18.249326875Z"
}

I've seen kamus-controller cause problems when a new HelmRelease is pushed out.

As far as I can see, this is the failure mode (using kamus-init-containers):

  1. New/modified HR: flux notices and updates the HR.
  2. Helm-operator notices and tries to update using the kamus-init-container.
  3. The endpoint fails, which gives the log above; this causes the CM, or whatever you are creating with kamus-init-container, to fail.
  4. Since Kamus couldn't provide the secret, the HelmRelease fails.

The other failure mode (using KamusSecrets):

  1. New/modified HR: flux notices and updates the HR and the KamusSecret.
  2. kamus-controller restarts, which causes a 1-3 minute delay in performing the conversion.
  3. Something in the HR depends on the corresponding output "secret" object; this delay causes an ordered dependency update failure.
  4. The dependent resource isn't smart or aware enough to retry, and the helm hooks aren't configured properly to handle this.

This is a mixture of three problems (flux, kamus, helm), so I don't fault any one of them. But the best way to use all three while avoiding this problem is beyond my knowledge.
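
One way to soften the second failure mode is for the dependent step to poll for the generated Secret with a bounded retry instead of failing on the first miss. This is a minimal sketch, assuming a hypothetical `secret_exists` check supplied by the caller; `wait_for_secret`, the attempt count and the delays are illustrative, not recommendations.

```python
import time
from typing import Callable


def wait_for_secret(
    secret_exists: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the secret exists, doubling the delay after each miss.

    Returns True as soon as `secret_exists()` is true, or False once
    `attempts` checks have all failed.
    """
    delay = initial_delay
    for attempt in range(attempts):
        if secret_exists():
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False


# Simulate a controller that needs three checks before the Secret shows up.
checks = {"count": 0}
delays = []


def fake_secret_exists() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


# Record the requested delays instead of actually sleeping.
found = wait_for_secret(fake_secret_exists, sleep=delays.append)
```

To cover the 1-3 minute delay described above, the total wait would have to be sized well beyond these defaults.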

lebenitza commented 4 years ago

I have exactly the same problem: the kamus controller is restarted each time helm-operator checks whether a new release needs to be made. I think this is a problem with the helm chart itself. All the values that are generated when the chart is rendered should be settable to fixed values, so that automatic release systems don't release kamus every time they check whether a new release is needed.
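
To illustrate why the Deployment changes on every render: the pod template carries a checksum of the TLS secret, so if the certificate is regenerated each time the chart is rendered, the annotation changes and the pod is rolled. This is a minimal sketch, assuming the checksum is a SHA-256 digest of the rendered secret (the 64-character hex values in the diff are consistent with that, but it is an assumption). `render_tls_secret` and `checksum_annotation` are hypothetical helpers, not part of the chart.

```python
import hashlib
import secrets
from typing import Optional


def render_tls_secret(pinned_key: Optional[str] = None) -> str:
    """Hypothetical chart rendering step: use a pinned key if one is given,
    otherwise generate a fresh random value on every call."""
    key = pinned_key if pinned_key is not None else secrets.token_hex(32)
    return f"privateKey.key: {key}\n"


def checksum_annotation(rendered: str) -> str:
    """SHA-256 hex digest of the rendered secret, as a pod-template annotation value."""
    return hashlib.sha256(rendered.encode("utf-8")).hexdigest()


# Generated at render time: the checksum differs between two renders,
# so the Deployment's pod template changes and the pod is restarted.
generated_a = checksum_annotation(render_tls_secret())
generated_b = checksum_annotation(render_tls_secret())

# Pinned by the user: the checksum is stable across renders.
pinned_a = checksum_annotation(render_tls_secret(pinned_key="example-key"))
pinned_b = checksum_annotation(render_tls_secret(pinned_key="example-key"))
```

With a pinned value, two renders yield the same annotation, so a release tool comparing revisions would see no change in the pod template.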

Here is the output of the helm diff plugin between two revisions:

❯ helm diff revision kamus 2501 2502
kamus-system, kamus-controller, Deployment (apps) has changed:
  # Source: kamus/templates/deployment-controller.yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: kamus-controller
    labels:
      app: kamus
      component: controller
      chart: kamus-0.4.16
      release: kamus
      heritage: Helm
  spec:
    strategy:
      rollingUpdate:
        maxUnavailable: 0
    replicas: 1
    selector:
        matchLabels:
          app: kamus
          release: kamus
          component: controller
    template:
      metadata:
        annotations:
          checksum/config: 83d94df4d352c9b59a773f3f684ebc34eea446cfb64e2f971ce760c36223955a
          checksum/secret: 073f1a7b0b17886b39516fee234f03c02d4bf9aec8b2dab802c204fab13686f9
-         checksum/tls-secret: d341aebe1536e5e443101b952b433539e732dbe5bca51fa45aaa09e975ea3c31
+         checksum/tls-secret: a00981a4652b98ab2e8aa7f31e21d862bb387573275f067c3e36342882fc929d
        labels:
          app: kamus
          release: kamus
          component: controller
      spec:
        serviceAccountName: kamus-controller
        automountServiceAccountToken: true
        containers:
          - name: controller
            image: soluto/kamus:controller-0.8.0.0
            imagePullPolicy: IfNotPresent
            volumeMounts:
              - name: secret-volume
                mountPath: /home/dotnet/app/secrets
              - name: tls-secret-volume
                mountPath: /home/dotnet/app/tls
            livenessProbe:
              httpGet:
                path: /healthz
                port: 9999
            readinessProbe:
              httpGet:
                path: /healthz
                port: 9999
            resources:
              limits:
                cpu: 100m
                memory: 128Mi
              requests:
                cpu: 100m
                memory: 128Mi
            envFrom:
             - configMapRef:
                name: kamus-controller
        volumes:
          - name: secret-volume
            secret:
              secretName: kamus
          - name: tls-secret-volume
            secret:
              secretName: kamus-controller
kamus-system, kamus-controller, Secret (v1) has changed:
  # Source: kamus/templates/kamussecret-crd.yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: kamus-controller
  data:
-   certificate.crt: '-------- # (1184 bytes)'
-   privateKey.key: '-------- # (1679 bytes)'
+   certificate.crt: '++++++++ # (1180 bytes)'
+   privateKey.key: '++++++++ # (1679 bytes)'
  type: Opaque

kamus-system, kamussecrets.soluto.com, CustomResourceDefinition (apiextensions.k8s.io) has changed:
  # Source: kamus/templates/kamussecret-crd.yaml
  apiVersion: apiextensions.k8s.io/v1beta1
  kind: CustomResourceDefinition
  metadata:
    name: kamussecrets.soluto.com
  spec:
    preserveUnknownFields: false
    group: soluto.com
    versions:
    - name: v1alpha1
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object
          properties:
            data:
              type: object
              additionalProperties: true
            serviceAccount: 
              type: string
            type:
              type: string
    - name: v1alpha2
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            data:
              type: object
              additionalProperties: true
            stringData:
              type: object
              additionalProperties: true
            serviceAccount: 
              type: string
            type:
              type: string
    scope: Namespaced
    names:
      plural: kamussecrets
      singular: kamussecret
      kind: KamusSecret
      shortNames:
       - ks
    conversion:
      strategy: Webhook
      webhookClientConfig:
        service:
          namespace: kamus-system
          name: kamus-controller
          path: /api/v1/conversion-webhook
-       caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lRRkhyVTRqMXRGNklEUGN0U2JKNzlFREFOQmdrcWhraUc5dzBCQVFzRkFEQVQKTVJFd0R3WURWUVFERXdocllXMTFjeTFqWVRBZUZ3MHlNREV4TVRFeE5UUXpNelJhRncwek1ERXhNRGt4TlRRegpNelJhTUJNeEVUQVBCZ05WQkFNVENHdGhiWFZ6TFdOaE1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0NBUThBCk1JSUJDZ0tDQVFFQXUzRkpVL0RjeHM0U2t1bERZbUp5NDd2ejFENVBqTFFkWWJBUWw1MjFjNWgxNUYvaWlNbWMKQjJERXlicjlIUk1mMklaVjJBSDlaSVhXdEc2L2U1d1pTcWd3WHJOaHFMZHhNTWRWQ0J3bWNSUWFZVUZDVUszLwpva0o0YmNyQ3hGb2x4R0NpZTMvZFBLbFY5VFNQSTlvai9Yc2E5WjhrN25IeEMwWE5FZ2FpdVJndkc4V2wyL0hICnFWR0hSdnN5cG1sL3pOT1RGN1V4cm1HTUh3dGxFMnN6SmU3ZzBteDUwd3J5eHFmVVZEOWtXRExPS3NtbFBKZTcKM3RGMEs3WXF3UWtTSEZERnFpWGkvRmpHRUM3UWNMU25BRUhRUEE5Uk4rajdjbGdaU0R5YmVqb0pYUWZaa1BQVAphbDZzN09BMjdEYW1FaS9OcFlsS1hrcjNhTkQzd2ZVTUV3SURBUUFCbzBJd1FEQU9CZ05WSFE4QkFmOEVCQU1DCkFxUXdIUVlEVlIwbEJCWXdGQVlJS3dZQkJRVUhBd0VHQ0NzR0FRVUZCd01DTUE4R0ExVWRFd0VCL3dRRk1BTUIKQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFGM0hBQWl2UDNNQ20vUmRyUHhkWkwycDFJUkRBTFJQZ0pITwpacmtSZ1ZITjJ1V09Tc2d0elhRdkY1UlhiUUVXR29HMldyWldXOWxkcUxzN2srMnpPT2pqVkRXd3MwUEVPcWNICmtJb1RVRTFmaWFjaGYwOVFFdHF0VTV1dFUvdzhYTVRGY0g2L0hBeFRuRHA4ZVRPV0dwMkxXd1o3UVluMzBMRnYKM2xOYmlQc3I1dlhZK2g4RTMwUHhvSnRBZnZEUDZjSTdLaCtoRUxUekcxZ2JETm8yTnNGSmlKOHRIUUdSVmJHQgpYUG1PbDNyYXdvc1VNaHZ0MlFCOE5sUXIyclN5MjBMRUkvSW04UHU5Rmt1OGZDRUpSdDUwSlEzM1lZZW1KVGhnCm81SExGQndMTWNHZnVneEFmM2llMDl2aDh5aTVUVU9MS0tTWTdFWmhzZlJnSTMzMWVyTT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
+       caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM4akNDQWRxZ0F3SUJBZ0lRTkx1azdOcWxqNTZ6TVVjdnIwVk1WakFOQmdrcWhraUc5dzBCQVFzRkFEQVQKTVJFd0R3WURWUVFERXdocllXMTFjeTFqWVRBZUZ3MHlNREV4TVRFeE5UUTJNelJhRncwek1ERXhNRGt4TlRRMgpNelJhTUJNeEVUQVBCZ05WQkFNVENHdGhiWFZ6TFdOaE1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0NBUThBCk1JSUJDZ0tDQVFFQXAxM2xLZWN5YitNYjBEc1ZxcVNrNzlrQ1JWd3BXT0NmQ05yTjRvWUp2anUyZnZjMkpTdzkKcDRQZmVDalNCWlVUbDFDQ2MwazNvQjBtYSs3SHRlWXQ0eEVtcWNPaW8wWjU0YmxnM2hTRjFJMnFRSFQyZGsyNwpZVE9INlpKUGx6YXhQM29lWDNXdS82ZU5aTXdPb1FqM3U2c2VwbWZmajdHMzhCTjJ0SnBXdXlySFYydDhsSFdYCnVhVlMzd3RGSERucllHdVNNa1BFaGw2MktpVmpxcllzTGRzN1k5bThRczdIODU5NXpodUFqY2E4ZDBxdENsL1QKbXROdDFmMndIUjYzNDM2b0lXT09iVjBTRWc1by9XN0NUbE5sREZxK2hTUHIybkRPZVo3KzBmdTNudnRSQzNjQwp0RHQyQ2VCUko0U0xKYi82alAvWmNRbkNBMlFhVkQ5UWJ3SURBUUFCbzBJd1FEQU9CZ05WSFE4QkFmOEVCQU1DCkFxUXdIUVlEVlIwbEJCWXdGQVlJS3dZQkJRVUhBd0VHQ0NzR0FRVUZCd01DTUE4R0ExVWRFd0VCL3dRRk1BTUIKQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFHV1QxOWUvUm1kOGhJMTJMVVFpMEJOMEluQlY1WE5BNUxHMQpQM3RVWHBCQmMzb3Z4c2laVG9PZXpIWjF3WktQakZENVhWL3VnTW5wMXFReDNrWm5uOG9Ybm5MYU1lTTdkNDBoCkZZRFMvQWZIWXk4QTQ2RWxSeThuNUZHY3VMTmo0VU8zdTRsdk9WckpLRFkxT3hGdGREREJjMUZ3UWlEWCsyeSsKUjg1UzFIQWVEOXpFYVg5akgyL3JPREM3clBjVnBoZnVOaUNONTJRY1lNd3daeGYydlBjcnI4cWVFdUJjRDJqSQpsNjNaVEtwNFhZTytkL01pK3FKeTBZYVpMRjRZV1lZRWdZdG1raVRoc2ROQy9SeEtoM2sxZ0ZTU3ZLZGJvcVgvClBnMENQbTh6UjZ5MkRiYjdnTVIwQ24vUDNIT1dEVVRiNWRXTDIwdUZVMG41UzZ6VXRGYz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=

lebenitza commented 4 years ago

I think a new issue with the last two comments above needs to be created here: https://github.com/Soluto/helm-charts

shaikatz commented 4 years ago

@wimo7083 Can you please open a new issue for the flux operator related issue so we can be more focused?

@lebenitza Definitely, the auto-generated fields are a topic we can discuss in the helm-charts repo, feel free to open an issue over there.