cloudfoundry / eirini

Pluggable container orchestration for Cloud Foundry, and a Kubernetes backend
Apache License 2.0

Crash events aren't appearing #100

Closed ericpromislow closed 3 years ago

ericpromislow commented 4 years ago

Description

We modified the catnip sample app so that when we call it with ENDPOINT/sigterm/KILL, the handler calls os.Exit(0). We don't see any crash events.

Steps to reproduce

  1. Made these changes in cf-for-k8s:

    diff --git a/build/eirini/eirini-values.yml b/build/eirini/eirini-values.yml
    index a3a529a..ab5175e 100644
    --- a/build/eirini/eirini-values.yml
    +++ b/build/eirini/eirini-values.yml
    @@ -21,7 +21,15 @@ opi:
           caPath: "tls.ca"
     
       events:
    -    enable: false
    +    enable: true
    +    tls:
    +      capiClient:
    +        secretName: "eirini-internal-tls-certs"
    +        keyPath: "tls.key"
    +        certPath: "tls.crt"
    +      capi:
    +        secretName: "eirini-internal-tls-certs"
    +        caPath: "tls.ca"
    
       logs:
         enable: false
    diff --git a/config/_ytt_lib/eirini/rendered.yml b/config/_ytt_lib/eirini/rendered.yml
    index 5ddb993..b80542b 100644
    --- a/config/_ytt_lib/eirini/rendered.yml
    +++ b/config/_ytt_lib/eirini/rendered.yml
    @@ -97,6 +97,41 @@ spec:
       - secret
       - downwardAPI
     ---
    +apiVersion: policy/v1beta1
    +kind: PodSecurityPolicy
    +metadata:
    +  annotations:
    +    seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default
    +    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
    +  name: eirini-events
    +spec:
    +  allowPrivilegeEscalation: false
    +  fsGroup:
    +    ranges:
    +    - max: 65535
    +      min: 1
    +    rule: MustRunAs
    +  hostIPC: false
    +  hostNetwork: false
    +  hostPID: false
    +  privileged: false
    +  readOnlyRootFilesystem: false
    +  requiredDropCapabilities:
    +  - ALL
    +  runAsUser:
    +    rule: MustRunAsNonRoot
    +  seLinux:
    +    rule: RunAsAny
    +  supplementalGroups:
    +    ranges:
    +    - max: 65535
    +      min: 1
    +    rule: MustRunAs
    +  volumes:
    +  - configMap
    +  - secret
    +  - projected
    +---
     apiVersion: v1
     kind: ServiceAccount
     metadata:
    @@ -105,6 +140,12 @@ metadata:
     ---
     apiVersion: v1
     kind: ServiceAccount
    +metadata:
    +  name: eirini-events
    +  namespace: cf-system
    +---
    +apiVersion: v1
    +kind: ServiceAccount
     metadata:
       name: opi
       namespace: cf-system
    @@ -251,6 +292,41 @@ rules:
       - use
     ---
     apiVersion: rbac.authorization.k8s.io/v1
    +kind: Role
    +metadata:
    +  name: eirini-events
    +  namespace: cf-workloads
    +rules:
    +- apiGroups:
    +  - ""
    +  resources:
    +  - pods
    +  verbs:
    +  - list
    +  - watch
    +- apiGroups:
    +  - ""
    +  resources:
    +  - events
    +  verbs:
    +  - list
    +---
    +apiVersion: rbac.authorization.k8s.io/v1
    +kind: Role
    +metadata:
    +  name: eirini-events-psp
    +  namespace: cf-system
    +rules:
    +- apiGroups:
    +  - policy
    +  resourceNames:
    +  - eirini-events
    +  resources:
    +  - podsecuritypolicies
    +  verbs:
    +  - use
    +---
    +apiVersion: rbac.authorization.k8s.io/v1
     kind: RoleBinding
     metadata:
       name: cf-workloads-app-rolebinding
    @@ -298,6 +374,34 @@ subjects:
       name: eirini-secret-smuggler
       namespace: cf-system
     ---
    +apiVersion: rbac.authorization.k8s.io/v1
    +kind: RoleBinding
    +metadata:
    +  name: eirini-events
    +  namespace: cf-workloads
    +roleRef:
    +  apiGroup: rbac.authorization.k8s.io
    +  kind: Role
    +  name: eirini-events
    +subjects:
    +- kind: ServiceAccount
    +  name: eirini-events
    +  namespace: cf-system
    +---
    +apiVersion: rbac.authorization.k8s.io/v1
    +kind: RoleBinding
    +metadata:
    +  name: eirini-events-psp
    +  namespace: cf-system
    +roleRef:
    +  apiGroup: rbac.authorization.k8s.io
    +  kind: Role
    +  name: eirini-events-psp
    +subjects:
    +- kind: ServiceAccount
    +  name: eirini-events
    +  namespace: cf-system
    +---
     apiVersion: v1
     kind: Service
     metadata:
    @@ -384,3 +488,61 @@ spec:
                   - key: tls.ca
                     path: eirini.ca
                   name: eirini-internal-tls-certs
    +---
    +apiVersion: apps/v1
    +kind: Deployment
    +metadata:
    +  annotations:
    +    kbld.k14s.io/images: |
    +      - Metas: null
    +        URL: index.docker.io/eirini/event-reporter@sha256:a1c6d5dfe8961856d09a6f32169a2162e8b9b3e2b492c980183aa1fd8d064129
    +  name: eirini-events
    +  namespace: cf-system
    +spec:
    +  selector:
    +    matchLabels:
    +      name: eirini-events
    +  template:
    +    metadata:
    +      labels:
    +        name: eirini-events
    +    spec:
    +      containers:
    +      - image: index.docker.io/eirini/event-reporter@sha256:a1c6d5dfe8961856d09a6f32169a2162e8b9b3e2b492c980183aa1fd8d064129
    +        imagePullPolicy: Always
    +        name: event-reporter
    +        resources:
    +          requests:
    +            cpu: 15m
    +            memory: 15Mi
    +        volumeMounts:
    +        - mountPath: /etc/eirini/config
    +          name: config-map-volume
    +        - mountPath: /etc/eirini/secrets
    +          name: cf-secrets
    +      dnsPolicy: ClusterFirst
    +      securityContext:
    +        runAsNonRoot: true
    +      serviceAccountName: eirini-events
    +      volumes:
    +      - configMap:
    +          items:
    +          - key: events.yml
    +            path: events.yml
    +          name: eirini
    +        name: config-map-volume
    +      - name: cf-secrets
    +        projected:
    +          sources:
    +          - secret:
    +              items:
    +              - key: tls.crt
    +                path: cc.crt
    +              - key: tls.key
    +                path: cc.key
    +              name: eirini-internal-tls-certs
    +          - secret:
    +              items:
    +              - key: tls.ca
    +                path: cc.ca
    +              name: eirini-internal-tls-certs

and in cf-acceptance-tests:

    diff --git a/assets/catnip/signal/signal.go b/assets/catnip/signal/signal.go
    index 5fc05bf4..d0d611aa 100644
    --- a/assets/catnip/signal/signal.go
    +++ b/assets/catnip/signal/signal.go
    @@ -1,11 +1,23 @@
     package signal
     
     import (
    +   "fmt"
    +   "io"
        "net/http"
        "os"
     )
     
     func KillHandler(res http.ResponseWriter, req *http.Request) {
    -   currentProcess, _ := os.FindProcess(os.Getpid())
    +   pid := os.Getpid()
    +   fmt.Fprintf(os.Stdout, "About to kill process %d\n", pid)
    +   io.WriteString(res, fmt.Sprintf("About to kill process %d\n", pid))
    +   currentProcess, _ := os.FindProcess(pid)
        currentProcess.Kill()
    +   fmt.Fprintf(os.Stdout, "Did kill process %d\n", pid)
    +   io.WriteString(res, fmt.Sprintf("Did kill process %d\n", pid))
    +   fmt.Fprintf(os.Stdout, "About to exit process %d\n", pid)
    +   io.WriteString(res, "About to exit...")
    +   os.Exit(0)
    +   fmt.Fprintf(os.Stdout, "Did exit process %d\n", pid)
    +   io.WriteString(res, "Did exit")
     }

  2. Ran kapp deploy -a cf <(ytt -f config -f $values_file) -y

  3. z catnip

  4. cf push catnip

  5. curl catnip.DOMAIN/sigterm/KILL

  6. cf events catnip

What was expected to happen

I was expecting to see some events that contained the string audit.app.process.crash.

What actually happened

Getting events for app catnip in org org / space space as admin...

time                          event                      actor             description
2020-06-25T16:32:37.00-0700   audit.app.droplet.create   admin@admin.tld
2020-06-25T16:31:56.00-0700   audit.app.update           admin@admin.tld   state: STARTED
2020-06-25T16:31:56.00-0700   audit.app.build.create     admin@admin.tld
2020-06-25T16:31:56.00-0700   audit.app.update           admin@admin.tld   state: STOPPED
2020-06-25T16:31:49.00-0700   audit.app.upload-bits      admin@admin.tld
2020-06-25T16:31:48.00-0700   audit.app.update           admin@admin.tld   disk_quota: 1024, instances: 1, memory: 1024
2020-06-25T16:29:51.00-0700   audit.app.droplet.create   admin@admin.tld
2020-06-25T16:29:08.00-0700   audit.app.update           admin@admin.tld   state: STARTED
2020-06-25T16:29:08.00-0700   audit.app.build.create     admin@admin.tld
2020-06-25T16:29:07.00-0700   audit.app.update           admin@admin.tld   state: STOPPED
2020-06-25T16:29:01.00-0700   audit.app.upload-bits      admin@admin.tld
2020-06-25T16:28:58.00-0700   audit.app.update           admin@admin.tld   disk_quota: 1024, instances: 1, memory: 1024
2020-06-25T16:18:40.00-0700   audit.app.droplet.create   admin@admin.tld
2020-06-25T16:18:03.00-0700   audit.app.update           admin@admin.tld   state: STARTED
2020-06-25T16:18:03.00-0700   audit.app.build.create     admin@admin.tld
2020-06-25T16:17:56.00-0700   audit.app.upload-bits      admin@admin.tld
2020-06-25T16:17:53.00-0700   audit.app.map-route        admin@admin.tld
2020-06-25T16:17:53.00-0700   audit.app.create           admin@admin.tld   instances: 1, state: STOPPED, environment_json: [PRIVATE DATA HIDDEN]
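When repeating the reproduction, the check for a crash entry in the `cf events` output could be automated. The following is a minimal sketch, assuming the four-column layout shown above (time, event, actor, description) and matching on the substring "process.crash"; the function name `crashEvents` is hypothetical and not part of the cf CLI or Eirini.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// crashEvents returns the lines of `cf events` output whose event column
// contains "process.crash". The column layout is assumed from the output
// quoted in this issue; this is a sketch, not a robust parser.
func crashEvents(output string) []string {
	var found []string
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := scanner.Text()
		fields := strings.Fields(line)
		// fields[0] is the timestamp, fields[1] the event type.
		if len(fields) >= 2 && strings.Contains(fields[1], "process.crash") {
			found = append(found, line)
		}
	}
	return found
}

func main() {
	// Two lines copied from the output in this issue.
	output := `time                          event                      actor             description
2020-06-25T16:32:37.00-0700   audit.app.droplet.create   admin@admin.tld
2020-06-25T16:31:56.00-0700   audit.app.update           admin@admin.tld   state: STARTED`
	fmt.Printf("crash events found: %d\n", len(crashEvents(output)))
}
```

Run against the output above, the sketch reports zero matches, which is the behaviour this issue describes.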

Additional information (optional)

Without the added os.Exit call, the catnip app doesn't seem to actually exit.

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/173537284

The labels on this github issue will be updated when the story is started.

kieron-dev commented 4 years ago

Hi @ericpromislow

We've seen that eirini helm templating sets cc_internal_api in the events section of the config map to https://{{ .Values.opi.cc_api.serviceName }}.{{ .Release.Namespace }}.svc.cluster.local:9023.

There are a couple of problems there:

The easiest change is to modify that property in the config map to something like http://capi.cf-system.svc.cluster.local, which will stop the crash reporter updates from failing.
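To make the difference between the two values concrete, here is a minimal sketch that builds both URLs. It assumes the template's serviceName renders as "capi" and the release namespace as "cf-system", which is inferred from the suggested URL rather than confirmed; the helper `internalAPIURL` is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// internalAPIURL builds a cluster-local service URL of the form
// scheme://service.namespace.svc.cluster.local[:port].
// A port of 0 leaves the port out.
func internalAPIURL(scheme, service, namespace string, port int) string {
	host := fmt.Sprintf("%s.%s.svc.cluster.local", service, namespace)
	if port != 0 {
		host = fmt.Sprintf("%s:%d", host, port)
	}
	u := url.URL{Scheme: scheme, Host: host}
	return u.String()
}

func main() {
	// What the Helm template would render under the assumed values.
	fmt.Println(internalAPIURL("https", "capi", "cf-system", 9023))
	// The value suggested in the comment above.
	fmt.Println(internalAPIURL("http", "capi", "cf-system", 0))
}
```

Under those assumptions the two values differ in scheme and port, while the host part is the same.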

In the meantime, we'll consider how to make the templating in the configmap more flexible. Note, though, that we're dropping helm templating in the near future and will provide only example deployment files and docs, in which case changing your configmap templating is the correct fix.

ericpromislow commented 4 years ago

The permissions needed to access events are dealt with in a separate PR: https://github.com/cloudfoundry-incubator/eirini-release/pull/174