IBM / core-dump-handler

Save core dumps from a Kubernetes Service or RedHat OpenShift to an S3 protocol compatible object store
https://ibm.github.io/core-dump-handler/
MIT License

OpenShift compatibility #44

Closed: tjungblu closed this issue 2 years ago

tjungblu commented 2 years ago

Hey,

I'm currently looking into this project for OpenShift. For ARO/ROSA you mention:

building compatible binaries seems to be the next step

What is missing to get this working on RHCOS?

Cheers, Thomas

No9 commented 2 years ago

Hey @tjungblu, to be honest I'm not 100% sure what's required for RHCOS. I ran a couple of very quick tests on ARO/ROSA and they didn't work. Specifically, the agent that uploads the core dumps worked as expected, but the cores weren't collected by the executable on the host.

Some aspects I was going to investigate further:

The core-dump-composer is copied to a shared host location during the deployment. Is this actually supported by CoreOS?

a) If copying the binary to the host is supported, does it require compatible builds, or will RHEL UBI 7/8 builds work OK? [edit] It may just require copying to a specific writable location, which can be changed with the daemonset.hostDirectory and daemonset.coreDirectory options.

b) If copying the file to the host isn't supported, the best practices for supplying host services need to be researched. From initial reading it seems that host services are provided as containers and need to be defined at cluster creation time, but I may have that totally wrong.

I don't think the issues are ARO/ROSA specific. My next step was going to be to set up a local cluster and investigate further, but I haven't had time to get around to it.

I'm also conscious that the way this project works might impact how Red Hat provides general support for aborting processes so that's another area that will need clarification.

[edit] Somewhat related: it would be useful to confirm whether the manual SCC config is required, as I hit this bug a while back and haven't gone back to revalidate: https://github.com/openshift/origin/issues/20788

Sorry to give you more questions than answers, but that's where I got to.

Any help would be greatly appreciated.

travier commented 2 years ago

If I understand correctly, this program sets itself up to handle core dumps from the kernel, taking over from systemd-coredump. As the binary is launched directly by the kernel, the easiest option is to place it directly on the host, for example in /usr/local/bin, which is a writable path on RHCOS. The agent can then run as a container with access to the path where the dumps are stored.
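The kernel-side hookup travier describes goes through /proc/sys/kernel/core_pattern: when its value begins with a pipe character, the kernel runs the named program and writes the core to that program's stdin. Parsing the pattern is a quick way to see whether systemd-coredump or another handler currently owns that hook on a node. `pipe_handler` and `current_handler` are hypothetical helpers, and the pattern strings used with them are illustrative rather than read from an RHCOS node.

```python
from typing import Optional

CORE_PATTERN_PATH = "/proc/sys/kernel/core_pattern"


def pipe_handler(pattern: str) -> Optional[str]:
    """Return the program the kernel pipes cores to, or None for a file pattern.

    A leading '|' means the kernel executes the program and writes the core
    to its stdin; anything else is treated as a file name template.
    """
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        return None
    # The first token after '|' is the program path; the remaining tokens
    # are %-specifiers the kernel expands and passes as arguments.
    tokens = pattern[1:].split()
    return tokens[0] if tokens else None


def current_handler(path: str = CORE_PATTERN_PATH) -> Optional[str]:
    """Read the live core_pattern (Linux only) and return its pipe handler."""
    with open(path) as f:
        return pipe_handler(f.read())


if __name__ == "__main__":
    print(pipe_handler("|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h"))
    print(pipe_handler("core.%p"))
```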

RHCOS is based on RHEL so building binaries in UBI 7/8 will give compatible binaries for RHCOS.

Another option is to have the agent talk directly to systemd-coredump via its socket (/run/systemd/coredump) and process the results. That would require less kernel configuration.

tjungblu commented 2 years ago

Thanks for your help here @travier, much appreciated. I just found some time and tried this out, using our cluster-bot to install 4.9 directly on AWS (Red Hat Enterprise Linux CoreOS 49.84.202201042103-0), and it works flawlessly with just these adjustments:

in values.yaml:
    hostDirectory: "/mnt/core-dump-handler"
    coreDirectory: "/mnt/core-dump-handler/cores"
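The same two paths can also be passed at install time instead of editing values.yaml, using the daemonset.hostDirectory and daemonset.coreDirectory option names from earlier in the thread. The sketch below is a hypothetical helper that turns a dict of overrides into Helm `--set key=value` arguments; the release name and chart path in the printed command are assumptions, not documented project defaults.

```python
import shlex
from typing import Dict, List

# Paths reported to work on RHCOS in this thread (/mnt is writable there).
OPENSHIFT_OVERRIDES: Dict[str, str] = {
    "daemonset.hostDirectory": "/mnt/core-dump-handler",
    "daemonset.coreDirectory": "/mnt/core-dump-handler/cores",
}


def helm_set_args(overrides: Dict[str, str]) -> List[str]:
    """Turn {"a.b": "v"} into ["--set", "a.b=v", ...] in a stable, sorted order."""
    args: List[str] = []
    for key in sorted(overrides):
        args += ["--set", f"{key}={overrides[key]}"]
    return args


if __name__ == "__main__":
    # Hypothetical release name and chart path; adjust to your checkout.
    cmd = ["helm", "install", "core-dump-handler", "./charts/core-dump-handler",
           "--namespace", "observe"] + helm_set_args(OPENSHIFT_OVERRIDES)
    print(shlex.join(cmd))
```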

and the SCC grant:

oc adm policy add-scc-to-user privileged -z core-dump-admin -n observe


Do you think we should add a switch to the Helm chart to make it work on OpenShift? I haven't tried adding the SCC change to the chart yet, but I'm sure we can solve that somehow. Otherwise we can just use a post-install hook job that runs the oc command.
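One way to move the SCC grant into the chart, rather than running oc from a hook job, is to ship RBAC objects that let the service account `use` the privileged SCC: OpenShift checks the `use` verb on `securitycontextconstraints` in the `security.openshift.io` API group when admitting pods. The sketch below builds such a ClusterRole and RoleBinding as plain dicts and prints them as a JSON `List`, which `oc apply -f` accepts as well as YAML. The object names are hypothetical; the service account `core-dump-admin` and namespace `observe` come from the oc command above. It is a sketch under those assumptions, not a verified chart change.

```python
import json
from typing import Any, Dict

SERVICE_ACCOUNT = "core-dump-admin"  # from the oc command in this thread
NAMESPACE = "observe"
ROLE_NAME = "core-dump-handler-scc-privileged"  # hypothetical name


def scc_rbac(scc: str = "privileged") -> Dict[str, Any]:
    """Build a ClusterRole allowed to 'use' one SCC plus a namespaced binding."""
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": ROLE_NAME},
        "rules": [{
            "apiGroups": ["security.openshift.io"],
            "resources": ["securitycontextconstraints"],
            "resourceNames": [scc],
            "verbs": ["use"],
        }],
    }
    # A RoleBinding (not a ClusterRoleBinding) scopes the grant to one namespace.
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": ROLE_NAME,
        },
        "subjects": [{
            "kind": "ServiceAccount",
            "name": SERVICE_ACCOUNT,
            "namespace": NAMESPACE,
        }],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [cluster_role, role_binding]}


if __name__ == "__main__":
    print(json.dumps(scc_rbac(), indent=2))
```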

No9 commented 2 years ago

@tjungblu Can you confirm the contents of the zip? There should be 7 files in there. I just want to confirm that crictl is being picked up properly. Or did you use https://github.com/IBM/core-dump-handler/blob/main/integration/run.sh with a .env file set up in the root of the project?

tjungblu commented 2 years ago

@No9 interesting, I got only 5:

-r--r--r--. 1 tjungblu tjungblu 229376 Jan  7 10:53 6327835c-532b-447b-98be-40dfb46bb130-dump-1641552839-segfaulter-segfaulter-1-4.core
-r--r--r--. 1 tjungblu tjungblu    293 Jan  7 10:53 6327835c-532b-447b-98be-40dfb46bb130-dump-1641552839-segfaulter-segfaulter-1-4-dump-info.json
-r--r--r--. 1 tjungblu tjungblu    596 Jan  7 10:53 6327835c-532b-447b-98be-40dfb46bb130-dump-1641552839-segfaulter-segfaulter-1-4-pod-info.json
-r--r--r--. 1 tjungblu tjungblu    996 Jan  7 10:53 6327835c-532b-447b-98be-40dfb46bb130-dump-1641552839-segfaulter-segfaulter-1-4-ps-info.json
-r--r--r--. 1 tjungblu tjungblu  27059 Jan  7 10:53 6327835c-532b-447b-98be-40dfb46bb130-dump-1641552839-segfaulter-segfaulter-1-4-runtime-info.json

The core dump looks fine in objdump, though (as far as I can tell).

I ran the segfaulter directly after installing the helm chart (no .env file setup):

kubectl run -it segfaulter --image=quay.io/icdh/segfaulter --restart=Never
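A quick sanity check that an extracted .core file really is an ELF core dump, short of opening it in objdump, is to read its ELF header: the file starts with the magic bytes 0x7f 'E' 'L' 'F', byte 5 (EI_DATA) gives the byte order, and the 16-bit e_type field at offset 16 equals 4 (ET_CORE) for core files. `is_elf_core` and `check_core_file` are hypothetical helpers, not part of the project.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_CORE = 4       # e_type value for core files
ELFDATA2LSB = 1   # little-endian multi-byte fields
ELFDATA2MSB = 2   # big-endian multi-byte fields


def is_elf_core(header: bytes) -> bool:
    """Return True if the first 18+ bytes of a file describe an ELF core dump."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return False
    data = header[5]  # EI_DATA: byte order used for e_type and later fields
    if data == ELFDATA2LSB:
        fmt = "<H"
    elif data == ELFDATA2MSB:
        fmt = ">H"
    else:
        return False
    # e_type sits right after the 16-byte e_ident block for 32- and 64-bit ELF.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return e_type == ET_CORE


def check_core_file(path: str) -> bool:
    """Check a .core file on disk by reading only its header."""
    with open(path, "rb") as f:
        return is_elf_core(f.read(18))
```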

No9 commented 2 years ago

OK, the good news is that crictl looks to be there and functioning, but you are missing the image info. Can you redeploy the chart adding --set daemonset.composerCrioImageCmd="images" and rerun the core dump?

[Edit] @tjungblu To be clear, the count of 7 includes the original zip file, so it's just the image file that's missing.
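The expected contents can be checked mechanically. This sketch lists the archive with Python's zipfile module and reports which expected per-dump files are absent, matching on the suffixes seen in the zip listings in this thread. The image-info suffix is an assumption: the thread only says the image file is missing, not what it is called, so verify it against a complete dump.

```python
import zipfile
from typing import Iterable, List

# Suffixes observed in the zip listings in this thread.
OBSERVED_SUFFIXES = [
    ".core",
    "-dump-info.json",
    "-pod-info.json",
    "-ps-info.json",
    "-runtime-info.json",
]
# Assumed (hypothetical) name for the missing image file.
ASSUMED_IMAGE_SUFFIX = "-image-info.json"


def missing_files(names: Iterable[str], suffixes: List[str]) -> List[str]:
    """Return every expected suffix that no archive member ends with."""
    names = list(names)
    return [s for s in suffixes if not any(n.endswith(s) for n in names)]


def check_dump_zip(path: str) -> List[str]:
    """Open a dump zip and list the expected files it does not contain."""
    with zipfile.ZipFile(path) as zf:
        return missing_files(zf.namelist(), OBSERVED_SUFFIXES + [ASSUMED_IMAGE_SUFFIX])
```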

tjungblu commented 2 years ago

yep, that's now on the daemonset:

COMP_CRIO_IMAGE_CMD = images

and in the logs:

[2022-01-07T11:27:02Z INFO  core_dump_agent] Creating /mnt/core-dump-handler/.env file with LOG_LEVEL=Warn
[2022-01-07T11:27:02Z INFO  core_dump_agent] Writing composer .env
    LOG_LEVEL=Warn
    IGNORE_CRIO=false
    CRIO_IMAGE_CMD=images
    USE_CRIO_CONF=false

[2022-01-07T11:27:02Z INFO  core_dump_agent] Executing Agent with location : /mnt/core-dump-handler/cores

I'm afraid the image file isn't there however:

Archive:  aa08a83e-5b13-4a32-9ed4-e8670d738d83-dump-1641554896-segfaulter-segfaulter-1-4.zip
  inflating: aa08a83e-5b13-4a32-9ed4-e8670d738d83-dump-1641554896-segfaulter-segfaulter-1-4-dump-info.json  
  inflating: aa08a83e-5b13-4a32-9ed4-e8670d738d83-dump-1641554896-segfaulter-segfaulter-1-4.core  
  inflating: aa08a83e-5b13-4a32-9ed4-e8670d738d83-dump-1641554896-segfaulter-segfaulter-1-4-pod-info.json  
  inflating: aa08a83e-5b13-4a32-9ed4-e8670d738d83-dump-1641554896-segfaulter-segfaulter-1-4-runtime-info.json  
  inflating: aa08a83e-5b13-4a32-9ed4-e8670d738d83-dump-1641554896-segfaulter-segfaulter-1-4-ps-info.json  
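
The agent writes the composer settings as plain KEY=VALUE lines (see the agent log in this comment), so confirming that an override such as CRIO_IMAGE_CMD reached the host comes down to reading that file. The parsing rules below (skip blank lines and # comments, split on the first =) are assumptions based only on the lines shown in the log; `parse_env` and `load_env` are hypothetical helpers.

```python
from typing import Dict

# Location reported by the agent log ("Creating /mnt/core-dump-handler/.env").
ENV_PATH = "/mnt/core-dump-handler/.env"


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        env[key.strip()] = value.strip()
    return env


def load_env(path: str = ENV_PATH) -> Dict[str, str]:
    """Read and parse the composer .env from the host directory."""
    with open(path) as f:
        return parse_env(f.read())


if __name__ == "__main__":
    # Sample content copied from the agent log in this thread.
    sample = "LOG_LEVEL=Warn\nIGNORE_CRIO=false\nCRIO_IMAGE_CMD=images\nUSE_CRIO_CONF=false\n"
    print("CRIO_IMAGE_CMD =", parse_env(sample).get("CRIO_IMAGE_CMD"))
```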

No9 commented 2 years ago

OK, in the pod can you run cat /mnt/core-dump-handler/composer.log? That is the log for the composer, so hopefully there is a clue there. If there isn't, we can redeploy with --set daemonset.composerLogLevel="Debug" to get verbose output.
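The composer gathers pod metadata by shelling out to crictl, as the debug log in the next comment shows: first `crictl pods --name <pod> -o json`, then `crictl inspectp <id>` using the sandbox id from the first result. The sketch reproduces only the command construction and the id extraction on that JSON shape, without calling crictl; the function names are hypothetical and the sample data in the test is trimmed from the log.

```python
import json
from typing import List, Optional


def pods_cmd(pod_name: str) -> List[str]:
    """Arguments of the first crictl call seen in the composer debug log."""
    return ["crictl", "pods", "--name", pod_name, "-o", "json"]


def pod_id_from_pods_json(output: str) -> Optional[str]:
    """Return the sandbox id of the first pod in `crictl pods -o json` output."""
    items = json.loads(output).get("items") or []
    return items[0].get("id") if items else None


def inspectp_cmd(pod_id: str) -> List[str]:
    """Arguments of the follow-up crictl call, keyed on the pod sandbox id."""
    return ["crictl", "inspectp", pod_id]
```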

tjungblu commented 2 years ago

Debug did the trick; here's the log output:

sh-4.4# cat /mnt/core-dump-handler/composer.log
INFO - 2022-01-07T11:44:48.149442800+00:00 - Loading .env
INFO - 2022-01-07T11:44:48.149491609+00:00 - Set logfile to: "/var/mnt/core-dump-handler/composer.log"
DEBUG - 2022-01-07T11:44:48.149576456+00:00 - Creating dump for 39817ae8-aa0e-4887-b22d-a20d3f3deb7e-dump-1641555888-segfaulter-segfaulter-1-4
INFO - 2022-01-07T11:44:48.152804767+00:00 - Running crictl ["pods", "--name", "segfaulter", "-o", "json"]
DEBUG - 2022-01-07T11:44:48.173483374+00:00 - Using runtime_file_name:39817ae8-aa0e-4887-b22d-a20d3f3deb7e-dump-1641555888-segfaulter-segfaulter-1-4-pod-info.json
DEBUG - 2022-01-07T11:44:48.173901132+00:00 - pod object {"items":[{"annotations":{"kubernetes.io/config.seen":"2022-01-07T11:44:46.300339165Z","kubernetes.io/config.source":"api"},"createdAt":"1641555886634313048","id":"be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93","labels":{"io.kubernetes.container.name":"POD","io.kubernetes.pod.name":"segfaulter","io.kubernetes.pod.namespace":"default","io.kubernetes.pod.uid":"6a52412a-0a32-425f-830f-87a95abb57d4","run":"segfaulter"},"metadata":{"attempt":0,"name":"segfaulter","namespace":"default","uid":"6a52412a-0a32-425f-830f-87a95abb57d4"},"runtimeHandler":"","state":"SANDBOX_READY"}]}
DEBUG - 2022-01-07T11:44:48.173929712+00:00 - Using pod_id:be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93
INFO - 2022-01-07T11:44:48.173933462+00:00 - Running crictl ["inspectp", "be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93"]
DEBUG - 2022-01-07T11:44:48.195405333+00:00 - inspectp_output status: exit status: 0
DEBUG - 2022-01-07T11:44:48.195432985+00:00 - inspectp_output stderr, 
DEBUG - 2022-01-07T11:44:48.195438941+00:00 - Using runtime_file_name:39817ae8-aa0e-4887-b22d-a20d3f3deb7e-dump-1641555888-segfaulter-segfaulter-1-4-runtime-info.json
DEBUG - 2022-01-07T11:44:48.195687868+00:00 - inspectp_output: {
  "status": {
    "id": "be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93",
    "metadata": {
      "attempt": 0,
      "name": "segfaulter",
      "namespace": "default",
      "uid": "6a52412a-0a32-425f-830f-87a95abb57d4"
    },
    "state": "SANDBOX_READY",
    "createdAt": "2022-01-07T11:44:46.634313048Z",
    "network": {
      "additionalIps": [],
      "ip": "10.128.2.16"
    },
    "linux": {
      "namespaces": {
        "options": {
          "ipc": "POD",
          "network": "POD",
          "pid": "CONTAINER",
          "targetId": ""
        }
      }
    },
    "labels": {
      "io.kubernetes.container.name": "POD",
      "io.kubernetes.pod.name": "segfaulter",
      "io.kubernetes.pod.namespace": "default",
      "io.kubernetes.pod.uid": "6a52412a-0a32-425f-830f-87a95abb57d4",
      "run": "segfaulter"
    },
    "annotations": {
      "kubernetes.io/config.seen": "2022-01-07T11:44:46.300339165Z",
      "kubernetes.io/config.source": "api"
    },
    "runtimeHandler": ""
  },
  "info": {
    "runtimeSpec": {
      "ociVersion": "1.0.2-dev",
      "process": {
        "user": {
          "uid": 0,
          "gid": 0
        },
        "args": [
          "/usr/bin/pod"
        ],
        "env": [
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
          "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": {
          "bounding": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_KILL"
          ],
          "effective": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_KILL"
          ],
          "inheritable": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_KILL"
          ],
          "permitted": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_KILL"
          ]
        },
        "oomScoreAdj": -998,
        "selinuxLabel": "system_u:system_r:container_t:s0:c465,c727"
      },
      "root": {
        "path": "/var/lib/containers/storage/overlay/62ed5caf9aaf57b459b5ef9ad219519a714e78a9ab11f95f98266af70d389c48/merged",
        "readonly": true
      },
      "hostname": "segfaulter",
      "mounts": [
        {
          "destination": "/proc",
          "type": "proc",
          "source": "proc",
          "options": [
            "nosuid",
            "noexec",
            "nodev"
          ]
        },
        {
          "destination": "/dev",
          "type": "tmpfs",
          "source": "tmpfs",
          "options": [
            "nosuid",
            "strictatime",
            "mode=755",
            "size=65536k"
          ]
        },
        {
          "destination": "/dev/pts",
          "type": "devpts",
          "source": "devpts",
          "options": [
            "nosuid",
            "noexec",
            "newinstance",
            "ptmxmode=0666",
            "mode=0620",
            "gid=5"
          ]
        },
        {
          "destination": "/dev/mqueue",
          "type": "mqueue",
          "source": "mqueue",
          "options": [
            "nosuid",
            "noexec",
            "nodev"
          ]
        },
        {
          "destination": "/sys",
          "type": "sysfs",
          "source": "sysfs",
          "options": [
            "nosuid",
            "noexec",
            "nodev",
            "ro"
          ]
        },
        {
          "destination": "/etc/resolv.conf",
          "type": "bind",
          "source": "/run/containers/storage/overlay-containers/be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93/userdata/resolv.conf",
          "options": [
            "ro",
            "bind",
            "nodev",
            "nosuid",
            "noexec"
          ]
        },
        {
          "destination": "/dev/shm",
          "type": "bind",
          "source": "/run/containers/storage/overlay-containers/be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93/userdata/shm",
          "options": [
            "rw",
            "bind"
          ]
        },
        {
          "destination": "/etc/hostname",
          "type": "bind",
          "source": "/run/containers/storage/overlay-containers/be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93/userdata/hostname",
          "options": [
            "ro",
            "bind",
            "nodev",
            "nosuid",
            "noexec"
          ]
        }
      ],
      "annotations": {
        "io.kubernetes.pod.uid": "6a52412a-0a32-425f-830f-87a95abb57d4",
        "io.kubernetes.cri-o.Image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c56c86d030185bda241514593970e80f75ae75afd9bc6288388944bc2a1dfb1f",
        "io.kubernetes.cri-o.ShmPath": "/run/containers/storage/overlay-containers/be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93/userdata/shm",
        "io.kubernetes.cri-o.PortMappings": "[]",
        "run": "segfaulter",
        "io.kubernetes.cri-o.SeccompProfilePath": "runtime/default",
        "io.kubernetes.cri-o.NamespaceOptions": "{\"pid\":1}",
        "io.container.manager": "cri-o",
        "org.systemd.property.CollectMode": "'inactive-or-failed'",
        "kubernetes.io/config.seen": "2022-01-07T11:44:46.300339165Z",
        "io.kubernetes.cri-o.RuntimeHandler": "",
        "io.kubernetes.cri-o.ResolvPath": "/run/containers/storage/overlay-containers/be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93/userdata/resolv.conf",
        "io.kubernetes.cri-o.CgroupParent": "kubepods-besteffort-pod6a52412a_0a32_425f_830f_87a95abb57d4.slice",
        "io.kubernetes.pod.name": "segfaulter",
        "io.kubernetes.cri-o.MountPoint": "/var/lib/containers/storage/overlay/62ed5caf9aaf57b459b5ef9ad219519a714e78a9ab11f95f98266af70d389c48/merged",
        "io.kubernetes.cri-o.HostnamePath": "/run/containers/storage/overlay-containers/be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93/userdata/hostname",
        "io.kubernetes.cri-o.IP.0": "10.128.2.16",
        "io.kubernetes.cri-o.SandboxID": "be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93",
        "io.kubernetes.cri-o.Created": "2022-01-07T11:44:46.634313048Z",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.cri-o.Spoofed": "true",
        "io.kubernetes.cri-o.Annotations": "{\"kubernetes.io/config.seen\":\"2022-01-07T11:44:46.300339165Z\",\"kubernetes.io/config.source\":\"api\"}",
        "io.kubernetes.cri-o.ContainerID": "be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93",
        "io.kubernetes.cri-o.CNIResult": "{\"cniVersion\":\"0.4.0\",\"interfaces\":[{\"name\":\"eth0\",\"sandbox\":\"/var/run/netns/d8ad3ec9-82db-4993-99b0-413bd1235127\"}],\"ips\":[{\"version\":\"4\",\"interface\":0,\"address\":\"10.128.2.16/23\"}],\"routes\":[{\"dst\":\"0.0.0.0/0\",\"gw\":\"10.128.2.1\"},{\"dst\":\"224.0.0.0/4\"},{\"dst\":\"10.128.0.0/14\"}],\"dns\":{}}",
        "io.kubernetes.cri-o.HostNetwork": "false",
        "kubernetes.io/config.source": "api",
        "io.kubernetes.container.name": "POD",
        "io.kubernetes.cri-o.Metadata": "{\"Name\":\"segfaulter\",\"UID\":\"6a52412a-0a32-425f-830f-87a95abb57d4\",\"Namespace\":\"default\",\"Attempt\":0}",
        "io.kubernetes.cri-o.Name": "k8s_segfaulter_default_6a52412a-0a32-425f-830f-87a95abb57d4_0",
        "io.kubernetes.cri-o.PrivilegedRuntime": "false",
        "io.kubernetes.cri-o.ContainerType": "sandbox",
        "io.kubernetes.cri-o.ContainerName": "k8s_POD_segfaulter_default_6a52412a-0a32-425f-830f-87a95abb57d4_0",
        "io.kubernetes.cri-o.HostName": "segfaulter",
        "io.kubernetes.cri-o.KubeName": "segfaulter",
        "io.kubernetes.cri-o.Labels": "{\"io.kubernetes.container.name\":\"POD\",\"io.kubernetes.pod.uid\":\"6a52412a-0a32-425f-830f-87a95abb57d4\",\"io.kubernetes.pod.namespace\":\"default\",\"io.kubernetes.pod.name\":\"segfaulter\",\"run\":\"segfaulter\"}",
        "io.kubernetes.cri-o.LogPath": "/var/log/pods/default_segfaulter_6a52412a-0a32-425f-830f-87a95abb57d4/be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93.log",
        "io.kubernetes.cri-o.Namespace": "default"
      },
      "linux": {
        "sysctl": {
          "net.ipv4.ping_group_range": "0 2147483647"
        },
        "resources": {
          "devices": [
            {
              "allow": false,
              "access": "rwm"
            }
          ],
          "cpu": {
            "shares": 2
          }
        },
        "cgroupsPath": "kubepods-besteffort-pod6a52412a_0a32_425f_830f_87a95abb57d4.slice:crio:be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93",
        "namespaces": [
          {
            "type": "pid"
          },
          {
            "type": "network",
            "path": "/var/run/netns/d8ad3ec9-82db-4993-99b0-413bd1235127"
          },
          {
            "type": "ipc",
            "path": "/var/run/ipcns/d8ad3ec9-82db-4993-99b0-413bd1235127"
          },
          {
            "type": "uts",
            "path": "/var/run/utsns/d8ad3ec9-82db-4993-99b0-413bd1235127"
          },
          {
            "type": "mount"
          }
        ],
        "seccomp": {
          "defaultAction": "SCMP_ACT_ERRNO",
          "defaultErrnoRet": 38,
          "architectures": [
            "SCMP_ARCH_X86_64",
            "SCMP_ARCH_X86",
            "SCMP_ARCH_X32"
          ],
          "syscalls": [
            {
              "names": [
                "bdflush",
                "io_pgetevents",
                "kexec_file_load",
                "kexec_load",
                "migrate_pages",
                "move_pages",
                "nfsservctl",
                "nice",
                "oldfstat",
                "oldlstat",
                "oldolduname",
                "oldstat",
                "olduname",
                "pciconfig_iobase",
                "pciconfig_read",
                "pciconfig_write",
                "sgetmask",
                "ssetmask",
                "swapcontext",
                "swapoff",
                "swapon",
                "sysfs",
                "uselib",
                "userfaultfd",
                "ustat",
                "vm86",
                "vm86old",
                "vmsplice"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "_llseek",
                "_newselect",
                "accept",
                "accept4",
                "access",
                "adjtimex",
                "alarm",
                "bind",
                "brk",
                "capget",
                "capset",
                "chdir",
                "chmod",
                "chown",
                "chown32",
                "clock_adjtime",
                "clock_adjtime64",
                "clock_getres",
                "clock_getres_time64",
                "clock_gettime",
                "clock_gettime64",
                "clock_nanosleep",
                "clock_nanosleep_time64",
                "clone",
                "clone3",
                "close",
                "close_range",
                "connect",
                "copy_file_range",
                "creat",
                "dup",
                "dup2",
                "dup3",
                "epoll_create",
                "epoll_create1",
                "epoll_ctl",
                "epoll_ctl_old",
                "epoll_pwait",
                "epoll_pwait2",
                "epoll_wait",
                "epoll_wait_old",
                "eventfd",
                "eventfd2",
                "execve",
                "execveat",
                "exit",
                "exit_group",
                "faccessat",
                "faccessat2",
                "fadvise64",
                "fadvise64_64",
                "fallocate",
                "fanotify_mark",
                "fchdir",
                "fchmod",
                "fchmodat",
                "fchown",
                "fchown32",
                "fchownat",
                "fcntl",
                "fcntl64",
                "fdatasync",
                "fgetxattr",
                "flistxattr",
                "flock",
                "fork",
                "fremovexattr",
                "fsconfig",
                "fsetxattr",
                "fsmount",
                "fsopen",
                "fspick",
                "fstat",
                "fstat64",
                "fstatat64",
                "fstatfs",
                "fstatfs64",
                "fsync",
                "ftruncate",
                "ftruncate64",
                "futex",
                "futex_time64",
                "futimesat",
                "get_robust_list",
                "get_thread_area",
                "getcpu",
                "getcwd",
                "getdents",
                "getdents64",
                "getegid",
                "getegid32",
                "geteuid",
                "geteuid32",
                "getgid",
                "getgid32",
                "getgroups",
                "getgroups32",
                "getitimer",
                "get_mempolicy",
                "getpeername",
                "getpgid",
                "getpgrp",
                "getpid",
                "getppid",
                "getpriority",
                "getrandom",
                "getresgid",
                "getresgid32",
                "getresuid",
                "getresuid32",
                "getrlimit",
                "getrusage",
                "getsid",
                "getsockname",
                "getsockopt",
                "gettid",
                "gettimeofday",
                "getuid",
                "getuid32",
                "getxattr",
                "inotify_add_watch",
                "inotify_init",
                "inotify_init1",
                "inotify_rm_watch",
                "io_cancel",
                "io_destroy",
                "io_getevents",
                "io_setup",
                "io_submit",
                "ioctl",
                "ioprio_get",
                "ioprio_set",
                "ipc",
                "keyctl",
                "kill",
                "lchown",
                "lchown32",
                "lgetxattr",
                "link",
                "linkat",
                "listen",
                "listxattr",
                "llistxattr",
                "lremovexattr",
                "lseek",
                "lsetxattr",
                "lstat",
                "lstat64",
                "madvise",
                "mbind",
                "memfd_create",
                "mincore",
                "mkdir",
                "mkdirat",
                "mknod",
                "mknodat",
                "mlock",
                "mlock2",
                "mlockall",
                "mmap",
                "mmap2",
                "mount",
                "move_mount",
                "mprotect",
                "mq_getsetattr",
                "mq_notify",
                "mq_open",
                "mq_timedreceive",
                "mq_timedreceive_time64",
                "mq_timedsend",
                "mq_timedsend_time64",
                "mq_unlink",
                "mremap",
                "msgctl",
                "msgget",
                "msgrcv",
                "msgsnd",
                "msync",
                "munlock",
                "munlockall",
                "munmap",
                "name_to_handle_at",
                "nanosleep",
                "newfstatat",
                "open",
                "openat",
                "openat2",
                "open_tree",
                "pause",
                "pidfd_getfd",
                "pidfd_open",
                "pidfd_send_signal",
                "pipe",
                "pipe2",
                "pivot_root",
                "pkey_alloc",
                "pkey_free",
                "pkey_mprotect",
                "poll",
                "ppoll",
                "ppoll_time64",
                "prctl",
                "pread64",
                "preadv",
                "preadv2",
                "prlimit64",
                "pselect6",
                "pselect6_time64",
                "pwrite64",
                "pwritev",
                "pwritev2",
                "read",
                "readahead",
                "readdir",
                "readlink",
                "readlinkat",
                "readv",
                "reboot",
                "recv",
                "recvfrom",
                "recvmmsg",
                "recvmmsg_time64",
                "recvmsg",
                "remap_file_pages",
                "removexattr",
                "rename",
                "renameat",
                "renameat2",
                "restart_syscall",
                "rmdir",
                "rseq",
                "rt_sigaction",
                "rt_sigpending",
                "rt_sigprocmask",
                "rt_sigqueueinfo",
                "rt_sigreturn",
                "rt_sigsuspend",
                "rt_sigtimedwait",
                "rt_sigtimedwait_time64",
                "rt_tgsigqueueinfo",
                "sched_get_priority_max",
                "sched_get_priority_min",
                "sched_getaffinity",
                "sched_getattr",
                "sched_getparam",
                "sched_getscheduler",
                "sched_rr_get_interval",
                "sched_rr_get_interval_time64",
                "sched_setaffinity",
                "sched_setattr",
                "sched_setparam",
                "sched_setscheduler",
                "sched_yield",
                "seccomp",
                "select",
                "semctl",
                "semget",
                "semop",
                "semtimedop",
                "semtimedop_time64",
                "send",
                "sendfile",
                "sendfile64",
                "sendmmsg",
                "sendmsg",
                "sendto",
                "setns",
                "set_mempolicy",
                "set_robust_list",
                "set_thread_area",
                "set_tid_address",
                "setfsgid",
                "setfsgid32",
                "setfsuid",
                "setfsuid32",
                "setgid",
                "setgid32",
                "setgroups",
                "setgroups32",
                "setitimer",
                "setpgid",
                "setpriority",
                "setregid",
                "setregid32",
                "setresgid",
                "setresgid32",
                "setresuid",
                "setresuid32",
                "setreuid",
                "setreuid32",
                "setrlimit",
                "setsid",
                "setsockopt",
                "setuid",
                "setuid32",
                "setxattr",
                "shmat",
                "shmctl",
                "shmdt",
                "shmget",
                "shutdown",
                "sigaltstack",
                "signalfd",
                "signalfd4",
                "sigreturn",
                "socketcall",
                "socketpair",
                "splice",
                "stat",
                "stat64",
                "statfs",
                "statfs64",
                "statx",
                "symlink",
                "symlinkat",
                "sync",
                "sync_file_range",
                "syncfs",
                "sysinfo",
                "syslog",
                "tee",
                "tgkill",
                "time",
                "timer_create",
                "timer_delete",
                "timer_getoverrun",
                "timer_gettime",
                "timer_gettime64",
                "timer_settime",
                "timer_settime64",
                "timerfd_create",
                "timerfd_gettime",
                "timerfd_gettime64",
                "timerfd_settime",
                "timerfd_settime64",
                "times",
                "tkill",
                "truncate",
                "truncate64",
                "ugetrlimit",
                "umask",
                "umount",
                "umount2",
                "uname",
                "unlink",
                "unlinkat",
                "unshare",
                "utime",
                "utimensat",
                "utimensat_time64",
                "utimes",
                "vfork",
                "wait4",
                "waitid",
                "waitpid",
                "write",
                "writev"
              ],
              "action": "SCMP_ACT_ALLOW"
            },
            {
              "names": [
                "personality"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 0,
                  "value": 0,
                  "op": "SCMP_CMP_EQ"
                }
              ]
            },
            {
              "names": [
                "personality"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 0,
                  "value": 8,
                  "op": "SCMP_CMP_EQ"
                }
              ]
            },
            {
              "names": [
                "personality"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 0,
                  "value": 131072,
                  "op": "SCMP_CMP_EQ"
                }
              ]
            },
            {
              "names": [
                "personality"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 0,
                  "value": 131080,
                  "op": "SCMP_CMP_EQ"
                }
              ]
            },
            {
              "names": [
                "personality"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 0,
                  "value": 4294967295,
                  "op": "SCMP_CMP_EQ"
                }
              ]
            },
            {
              "names": [
                "arch_prctl"
              ],
              "action": "SCMP_ACT_ALLOW"
            },
            {
              "names": [
                "modify_ldt"
              ],
              "action": "SCMP_ACT_ALLOW"
            },
            {
              "names": [
                "open_by_handle_at"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "bpf",
                "fanotify_init",
                "lookup_dcookie",
                "perf_event_open",
                "quotactl",
                "setdomainname",
                "sethostname",
                "setns"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "chroot"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "delete_module",
                "init_module",
                "finit_module",
                "query_module"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "acct"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "kcmp",
                "process_madvise",
                "process_vm_readv",
                "process_vm_writev",
                "ptrace"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "iopl",
                "ioperm"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "settimeofday",
                "stime",
                "clock_settime",
                "clock_settime64"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "vhangup"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 1
            },
            {
              "names": [
                "socket"
              ],
              "action": "SCMP_ACT_ERRNO",
              "errnoRet": 22,
              "args": [
                {
                  "index": 0,
                  "value": 16,
                  "op": "SCMP_CMP_EQ"
                },
                {
                  "index": 2,
                  "value": 9,
                  "op": "SCMP_CMP_EQ"
                }
              ]
            },
            {
              "names": [
                "socket"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 2,
                  "value": 9,
                  "op": "SCMP_CMP_NE"
                }
              ]
            },
            {
              "names": [
                "socket"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 0,
                  "value": 16,
                  "op": "SCMP_CMP_NE"
                }
              ]
            },
            {
              "names": [
                "socket"
              ],
              "action": "SCMP_ACT_ALLOW",
              "args": [
                {
                  "index": 2,
                  "value": 9,
                  "op": "SCMP_CMP_NE"
                }
              ]
            }
          ]
        },
        "mountLabel": "system_u:object_r:container_file_t:s0:c465,c727"
      }
    }
  }
}

INFO - 2022-01-07T11:44:48.196130494+00:00 - Running crictl ["ps", "-o", "json", "-p", "be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93"]
DEBUG - 2022-01-07T11:44:48.216846591+00:00 - ps_output status: exit status: 0
DEBUG - 2022-01-07T11:44:48.216880171+00:00 - ps_output stderr, 
DEBUG - 2022-01-07T11:44:48.216886910+00:00 - ps_output: {
  "containers": [
    {
      "id": "d521d46f5a54477537feaa4c27e13a30655aeb4c4e81439f7d0afc45baf6a43d",
      "podSandboxId": "be1703e196f6449e5a42ee2eab0bf1e05f97a3dc5115325aea4a3ffb14919a93",
      "metadata": {
        "name": "segfaulter",
        "attempt": 0
      },
      "image": {
        "image": "quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd",
        "annotations": {
        }
      },
      "imageRef": "quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd",
      "state": "CONTAINER_RUNNING",
      "createdAt": "1641555888115767207",
      "labels": {
        "io.kubernetes.container.name": "segfaulter",
        "io.kubernetes.pod.name": "segfaulter",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.pod.uid": "6a52412a-0a32-425f-830f-87a95abb57d4"
      },
      "annotations": {
        "io.kubernetes.container.hash": "1b45ece1",
        "io.kubernetes.container.restartCount": "0",
        "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
        "io.kubernetes.container.terminationMessagePolicy": "File",
        "io.kubernetes.pod.terminationGracePeriod": "30"
      }
    }
  ]
}

DEBUG - 2022-01-07T11:44:48.217176763+00:00 - Successfully got the process details
DEBUG - 2022-01-07T11:44:48.217181203+00:00 - found img_id "quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
INFO - 2022-01-07T11:44:48.217184162+00:00 - Running crictl ["images", "-o", "json"]
DEBUG - 2022-01-07T11:44:48.237318408+00:00 - Found 30 images
tjungblu commented 2 years ago

running the crictl command directly gives:

sh-4.4# ./crictl images -o json
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
ERRO[0004] connect endpoint 'unix:///run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
FATA[0006] connect: connect endpoint 'unix:///run/crio/crio.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
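The warning above says crictl fell back to probing its deprecated default endpoints and timed out on each one. Below is a minimal sketch of passing crictl's global --runtime-endpoint and --image-endpoint flags explicitly so that no probing happens. It assumes the node runs CRI-O with its socket at unix:///run/crio/crio.sock (the last endpoint tried above) and that the binary is the ./crictl used here. The helper name crictl_images_command is hypothetical. The thread does not establish whether this alone fixes the timeouts from this shell.

```rust
use std::process::Command;

/// Assumed CRI-O socket: the last endpoint probed in the warning above.
/// Check the node's /etc/crictl.yaml before relying on it.
pub const CRIO_SOCKET: &str = "unix:///run/crio/crio.sock";

/// Builds `crictl images -o json` with explicit endpoints so crictl does not
/// probe the deprecated default socket list. Global flags must precede the
/// subcommand. (Hypothetical helper, not part of core-dump-handler.)
pub fn crictl_images_command(crictl_path: &str, endpoint: &str) -> Command {
    let mut cmd = Command::new(crictl_path);
    cmd.args(["--runtime-endpoint", endpoint, "--image-endpoint", endpoint])
        .args(["images", "-o", "json"]);
    cmd
}
```

Calling .output() on the returned Command actually runs crictl. The sketch only builds the invocation, so it can be inspected without a CRI socket.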
No9 commented 2 years ago

Hmm, but it's reporting "DEBUG - 2022-01-07T11:44:48.237318408+00:00 - Found 30 images". 30 is more entries than the error output would produce, which suggests it's running OK (https://github.com/IBM/core-dump-handler/blob/main/core-dump-composer/src/main.rs#L458). But the mapping of IDs isn't working (https://github.com/IBM/core-dump-handler/blob/main/core-dump-composer/src/main.rs#L460). Typical that the debug logging doesn't capture the comparison! Let me put a build together with specific logging and we can take a closer look.

No9 commented 2 years ago

@tjungblu Building: https://quay.io/repository/icdh/core-dump-handler/build/e99996b6-6385-4d72-8e7c-2aa180f6c326 I'll run the integration test once the build is complete to confirm everything is in order and drop a note in here. [Edit] Looking at the code and output, I think this is going to come down to how different k8s flavours populate "imageRef".
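The imageRef values logged above come from CRI-O. They are digest-pinned references of the form repository@sha256:hex. Below is a small sketch of splitting such a reference, assuming that format. Other runtimes may report imageRef differently, for example as a bare sha256 image ID, which is the per-flavour difference suspected here. split_image_ref is a hypothetical helper, not part of the composer.

```rust
/// Splits a digest-pinned `imageRef` such as
/// `quay.io/icdh/segfaulter@sha256:<hex>` into (repository, digest hex).
/// Returns `None` when the reference has no `@sha256:` part, e.g. when a
/// runtime reports a bare image ID instead (assumed, runtime-dependent).
pub fn split_image_ref(image_ref: &str) -> Option<(&str, &str)> {
    let (repo, digest) = image_ref.split_once('@')?;
    let hex = digest.strip_prefix("sha256:")?;
    Some((repo, hex))
}
```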

tjungblu commented 2 years ago

sounds good, let me spin up another cluster for this

tjungblu commented 2 years ago

I took the liberty of running it already, since it's Friday :)

here's the output:

INFO - 2022-01-07T15:08:30.879247169+00:00 - Running crictl ["ps", "-o", "json", "-p", "981abd6f7c6cf3fab61aeba4d1bcc34e90cae5f73885046d7c1c67159ff8dcc6"]
DEBUG - 2022-01-07T15:08:30.903168753+00:00 - ps_output status: exit status: 0
DEBUG - 2022-01-07T15:08:30.903209056+00:00 - ps_output stderr, 
DEBUG - 2022-01-07T15:08:30.903215958+00:00 - ps_output: {
  "containers": [
    {
      "id": "4286a6ab9fd70f1b443905696c74b44586e0ee0729709c51d14b1a236f46c29d",
      "podSandboxId": "981abd6f7c6cf3fab61aeba4d1bcc34e90cae5f73885046d7c1c67159ff8dcc6",
      "metadata": {
        "name": "segfaulter",
        "attempt": 0
      },
      "image": {
        "image": "quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd",
        "annotations": {
        }
      },
      "imageRef": "quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd",
      "state": "CONTAINER_RUNNING",
      "createdAt": "1641568110589056944",
      "labels": {
        "io.kubernetes.container.name": "segfaulter",
        "io.kubernetes.pod.name": "segfaulter",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.pod.uid": "12a448e5-9978-415f-b48f-c2b55a997899"
      },
      "annotations": {
        "io.kubernetes.container.hash": "b72928a9",
        "io.kubernetes.container.restartCount": "0",
        "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
        "io.kubernetes.container.terminationMessagePolicy": "File",
        "io.kubernetes.pod.terminationGracePeriod": "30"
      }
    }
  ]
}

DEBUG - 2022-01-07T15:08:30.903536444+00:00 - Successfully got the process details
DEBUG - 2022-01-07T15:08:30.903540804+00:00 - found img_id "quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
INFO - 2022-01-07T15:08:30.903543625+00:00 - Running crictl ["images", "-o", "json"]
DEBUG - 2022-01-07T15:08:30.926063517+00:00 - Found image list:
 {"images":[{"id":"f37769f487c171c99f84ef0db018ce2055f0ae3a350721ba3e9b98d8c7860563","repoDigests":["quay.io/icdh/core-dump-handler@sha256:958f48e4e18bc822ad9b7feb054d445dae3d92cd2cd84d45ce51f72543fe1c33"],"repoTags":["quay.io/icdh/core-dump-handler:img-logger"],"size":"576651861","spec":null,"uid":null,"username":""},{"id":"d8087c58ebe51554d52054e955680805d86969dc9b6917f5e3fa3ecb81c86e33","repoDigests":["quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"],"repoTags":["quay.io/icdh/segfaulter:latest"],"size":"10229047","spec":null,"uid":null,"username":""},{"id":"9fe6cec96704ffdf512ad2755c42ddfd36f2ab2aec3a27bae4cce42a8c480e14","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0213887c325d3967d122fe875b45c9259e2b8388db9dc4e0a25c0561414b8737"],"repoTags":[],"size":"400637400","spec":null,"uid":null,"username":""},{"id":"33ef73131becd5dbc3d8f913659a9d82fc6584f22aba85d3840226f891d8a16a","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:03cbd8048e00835d29ced43ddb4548e979e51dda727239cfdf027c9ef47339cf"],"repoTags":[],"size":"416462685","spec":null,"uid":null,"username":""},{"id":"28ea52b98c63aa5dd899d67bf267a3b7dd623f5a694b97a56793bb12597e2de9","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:051c76b923d9826bd577aeb1e176a4096a9af47e9a7c1819158282a5a417170b"],"repoTags":[],"size":"493445698","spec":null,"uid":null,"username":""},{"id":"9efb1f8bb8ab8197515e03b151f90d9828726c9c53564497b25a28bdd7a9753d","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0b8a09ab3c370f7ef89319f3ff66cc346bbfc1cc48b58c2d40ef7d61b33a349c"],"repoTags":[],"size":"293771129","spec":null,"uid":null,"username":""},{"id":"06bcdf9e5bffca01d0395f349a5c6fe8522560425b81adca4f6d54b2e6b8e854","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fdd27a12ee71d1268ef2e7c4cfe8dbdd3a86e3010f77db3f2e530b928fa2a42"],"repoTags":[],"size":"338690057","spec":null,"uid":null,"username":""},{"id":"5e77e74e95e2dbff030da2f1d1f6d8913893735a609211561fa72896d11d0069","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1267dd9b35b81041888212be03415b2fab37d1ac9e0fb4d9ddcf60c72f7a99ad"],"repoTags":[],"size":"549557329","spec":null,"uid":null,"username":""},{"id":"762b58d25362b9b53b71a0330ebb197d079fd7d5c7556bb20941b96598b7e20e","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:22e31d53a7c6a92176d5a183fd213fff4e2e68c343ccf6cca9c7fc1363e34836"],"repoTags":[],"size":"480226761","spec":null,"uid":null,"username":""},{"id":"1d3b81473a678baf01f66f2d7ad2e31406bfbeb6f2d0c29a2889eb0282290fa5","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2fc2c9d9fae3070c00239ea3d0d1d9fa7477c99c296b17b2fc352794c535912e"],"repoTags":[],"size":"358933144","spec":null,"uid":null,"username":""},{"id":"2cb68ba3a6a2704c8c8b171b643dda06525437744f72cbf9430bb3bb3d06b6cd","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:426f99588fbce292824dba75372675b83daea64a1cf5d321fb5e4182fc43867e"],"repoTags":[],"size":"444563870","spec":null,"uid":null,"username":""},{"id":"f8517523838468766fe503f52b6909274a3e96d9779c1b8a6caf01f56c308dc5","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b14ffcea52f9eb5952546634cb26bfb1d523a4dd81382021c71673fed91efa2"],"repoTags":[],"size":"648096880","spec":null,"uid":{"value":"1001"},"username":""},{"id":"bc8fbc6cfc5c904a48c69b1c8939312ff8edb2c57f3a79dfa08b5b0ee7b2b2c0","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f4021f6a725ee1bc3c393535742b720ee2fc5ffc978849a2b67fc437debc283"],"repoTags":[],"size":"305719765","spec":null,"uid":null,"username":"nobody"},{"id":"7a846eb1c95bae86701ec53973c5f8e5e51298e14ac19902be92bce44025bc52","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:57d82a9ceb60734194a2edf462de97285e271cf9d63776eca95da92bef96ce11"],"repoTags":[],"size":"450245265","spec":null,"uid":null,"username":""},{"id":"d1a9e73e12ad162d62471317fb715eaa01cad24145a5cf48345ff7e41cb37d4d","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5e33f9d095952866b9743cc8268fb740cce6d93439f00ce333a2de1e5974837e"],"repoTags":[],"size":"365861279","spec":null,"uid":{"value":"65534"},"username":""},{"id":"4e80d22d9377aa6c13076868d997de1dd71dad1117e92169b11961bec39553ee","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7d78a787b5655292d4440942ae018ad5d1c985881c4e9d95d887d4f8450c7899"],"repoTags":[],"size":"337131856","spec":null,"uid":null,"username":""},{"id":"12e74538ccea688b6f2b9bab20d680a6409317e23643a91cf640f168f201614c","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:810097f053d1859d516ab784d975d41ae435ee91f5eaa7c90a02e643620c18fb"],"repoTags":[],"size":"480455294","spec":null,"uid":null,"username":""},{"id":"51f1f8de7be3bdf89050b4e69e8f42876311556ec1bd83857d5609cd40735c60","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a6322f222f98adc1585ff8777c73140a56ac3cdcd8a6949309884b79496bcbb6"],"repoTags":[],"size":"605665179","spec":null,"uid":null,"username":""},{"id":"08bc210159fafe42e9b1bfe3d494f3dd42ba73b03890a050445dc75f28186302","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:aaf0854244f356ccb3f1cd70356362ac5ef7a9f227e08128740306430fd75497"],"repoTags":[],"size":"387003238","spec":null,"uid":null,"username":""},{"id":"55425c0237e89acd2523f9a24f3fe21c9aa7df00ce5f490bc722794b6e2e10ee","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b98f4e019aa3cae1ac333a8241bafdbe52caccfcdcd7f640d1a7410dd33dd788"],"repoTags":[],"size":"457157493","spec":null,"uid":null,"username":""},{"id":"abd5ea3a48e346ec0480185c10c1c747300b38cad4b98e52205324375ff838a1","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c017f8b24e9d9a913456373c69dec63e3315ddae052ac6ad9cee25f856abe502"],"repoTags":[],"size":"393788198","spec":null,"uid":{"value":"1001"},"username":""},{"id":"f74a3835778df0df7489a77b7532f4ebbbd449b9930b0795485d21988de84137","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c56c86d030185bda241514593970e80f75ae75afd9bc6288388944bc2a1dfb1f"],"repoTags":[],"size":"323372661","spec":null,"uid":null,"username":""},{"id":"85eb1eba8745c22b36bd85cf97febb02567f13a5c98e5decc38ed726a6167c87","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df22255abc474a2b61b323925f76126fa2f8c99affb1c48c2a0eb16c4b4a1056"],"repoTags":[],"size":"398242909","spec":null,"uid":null,"username":""},{"id":"66c3e8e94022ed1a02ec9197196195fdc4272f8e8498947bc3360f5a83a74b4b","repoDigests":["quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e146605c1b75917d26c07268b361134aeda68983b2e2b060c202420b8267aa45"],"repoTags":[],"size":"331811979","spec":null,"uid":null,"username":""},{"id":"2797d420788fb40db6638bec2b5688dab9b0fdc23c211eb073a7cc67eb1b5971","repoDigests":["registry.redhat.io/redhat/certified-operator-index@sha256:bea26c044ebfcbd29fd543c54c1370098462cec13233ad0a5630e0b3a09d8e42","registry.redhat.io/redhat/certified-operator-index@sha256:f7cfa84674fd6b5a9c071ef029dcf1a529fdd35def12884190449ce6048c2f73"],"repoTags":["registry.redhat.io/redhat/certified-operator-index:v4.9"],"size":"708788997","spec":null,"uid":{"value":"1001"},"username":""},{"id":"5852d7cd10d9ac8586c182357ef598bb556e4336e87d51ba04a839d158affd74","repoDigests":["registry.redhat.io/redhat/redhat-marketplace-index@sha256:37b18b852ec1ddc14e211fd801d7d59fac2208ff34231612873b899238577410","registry.redhat.io/redhat/redhat-marketplace-index@sha256:f366e8f7bdd010cf5779659f063b57ff0d478ee12eb1ca5888f19c55a279bd04"],"repoTags":["registry.redhat.io/redhat/redhat-marketplace-index:v4.9"],"size":"697620658","spec":null,"uid":{"value":"1001"},"username":""},{"id":"54488905263c2e726a32a23362addc373eab1582fb708317a339374013a28e0c","repoDigests":["registry.redhat.io/redhat/redhat-operator-index@sha256:caefc33c79258eb4604df24bbd4fc99c0915dad22e354ed2bb1569116bebce88","registry.redhat.io/redhat/redhat-operator-index@sha256:ea5696af4e6ef9827b45a8b89cb88630af4fba363ec18aa7720e1ad1a4fcc9d8"],"repoTags":["registry.redhat.io/redhat/redhat-operator-index:v4.9"],"size":"735415686","spec":null,"uid":{"value":"1001"},"username":""}]}
DEBUG - 2022-01-07T15:08:30.926134795+00:00 - Found 27 images
DEBUG - 2022-01-07T15:08:30.926140967+00:00 - Matching "f37769f487c171c99f84ef0db018ce2055f0ae3a350721ba3e9b98d8c7860563" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926146966+00:00 - Matching "d8087c58ebe51554d52054e955680805d86969dc9b6917f5e3fa3ecb81c86e33" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926150959+00:00 - Matching "9fe6cec96704ffdf512ad2755c42ddfd36f2ab2aec3a27bae4cce42a8c480e14" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926154549+00:00 - Matching "33ef73131becd5dbc3d8f913659a9d82fc6584f22aba85d3840226f891d8a16a" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926157991+00:00 - Matching "28ea52b98c63aa5dd899d67bf267a3b7dd623f5a694b97a56793bb12597e2de9" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926161463+00:00 - Matching "9efb1f8bb8ab8197515e03b151f90d9828726c9c53564497b25a28bdd7a9753d" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926164899+00:00 - Matching "06bcdf9e5bffca01d0395f349a5c6fe8522560425b81adca4f6d54b2e6b8e854" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926168239+00:00 - Matching "5e77e74e95e2dbff030da2f1d1f6d8913893735a609211561fa72896d11d0069" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926171626+00:00 - Matching "762b58d25362b9b53b71a0330ebb197d079fd7d5c7556bb20941b96598b7e20e" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926177011+00:00 - Matching "1d3b81473a678baf01f66f2d7ad2e31406bfbeb6f2d0c29a2889eb0282290fa5" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926180492+00:00 - Matching "2cb68ba3a6a2704c8c8b171b643dda06525437744f72cbf9430bb3bb3d06b6cd" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926184217+00:00 - Matching "f8517523838468766fe503f52b6909274a3e96d9779c1b8a6caf01f56c308dc5" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926187964+00:00 - Matching "bc8fbc6cfc5c904a48c69b1c8939312ff8edb2c57f3a79dfa08b5b0ee7b2b2c0" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926191526+00:00 - Matching "7a846eb1c95bae86701ec53973c5f8e5e51298e14ac19902be92bce44025bc52" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926195229+00:00 - Matching "d1a9e73e12ad162d62471317fb715eaa01cad24145a5cf48345ff7e41cb37d4d" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926198740+00:00 - Matching "4e80d22d9377aa6c13076868d997de1dd71dad1117e92169b11961bec39553ee" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926202137+00:00 - Matching "12e74538ccea688b6f2b9bab20d680a6409317e23643a91cf640f168f201614c" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926205444+00:00 - Matching "51f1f8de7be3bdf89050b4e69e8f42876311556ec1bd83857d5609cd40735c60" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926208798+00:00 - Matching "08bc210159fafe42e9b1bfe3d494f3dd42ba73b03890a050445dc75f28186302" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926212100+00:00 - Matching "55425c0237e89acd2523f9a24f3fe21c9aa7df00ce5f490bc722794b6e2e10ee" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926215511+00:00 - Matching "abd5ea3a48e346ec0480185c10c1c747300b38cad4b98e52205324375ff838a1" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926218960+00:00 - Matching "f74a3835778df0df7489a77b7532f4ebbbd449b9930b0795485d21988de84137" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926222295+00:00 - Matching "85eb1eba8745c22b36bd85cf97febb02567f13a5c98e5decc38ed726a6167c87" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926225642+00:00 - Matching "66c3e8e94022ed1a02ec9197196195fdc4272f8e8498947bc3360f5a83a74b4b" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926229297+00:00 - Matching "2797d420788fb40db6638bec2b5688dab9b0fdc23c211eb073a7cc67eb1b5971" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926233133+00:00 - Matching "5852d7cd10d9ac8586c182357ef598bb556e4336e87d51ba04a839d158affd74" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"
DEBUG - 2022-01-07T15:08:30.926236908+00:00 - Matching "54488905263c2e726a32a23362addc373eab1582fb708317a339374013a28e0c" to 0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"

full file, copied from the pod: composer.log

No9 commented 2 years ago

Love a Friday vibe :) Looks like the issue is that imageRef should be matched against the repoDigests array rather than just mapped to the id.

I was hoping the id in the image collection was based on the hash in the imageRef, but alas not.

As a fix, I will also iterate over repoDigests looking for a match, as that will have the least impact on the current functionality.

I'll ping when a test image is ready
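The fix described above keeps the existing id comparison and also checks each entry's repoDigests. The sketch below illustrates that logic and is not the composer's code. ImageEntry is a hypothetical stand-in for one element of the crictl images -o json output. Its id check compares against the digest hex after sha256:, mirroring the "Matching <id> to <digest>" lines in the log.

```rust
/// Hypothetical mirror of one `crictl images -o json` entry, keeping only
/// the fields used for matching.
pub struct ImageEntry {
    pub id: String,
    pub repo_digests: Vec<String>,
}

/// Finds the image for a container's `imageRef`.
/// 1. Old behaviour: compare the image `id` with the digest hex.
/// 2. Added fallback: look for the full `imageRef` in `repoDigests`.
pub fn find_image<'a>(images: &'a [ImageEntry], image_ref: &str) -> Option<&'a ImageEntry> {
    // Text after the last "sha256:", or the whole string if there is none.
    let hex = image_ref.rsplit("sha256:").next().unwrap_or(image_ref);
    images.iter().find(|img| {
        img.id == hex || img.repo_digests.iter().any(|d| d == image_ref)
    })
}
```

Against the log above, the segfaulter ref matches entry d8087c58... through repoDigests. No id ever equals 0630afbc..., which is why the original lookup came up empty. In the composer itself, the entries would presumably be deserialized from the JSON first.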

No9 commented 2 years ago

OK @tjungblu, that's building here: https://quay.io/repository/icdh/core-dump-handler/build/8942eba5-352d-447f-b19f-ae419caa422e Glad we've got that in, as it will be needed for consistent post-processing and the client CLI.

Let me know if you want to add the scc fix to this and I will hold off merging this into main and bundle both aspects as a single release.

tjungblu commented 2 years ago

@No9 awesome. Sadly I couldn't figure out which container image tag was built (I can't see the build logs), but I took the last tag, img-logs, and I now have proper image info in my archive:

-r--r--r--. 1 tjungblu tjungblu 229376 Jan 10 08:54 <uuid>-segfaulter-1-4.core
-r--r--r--. 1 tjungblu tjungblu    293 Jan 10 08:54 <uuid>-segfaulter-1-4-dump-info.json
-r--r--r--. 1 tjungblu tjungblu    288 Jan 10 08:54 <uuid>-segfaulter-1-4-image-info.json
-r--r--r--. 1 tjungblu tjungblu   1184 Jan 10 08:54 <uuid>-segfaulter-1-4-pod-info.json
-r--r--r--. 1 tjungblu tjungblu    996 Jan 10 08:54 <uuid>-segfaulter-1-4-ps-info.json
-r--r--r--. 1 tjungblu tjungblu  27059 Jan 10 08:54 <uuid>-segfaulter-1-4-runtime-info.json
{"id":"d8087c58ebe51554d52054e955680805d86969dc9b6917f5e3fa3ecb81c86e33","repoDigests":["quay.io/icdh/segfaulter@sha256:0630afbcfebb45059794b9a9f160f57f50062d28351c49bb568a3f7e206855bd"],"repoTags":["quay.io/icdh/segfaulter:latest"],"size":"10229047","spec":null,"uid":null,"username":""}

Now we're at six files; you mentioned earlier there should be seven. Is anything crucial missing here?

Let me know if you want to add the scc fix to this and I will hold off merging this into main and bundle both aspects as a single release.

I can send you a separate PR towards the end of the week once I've got Helm working properly. Are you fine with adding a post-installation hook job for this?

No9 commented 2 years ago

Hey @tjungblu, so it was 7 files including the .zip file. There are 6 files inside the zip, so we are in excellent shape (sorry, I misread my test). If you're happy with a post-hook then I am happy to take it, as long as it doesn't interfere with the xKS platforms.

How do you want to deal with the recommendation to use /run/systemd/coredump? I'm happy to capture it as a separate issue and look at it when someone actually comes looking for it.
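The following sketch checks an extracted archive and reports which of the six expected files are absent. It assumes the suffixes in tjungblu's listing above are the complete set inside the .zip. missing_artifacts and EXPECTED_SUFFIXES are hypothetical names.

```rust
/// File-name suffixes seen inside the zip in the listing above (assumed
/// to be the complete set).
pub const EXPECTED_SUFFIXES: [&str; 6] = [
    ".core",
    "-dump-info.json",
    "-image-info.json",
    "-pod-info.json",
    "-ps-info.json",
    "-runtime-info.json",
];

/// Returns every expected suffix that no file in `names` ends with.
pub fn missing_artifacts(names: &[&str]) -> Vec<&'static str> {
    EXPECTED_SUFFIXES
        .iter()
        .copied()
        .filter(|suffix| !names.iter().any(|n| n.ends_with(suffix)))
        .collect()
}
```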

tjungblu commented 2 years ago

If you're happy with a post-hook then I am happy to take it, as long as it doesn't interfere with the xKS platforms.

I think that's the least invasive option. I reckon we put it behind an enableOpenShift value flag in Helm. I'll send you a PR later this week :)

How do you want to deal with the recommendation to use /run/systemd/coredump? I'm happy to capture it as a separate issue and look at it when someone actually comes looking for it.

yeah, I'd suggest we do it that way - it seems to work for now :)

No9 commented 2 years ago

This is actually a little tricky, as there is currently a --set daemonset.vendor=rhel7 option to support ROKS. So for enableOpenShift to work across all providers, it would need to do the following:

  1. Set the host directories: hostDirectory: "/mnt/core-dump-handler" and coreDirectory: "/mnt/core-dump-handler/cores"
  2. Apply the SCC
  3. Determine the host OS to set the VENDOR env var on the daemonset

Step 1 should be OK but will need testing to validate. Step 3 is used to determine which binary to copy to the host, so it may need to establish the node's OS programmatically beforehand, which might not be something that works as part of the post hook.

It might be better to have a flag like --set target=aws-openshift --set target=azure-openshift --set target=ibm-openshift? This would also leave it open to implementing the other providers in the compatibility matrix, as well as providing an entry point for other provider-specific items down the line, e.g. provisioning storage. https://github.com/IBM/core-dump-handler/#public-cloud-kubernetes-service-compatibility.

Or you can just add an addSccToUser flag and document the directory options in the compatibility matrix. This is how the likes of Sysdig do it today: https://charts.sysdig.com/charts/sysdig/

My preference would be for --set target or addSccToUser, as the user needs to specify some flags anyway in order to provide the storage configuration. But I'm not against enableOpenShift; it's just probably not the easiest to support.

There are likely other options that merge these so feel free to suggest some ideas :)
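Step 3 in the list above needs the node's OS before the composer binary is copied to the host. The sketch below shows one way to make that decision, assuming the OS is read from the node's /etc/os-release (the freedesktop.org key=value format). Only the rhel7 value appears in this thread. "default" is a hypothetical stand-in for every other OS, including RHCOS, which reports ID="rhcos". vendor_from_os_release is likewise a hypothetical name.

```rust
/// Chooses a VENDOR value from the contents of a node's /etc/os-release.
/// "rhel7" comes from the thread; "default" is a hypothetical placeholder.
pub fn vendor_from_os_release(os_release: &str) -> &'static str {
    let mut id = "";
    let mut version = "";
    for line in os_release.lines() {
        if let Some((key, value)) = line.split_once('=') {
            // os-release values may be wrapped in double or single quotes.
            let value = value.trim_matches(|c: char| c == '"' || c == '\'');
            match key {
                "ID" => id = value,
                "VERSION_ID" => version = value,
                _ => {}
            }
        }
    }
    if id == "rhel" && version.starts_with('7') {
        "rhel7"
    } else {
        "default"
    }
}
```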

tjungblu commented 2 years ago

great point; after your explanation, it seems that enableOpenShift is certainly too broad.

It might be better to have a flag like --set target=aws-openshift --set target=azure-openshift --set target=ibm-openshift?

I'm wondering how many platforms we really need here, it seems that ROKS is different because it uses RHEL instead of RHCOS.

I can see where you want to go with the different providers, especially in relation to the "provider-local" storage options. Another solution would be to have different values files for the respective environments you want to support (which is just a textual representation of your --set directives).

Or you can just add an addSccToUser flag and document the directory options in the compatibility matrix.

that certainly sounds like a better and more composable solution, I just have to figure out how to patch the scc from Helm :) If I can't get it to work, I will send you a README update nevertheless.

Looking into the feature, I think we (OpenShift) should build an operator to wrap this to have proper support on OpenShift across all envs - that also solves the issue in (3) as we can easily detect the environment and operating systems.

No9 commented 2 years ago

I'm wondering how many platforms we really need here, it seems that ROKS is different because it uses RHEL instead of RHCOS.

The other xKS services, such as GCP/AWS, seem to offer their "own brand Linux" by default plus an Ubuntu option for their nodes, so I think this will have wider utility.

Another solution would be to have different values files for the respective environments you want to support (which is just a textual representation of your --set directives).

I really like the idea of a different values file! Let's go with that, along with the addSccToUser flag.

Agree with the operator - There is a helm wrapper for this project here https://github.com/IBM/core-dump-operator but it's really just a stub at the moment.

If OpenShift folks are going to pick up an operator it would be great to understand what the plan is so I can either shut that repo down or grant access - whatever makes sense.

tjungblu commented 2 years ago

I couldn't get the job to work that would patch the existing SCC, which makes sense as this would be an easy privilege escalation path. I could make it work by creating a new SCC - so please have a look at #46 :)

I really like the idea of a different values file! Let's go with that, along with the addSccToUser flag.

awesome! Then let's add a couple more. Let me know what you think about the naming in the PR; I just bluntly named it openshift again.

Agree with the operator - There is a helm wrapper for this project here https://github.com/IBM/core-dump-operator but it's really just a stub at the moment. If OpenShift folks are going to pick up an operator it would be great to understand what the plan is so I can either shut that repo down or grant access - whatever makes sense.

Nice! The reason I came here is that we have a lighthouse customer that wants this functionality - I'm meeting them on Thursday and we'll decide on the operator aspects based on that. Generally we would work upstream in your operator project if it already exists, so we can discuss that when we get there.

Thanks for your help so far, much appreciated. :rocket:

No9 commented 2 years ago

ok - away from keyboard for the rest of the day but will look at the PR in the morning so you will have an update for Thursday. Delighted this is a customer use case. Sounds like we can work something out on the operator if required.

No9 commented 2 years ago

OK - I've merged the work associated with this issue and we have created separate issues for follow on work so I am closing this. Please track the release project for updates. https://github.com/IBM/core-dump-handler/projects/1