Tool for gathering data from clusters

nirs commented 7 months ago

When starting an env fails in CI environment, we don't have good way to debug the issue. We need to collect data from the failed cluster to allow understanding the failure later. The data can be published for few days as build artifact.

Use cases

CI unattended build

When unattended build fails, we want to delete the environment quickly and use it to run the next job. Without collecting data from the failed system we cannot analyze the failure. We can try to reproduce the issue with a local test environment but this will not help with random errors.

Debugging a system

When debugging a system we can inspect the system manually, but this is very hard and time consuming. It is much easier to gather all the data and use grep on local files.

Creating a snapshot of the system

When debugging an issue, you may want top take a snapshot of the system before an operation, perform the operation, and compare the state of the system to the state before the operation. This can be done manually for few resources using kubectl, but in the time to copy few resources manually, you can gather all resources from the entire cluster.

Getting help from people in different time zone

When you an issue with a system, you can wait few hours for help when someone wakes up on the other side of the planet, or gather everything for the cluster and recreated your environment.

Helping upstream users

On OpenShift users can use oc adm must-gather. It seems that there is no similar tool for upstream Kubernetes.

Data to collect

Resources

It hard to tell which resources are needed to debug an issue, and many resources are well hidden (no way to discover them without knowing about the kind). We will collect all resources from the entire system.

[x] Namespaced resources (e.g. pod)
[ ] Unnamespaced resoruces (e.g. pv)

non-namespaced resources

$ kubectl api-resources --namespaced=false -o name --verbs=list --context dr1
componentstatuses
namespaces
nodes
persistentvolumes
mutatingwebhookconfigurations.admissionregistration.k8s.io
validatingwebhookconfigurations.admissionregistration.k8s.io
customresourcedefinitions.apiextensions.k8s.io
...

Logs

It is hard to tell which logs are needed to debug an issue. We will collect all logs from all pods in the system.

[x] logs from all pods on all clusters

Nodes

[x] rook-ceph logs (#1333, #1346)

Commands

Not sure we can run sos on minikube nodes, and it creates huge reports and very slow, but we can use some commands it uses to collect basic data about a failed system.

General info about the system

Can run via minikube ssh or kubectl debug.

[ ] top -o %CPU from all nodes (see #1282)
[ ] top -o %MEM from all nodes (see #1282)

minikube

[ ] minikube logs - useful when starting a cluster fails

submariner

We can use subctl commands to get info about the health of the cluster.

[ ] subctl show all
[ ] subctl diagnose all

Submariner includes also a gather command subctl gather all, but using it will probably collect the same info we already collect.

kubevirt

Maybe use https://github.com/kubevirt/must-gather?

rook

No gather tool.

Thread in rook slack: https://rook-io.slack.com/archives/C46Q5UC05/p1711399331728259

We can open an issue to add this to rook-ceph plugin.

We can use rbd and ceph commands via rook-ceph-tools pod to get info about the health of the system:

[x] ceph status
[x] ceph osd blocklist ls
[ ] rbd mirror pool status -p replicalpool --verbose

ocm

No gather tool.

Output format

For long term we want to be compatible with oc adm must-gather, so if we create tools for analyzing gathered data, we can use the same tool for upstream and downstream.

We don't know if oc adm must-gather works on upstream (e.g. minikube) and if it is quick enough for development purposes.

It is not clear how oc adm must-gather can be used to collect custom data. It looks like we need to run the tool several times with different images.

We will start with a simple solution and check integration or using oc adm must-gather later.

Testing

I think testing the tool as part of e2e will be the most useful test.

[ ] Gather data from all clusters on every e2e run
[ ] Verify gathered data with the live cluster

nirs commented 7 months ago

Example errors that are impossible to debug without collecting data:

If this is an issue of slow image pull, we can see a Pulled event in kubectl events -A
If this is an image pull error, description of the pod will show the issue
If this is an issue with the operator pod the pod log can explain the issue.
If the cluster was overloaded by some process, top output will show this

With what we have now, we can only blindly increase the timeout which may slow down the retry that can recover from this issue.

drenv.commands.Error: Command failed:
   command: ('addons/rook-operator/start', 'dr1')
   exitcode: 1
   error:
      Traceback (most recent call last):
        File "/home/nsoffer/ramen/test/addons/rook-operator/start", line 55, in <module>
          wait(cluster)
        File "/home/nsoffer/ramen/test/addons/rook-operator/start", line 28, in wait
          kubectl.rollout(
        File "/home/nsoffer/ramen/test/drenv/kubectl.py", line 134, in rollout
          _watch("rollout", *args, context=context, log=log)
        File "/home/nsoffer/ramen/test/drenv/kubectl.py", line 157, in _watch
          for line in commands.watch(*cmd, input=input):
        File "/home/nsoffer/ramen/test/drenv/commands.py", line 155, in watch
          raise Error(args, error, exitcode=p.returncode)
      drenv.commands.Error: Command failed:
         command: ('kubectl', 'rollout', '--context', 'dr1', 'status', 'deploy/rook-ceph-operator', '--namespace=rook-ceph', '--timeout=300s')
         exitcode: 1
         error:
            error: timed out waiting for the condition

nirs commented 6 months ago

Evaluating oc adm must-gather

I played with oc adm must-gather to understand what it can give us with upstream setup.

Testing on regional dr setup with one busybox application, running few hours for reproducing another issue.

Running with the default image did not collect anything, since image was not accessible.

$ time oc adm must-gather --context dr1
[must-gather      ] OUT the server could not find the requested resource (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT 
[must-gather      ] OUT Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
error getting cluster version: the server could not find the requested resource (get clusterversions.config.openshift.io version)
ClusterID: 
ClientVersion: 4.15.0-202403061939.p0.gd6175eb.assembly.stream.el8-d6175eb
ClusterVersion: Installing "" for <unknown>: <unknown>
error getting cluster operators: the server could not find the requested resource (get clusteroperators.config.openshift.io)
ClusterOperators:
    clusteroperators are missing

[must-gather      ] OUT namespace/openshift-must-gather-5tl4p created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-jp25f created
[must-gather      ] OUT pod for plug-in image registry.redhat.io/openshift4/ose-must-gather:latest created
[must-gather-c6vqn] OUT gather did not start: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.redhat.io/openshift4/ose-must-gather:latest"
[must-gather      ] OUT namespace/openshift-must-gather-5tl4p deleted
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-jp25f deleted

Error running must-gather collection:
    gather did not start for pod must-gather-c6vqn: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.redhat.io/openshift4/ose-must-gather:latest"

Falling back to `oc adm inspect clusteroperators.v1.config.openshift.io` to collect basic cluster information.
error running backup collection: the server doesn't have a resource type "clusteroperators"

Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
error getting cluster version: the server could not find the requested resource (get clusterversions.config.openshift.io version)
ClusterID: 
ClientVersion: 4.15.0-202403061939.p0.gd6175eb.assembly.stream.el8-d6175eb
ClusterVersion: Installing "" for <unknown>: <unknown>
error getting cluster operators: the server could not find the requested resource (get clusteroperators.config.openshift.io)
ClusterOperators:
    clusteroperators are missing

error: gather did not start for pod must-gather-c6vqn: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.redhat.io/openshift4/ose-must-gather:latest"

real    0m10.318s
user    0m0.206s
sys 0m0.070s

Looking in the source I found quay.io/openshift/origin-must-gather which works:

$ time oc adm must-gather --image=quay.io/openshift/origin-must-gather --context dr1
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift/origin-must-gather
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
error getting cluster version: the server could not find the requested resource (get clusterversions.config.openshift.io version)
ClusterID: 
ClientVersion: 4.15.0-202403061939.p0.gd6175eb.assembly.stream.el8-d6175eb
ClusterVersion: Installing "" for <unknown>: <unknown>
error getting cluster operators: the server could not find the requested resource (get clusteroperators.config.openshift.io)
ClusterOperators:
    clusteroperators are missing

[must-gather      ] OUT namespace/openshift-must-gather-4zkbd created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-9r7n9 created
[must-gather      ] OUT pod for plug-in image quay.io/openshift/origin-must-gather created
[must-gather-pcsww] POD 2024-05-02T17:16:16.388647410Z volume percentage checker started.....
[must-gather-pcsww] POD 2024-05-02T17:16:16.424396680Z volume usage percentage 0
[must-gather-pcsww] POD 2024-05-02T17:16:16.621507357Z Error from server (NotFound): namespaces "openshift-cluster-version" not found
[must-gather-pcsww] POD 2024-05-02T17:16:16.621534644Z Error from server (NotFound): namespaces "openshift" not found
[must-gather-pcsww] POD 2024-05-02T17:16:16.621538355Z Error from server (NotFound): namespaces "openshift-etcd" not found
[must-gather-pcsww] POD 2024-05-02T17:16:18.919380951Z Waiting on subprocesses to finish execution.
[must-gather-pcsww] POD 2024-05-02T17:16:18.928886316Z INFO: Gathering HAProxy config files
[must-gather-pcsww] POD 2024-05-02T17:16:18.948180257Z WARNING: Collecting one or more kube-apiserver related logs on ALL masters in your cluster. This could take a large amount of time.
[must-gather-pcsww] POD 2024-05-02T17:16:18.956392041Z INFO: Gathering on-disk MachineConfig from degraded nodes
[must-gather-pcsww] POD 2024-05-02T17:16:18.972163514Z INFO: Collecting host service logs for crio
[must-gather-pcsww] POD 2024-05-02T17:16:18.972705417Z INFO: Collecting host service logs for kubelet
[must-gather-pcsww] POD 2024-05-02T17:16:18.973526928Z INFO: Collecting host service logs for rpm-ostreed
[must-gather-pcsww] POD 2024-05-02T17:16:18.973913424Z INFO: Collecting host service logs for ostree-finalize-staged
[must-gather-pcsww] POD 2024-05-02T17:16:18.974570133Z INFO: Collecting host service logs for machine-config-daemon-firstboot
[must-gather-pcsww] POD 2024-05-02T17:16:18.974976920Z INFO: Collecting host service logs for machine-config-daemon-host
[must-gather-pcsww] POD 2024-05-02T17:16:18.975389209Z INFO: Collecting host service logs for NetworkManager
[must-gather-pcsww] POD 2024-05-02T17:16:18.975758829Z INFO: Collecting host service logs for openvswitch
[must-gather-pcsww] POD 2024-05-02T17:16:18.976128475Z INFO: Collecting host service logs for ovs-configuration
[must-gather-pcsww] POD 2024-05-02T17:16:18.976518418Z INFO: Collecting host service logs for ovsdb-server
[must-gather-pcsww] POD 2024-05-02T17:16:18.976887768Z INFO: Collecting host service logs for ovs-vswitchd
[must-gather-pcsww] POD 2024-05-02T17:16:18.977333446Z INFO: Waiting for worker host service log collection to complete ...
[must-gather-pcsww] POD 2024-05-02T17:16:19.323209555Z INFO: Waiting for node performance related collection to complete ...
[must-gather-pcsww] POD 2024-05-02T17:16:19.747628728Z error: the server doesn't have a resource type "clustercsidriver"
[must-gather-pcsww] POD 2024-05-02T17:16:19.806296372Z error: the server doesn't have a resource type "clusterversion"
[must-gather-pcsww] POD 2024-05-02T17:16:20.067845016Z error: the server doesn't have a resource type "podnetworkconnectivitychecks"
[must-gather-pcsww] POD 2024-05-02T17:16:20.174215337Z error: a resource cannot be retrieved by name across all namespaces
[must-gather-pcsww] POD 2024-05-02T17:16:20.364051213Z error: the server doesn't have a resource type "routes"
[must-gather-pcsww] POD 2024-05-02T17:16:20.863092471Z error: the server doesn't have a resource type "performanceprofile"
[must-gather-pcsww] POD 2024-05-02T17:16:20.932718419Z INFO: "metallb-operator" not detected. Skipping.
[must-gather-pcsww] POD 2024-05-02T17:16:20.954723928Z INFO: Collecting Insights Archives from 
[must-gather-pcsww] POD 2024-05-02T17:16:21.081034002Z error: the server doesn't have a resource type "ingresscontroller"
[must-gather-pcsww] POD 2024-05-02T17:16:21.084002336Z No resources found
[must-gather-pcsww] POD 2024-05-02T17:16:21.104097761Z No resources found in openshift-etcd namespace.
[must-gather-pcsww] POD 2024-05-02T17:16:21.114010831Z INFO: "sriov-network-operator" not detected. Skipping.
[must-gather-pcsww] POD 2024-05-02T17:16:21.119205041Z INFO: "kubernetes-nmstate-operator" not detected. Skipping.
[must-gather-pcsww] POD 2024-05-02T17:16:21.125265772Z INFO: Worker host service log collection to complete.
[must-gather-pcsww] POD 2024-05-02T17:16:21.126094344Z INFO: Waiting for HAProxy config collection to complete ...
[must-gather-pcsww] POD 2024-05-02T17:16:21.126117096Z INFO: HAProxy config collection complete.
[must-gather-pcsww] POD 2024-05-02T17:16:21.177920500Z INFO: Waiting for on-disk MachineConfig collection to complete ...
[must-gather-pcsww] POD 2024-05-02T17:16:21.177942283Z INFO: on-disk MachineConfig config collection complete.
[must-gather-pcsww] POD 2024-05-02T17:16:21.214639597Z Wrote inspect data to must-gather.
[must-gather-pcsww] POD 2024-05-02T17:16:21.272760034Z Wrote inspect data to must-gather.
[must-gather-pcsww] POD 2024-05-02T17:16:21.358301106Z error: resource name may not be empty
[must-gather-pcsww] POD 2024-05-02T17:16:21.365273232Z Wrote inspect data to must-gather.
[must-gather-pcsww] POD 2024-05-02T17:16:21.389881470Z error: the server doesn't have a resource type "network"
[must-gather-pcsww] POD 2024-05-02T17:16:21.649852626Z volume usage percentage 0
[must-gather-pcsww] POD 2024-05-02T17:16:21.706715719Z error: the server doesn't have a resource type "machineconfigs"
[must-gather-pcsww] POD 2024-05-02T17:16:21.869744850Z error: the server doesn't have a resource type "multi-networkpolicy"
[must-gather-pcsww] POD 2024-05-02T17:16:22.015923944Z error: the server doesn't have a resource type "machineconfigpools"
[must-gather-pcsww] POD 2024-05-02T17:16:22.078964549Z error: the server doesn't have a resource type "net-attach-def"
[must-gather-pcsww] POD 2024-05-02T17:16:22.226188970Z error: the server doesn't have a resource type "overlappingrangeipreservations"
[must-gather-pcsww] POD 2024-05-02T17:16:22.277823579Z error: the server doesn't have a resource type "ippools"
[must-gather-pcsww] POD 2024-05-02T17:16:22.346043447Z No resources found
[must-gather-pcsww] POD 2024-05-02T17:16:22.479528067Z INFO: Waiting for network log collection to complete ...
[must-gather-pcsww] POD 2024-05-02T17:16:22.506073546Z INFO: Network log collection complete.
[must-gather-pcsww] POD 2024-05-02T17:16:22.580723347Z error: the server doesn't have a resource type "featuregates"
[must-gather-pcsww] POD 2024-05-02T17:16:22.751836119Z error: the server doesn't have a resource type "kubeletconfigs"
[must-gather-pcsww] POD 2024-05-02T17:16:22.919859455Z error: the server doesn't have a resource type "tuneds"
[must-gather-pcsww] POD 2024-05-02T17:16:23.090284347Z Wrote inspect data to must-gather.
[must-gather-pcsww] POD 2024-05-02T17:16:23.425026194Z Error from server (NotFound): namespaces "openshift-cluster-node-tuning-operator" not found
[must-gather-pcsww] POD 2024-05-02T17:16:23.429920128Z ERROR: Failed to identify the container image with node tools.
[must-gather-pcsww] POD 2024-05-02T17:16:23.429940952Z INFO: Node performance data collection will not contain node level data.
[must-gather-pcsww] POD 2024-05-02T17:16:23.430826233Z INFO: Node performance data collection complete.
[must-gather-pcsww] OUT waiting for gather to complete
[must-gather-pcsww] OUT downloading gather output
WARNING: rsync command not found in path. Please use your package manager to install it.
[must-gather-pcsww] OUT ./timestamp
[must-gather-pcsww] OUT ./host_service_logs/masters/rpm-ostreed_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/machine-config-daemon-firstboot_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/ostree-finalize-staged_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/crio_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/NetworkManager_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/ovs-configuration_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/machine-config-daemon-host_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/ovs-vswitchd_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/openvswitch_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/ovsdb-server_service.log
[must-gather-pcsww] OUT ./host_service_logs/masters/kubelet_service.log
[must-gather-pcsww] OUT ./nodes/debug
[must-gather-pcsww] OUT ./event-filter.html
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.ramendr.openshift.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.submariner.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v2.operators.coreos.com.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1beta1.snapshot.storage.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.velero.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha2.operators.coreos.com.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.events.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.scheduling.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1..yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.operators.coreos.com.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.policy.open-cluster-management.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v2.autoscaling.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.work.open-cluster-management.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.node.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.multicluster.x-k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.replication.storage.openshift.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.ceph.rook.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1beta1.policy.open-cluster-management.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.autoscaling.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.csiaddons.openshift.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.discovery.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.cluster.open-cluster-management.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.apps.open-cluster-management.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.batch.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1beta2.flowcontrol.apiserver.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1beta3.flowcontrol.apiserver.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.apps.open-cluster-management.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.admissionregistration.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.submariner.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.packages.operators.coreos.com.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.operator.open-cluster-management.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.operators.coreos.com.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.volsync.backube.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.apiextensions.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1alpha1.objectbucket.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.authentication.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.policy.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v2alpha1.velero.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.storage.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.authorization.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.snapshot.storage.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.apps.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.rbac.authorization.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.certificates.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.networking.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/apiregistration.k8s.io/apiservices/v1.coordination.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/operators.coreos.com/olmconfigs/cluster.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/operators.coreos.com/operators/ramen-dr-cluster-operator.ramen-system.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/core/nodes/dr1.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/core/persistentvolumes/pvc-23e31759-6ffe-4977-b70a-e3b2a5c103e0.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/core/persistentvolumes/pvc-02987ebf-1b0c-42ce-b3a2-7b5efe1de612.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/snapshot.storage.k8s.io/volumesnapshotclasses/csi-hostpath-snapclass.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/volumeattachments/csi-b42acd6adc1cc5095f2c2e3b4e93fd531aa635c772cec28ed8165eb31e23cdd9.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/csinodes/dr1.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/csidrivers/rook-ceph.rbd.csi.ceph.com.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/csidrivers/rook-ceph.cephfs.csi.ceph.com.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/csidrivers/hostpath.csi.k8s.io.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/storageclasses/standard.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/storageclasses/rook-ceph-block.yaml
[must-gather-pcsww] OUT ./cluster-scoped-resources/storage.k8s.io/storageclasses/csi-hostpath-sc.yaml
[must-gather-pcsww] OUT ./version
[must-gather-pcsww] OUT ./namespaces/ramen-system/operators.coreos.com/subscriptions/ramen-dr-cluster-subscription.yaml
[must-gather-pcsww] OUT ./namespaces/ramen-system/operators.coreos.com/operatorgroups/ramen-operator-group.yaml
[must-gather-pcsww] OUT ./namespaces/operators/operators.coreos.com/operatorgroups/global-operators.yaml
[must-gather-pcsww] OUT ./namespaces/olm/operators.coreos.com/clusterserviceversions/packageserver.yaml
[must-gather-pcsww] OUT ./namespaces/olm/operators.coreos.com/operatorconditions/packageserver.yaml
[must-gather-pcsww] OUT ./namespaces/olm/operators.coreos.com/catalogsources/operatorhubio-catalog.yaml
[must-gather-pcsww] OUT ./namespaces/olm/operators.coreos.com/operatorgroups/olm-operators.yaml
[must-gather-pcsww] OUT ./pod_network_connectivity_check/podnetworkconnectivitychecks.yaml
[must-gather-pcsww] OUT ./network_logs/net-attach-def
[must-gather-pcsww] OUT ./network_logs/overlappingrangeipreservations.whereabouts.cni.cncf.io
[must-gather-pcsww] OUT ./network_logs/multi-networkpolicy
[must-gather-pcsww] OUT ./network_logs/cluster_scale
[must-gather-pcsww] OUT ./network_logs/ippools.whereabouts.cni.cncf.io
Ignoring the following flags because they only apply to rsync: -z
[must-gather      ] OUT namespace/openshift-must-gather-4zkbd deleted
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-9r7n9 deleted

Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
error getting cluster version: the server could not find the requested resource (get clusterversions.config.openshift.io version)
ClusterID: 
ClientVersion: 4.15.0-202403061939.p0.gd6175eb.assembly.stream.el8-d6175eb
ClusterVersion: Installing "" for <unknown>: <unknown>
error getting cluster operators: the server could not find the requested resource (get clusteroperators.config.openshift.io)
ClusterOperators:
    clusteroperators are missing

real    0m40.417s
user    0m0.174s
sys 0m0.075s

But it did not collect much data:

$ du -sh must-gather.local.890950841412942527
356K    must-gather.local.890950841412942527

$ tree must-gather.local.890950841412942527 | tail -1
44 directories, 90 files

Gathered data: must-gather.local.890950841412942527.tar.gz

Gathered data structure:

├── quay-io-openshift-origin-must-gather-sha256-a9f3d2f463ef11da0debde26ef99766c391ba97dee4094405b75abc3a548c749
│   ├── cluster-scoped-resources
│   │   ├── apiregistration.k8s.io
│   │   │   └── apiservices
│   │   │       ├── v1.admissionregistration.k8s.io.yaml
...
│   │   ├── core
│   │   │   ├── nodes
│   │   │   │   └── dr1.yaml
│   │   │   └── persistentvolumes
│   │   │       ├── pvc-02987ebf-1b0c-42ce-b3a2-7b5efe1de612.yaml
│   │   │       └── pvc-23e31759-6ffe-4977-b70a-e3b2a5c103e0.yaml
│   │   ├── operators.coreos.com
│   │   │   ├── olmconfigs
│   │   │   │   └── cluster.yaml
│   │   │   └── operators
│   │   │       └── ramen-dr-cluster-operator.ramen-system.yaml
│   │   ├── snapshot.storage.k8s.io
│   │   │   └── volumesnapshotclasses
│   │   │       └── csi-hostpath-snapclass.yaml
│   │   └── storage.k8s.io
│   │       ├── csidrivers
│   │       │   ├── hostpath.csi.k8s.io.yaml
│   │       │   ├── rook-ceph.cephfs.csi.ceph.com.yaml
│   │       │   └── rook-ceph.rbd.csi.ceph.com.yaml
│   │       ├── csinodes
│   │       │   └── dr1.yaml
│   │       ├── storageclasses
│   │       │   ├── csi-hostpath-sc.yaml
│   │       │   ├── rook-ceph-block.yaml
│   │       │   └── standard.yaml
│   │       └── volumeattachments
│   │           └── csi-b42acd6adc1cc5095f2c2e3b4e93fd531aa635c772cec28ed8165eb31e23cdd9.yaml
...
│   ├── namespaces
│   │   ├── olm
│   │   │   └── operators.coreos.com
│   │   │       ├── catalogsources
│   │   │       │   └── operatorhubio-catalog.yaml
│   │   │       ├── clusterserviceversions
│   │   │       │   └── packageserver.yaml
│   │   │       ├── operatorconditions
│   │   │       │   └── packageserver.yaml
│   │   │       └── operatorgroups
│   │   │           └── olm-operators.yaml
│   │   ├── operators
│   │   │   └── operators.coreos.com
│   │   │       └── operatorgroups
│   │   │           └── global-operators.yaml
│   │   └── ramen-system
│   │       └── operators.coreos.com
│   │           ├── operatorgroups
│   │           │   └── ramen-operator-group.yaml
│   │           └── subscriptions
│   │               └── ramen-dr-cluster-subscription.yaml

Summary:

hard to use (need to find multiple images that collect what you need, run with all images to collect the data)
almost no data collected (nothing related to anything we care about, no logs)
slow (40 seconds for 356k, 0.008 MiB/s)
structure is little bit too deep, but it should be easy to create compatible structure
lot of data from the node (network logs, host services logs)
need to run on ocp to see if we get more data and learn about pods/containers/logs directory layout.

We need to try odf must-gather image, hopefully it is public.

RamenDR / ramen